Mun Jam #1

September 22, 2019, The Mun Team

As we finally found ourselves in the same country, our two-man team decided to get together for a Mun Jam. Our goal? Define an ABI for sharing function and type information between the Mun compiler and runtime, allowing us to integrate both components into a single pipeline. In Mun, a single file or group of files is compiled into what we call an assembly, which can be hot reloaded at runtime. To guarantee that the argument and return types of functions in different assemblies correspond, every assembly needs to expose its function and type information. The Application Binary Interface (ABI) used to express these so-called symbols adheres to the C ABI, which makes it easier to integrate Mun with other programming languages.

#ifndef MUN_ABI_H_
#define MUN_ABI_H_

#include <stdint.h>

typedef struct
{
    uint8_t b[16];
} MunGuid;

typedef struct
{
    MunGuid guid;
    const char *name;
} MunTypeInfo;

typedef enum
{
    MunPrivacyPublic = 0,
    MunPrivacyPrivate = 1
} MunPrivacy;

typedef uint8_t MunPrivacy_t;

typedef struct
{
    const char *name;
    const MunTypeInfo *arg_types;
    const MunTypeInfo *return_type;
    const void *fn_ptr;
    uint16_t num_arg_types;
    MunPrivacy_t privacy;
} MunFunctionInfo;

typedef struct
{
    const MunFunctionInfo *functions;
    uint32_t num_functions;
} MunModuleInfo;

#endif
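To make the purpose of these structures more concrete, here is a minimal sketch of how a runtime could look up a function symbol and check that its signature matches what the caller expects. The helpers find_function, same_type and signature_matches are hypothetical names, not part of the Mun ABI, and treating a NULL return_type as "no return value" is an assumption on our part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Types repeated from the ABI header so the sketch compiles on its own. */
typedef struct { uint8_t b[16]; } MunGuid;
typedef struct { MunGuid guid; const char *name; } MunTypeInfo;
typedef uint8_t MunPrivacy_t;
typedef struct {
    const char *name;
    const MunTypeInfo *arg_types;
    const MunTypeInfo *return_type;
    const void *fn_ptr;
    uint16_t num_arg_types;
    MunPrivacy_t privacy;
} MunFunctionInfo;
typedef struct {
    const MunFunctionInfo *functions;
    uint32_t num_functions;
} MunModuleInfo;

/* Hypothetical helper: linear search for a function symbol by name. */
const MunFunctionInfo *find_function(const MunModuleInfo *module, const char *name)
{
    assert(module != NULL && name != NULL);
    for (uint32_t i = 0; i < module->num_functions; ++i) {
        if (strcmp(module->functions[i].name, name) == 0)
            return &module->functions[i];
    }
    return NULL;
}

/* Hypothetical helper: two types are treated as identical when their GUIDs match. */
int same_type(const MunTypeInfo *a, const MunTypeInfo *b)
{
    return memcmp(a->guid.b, b->guid.b, sizeof a->guid.b) == 0;
}

/* Hypothetical helper: compares a symbol against the expected signature.
 * Assumption: a NULL return_type means "no return value". */
int signature_matches(const MunFunctionInfo *fn, const MunTypeInfo *args,
                      uint16_t num_args, const MunTypeInfo *ret)
{
    if (fn->num_arg_types != num_args)
        return 0;
    for (uint16_t i = 0; i < num_args; ++i) {
        if (!same_type(&fn->arg_types[i], &args[i]))
            return 0;
    }
    if (fn->return_type == NULL || ret == NULL)
        return fn->return_type == NULL && ret == NULL;
    return same_type(fn->return_type, ret);
}
```

A caller would first fetch the symbol with find_function and only cast fn_ptr to a concrete function pointer type once signature_matches succeeds. Comparing GUIDs instead of type names keeps the check to a fixed 16-byte comparison per type.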

LLVM vs C ABI

Every assembly contains a get_symbols function that returns the symbols of its top-level module as a MunModuleInfo. As the Mun compiler already has a syntax tree of all function signatures and types in the assembly, we can easily generate the required symbols. Or so we thought. When we tested our compiled assembly in the Rust-written runtime, the returned MunModuleInfo contained corrupted data. After a search down the internet rabbit hole, we discovered the cause. Take this function:

MunModuleInfo get_symbols()

Based on our understanding of LLVM, we generated the following IR:

define %struct.MunModuleInfo @get_symbols()

To discover possible discrepancies with the IR that clang generates for the equivalent C code, we ran clang (clang -S -emit-llvm main.c), resulting in this output:

define void @get_symbols(%struct.MunModuleInfo* noalias sret)

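It turns out that the sret attribute is the culprit here. sret - which stands for StructReturn - marks a pointer parameter as the address to which the function writes its return value. It is apparently the way structs larger than the native pointer size should be returned from C-style functions on our target; the exact size threshold depends on the target's calling convention. Because our IR returned the struct directly, the assembly and the runtime disagreed about where the result lives, which explains the corrupted data.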
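The difference between the two IR signatures can be sketched at the C level. The first function below is what the source code says; the second spells out by hand roughly what an sret lowering does, with the caller providing the storage. This is only an analogy under stated assumptions: get_symbols_sret and the FUNCTIONS table are hypothetical, MunFunctionInfo is trimmed to a single field, and a hand-written out-pointer is not guaranteed to be ABI-identical to a compiler-generated sret parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Trimmed stand-in for MunFunctionInfo; the other fields do not matter here. */
typedef struct {
    const char *name;
} MunFunctionInfo;

typedef struct {
    const MunFunctionInfo *functions;
    uint32_t num_functions;
} MunModuleInfo;

/* Hypothetical symbol table used by both variants. */
static const MunFunctionInfo FUNCTIONS[] = { { "foo" }, { "bar" } };

/* What the C source says: return the struct by value. */
MunModuleInfo get_symbols(void)
{
    MunModuleInfo info = { FUNCTIONS, 2 };
    return info;
}

/* Roughly what an sret lowering does: the caller allocates the result,
 * passes its address, and the function returns void after filling it in. */
void get_symbols_sret(MunModuleInfo *out)
{
    assert(out != NULL);
    out->functions = FUNCTIONS;
    out->num_functions = 2;
}
```

Both variants produce the same MunModuleInfo, but they expect the result in different places. A caller compiled for one convention that invokes a function compiled for the other reads memory or registers that were never written.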

Type GUIDs

To communicate and recognize types across compilations, we need to specify a globally unique type identifier - or GUID. The GUID is generated by creating an MD5 hash from the type name and module path. Built-in types are a special case; for those we use the @core module path. For example, the GUID of the float type is the MD5 hash of the string "@core::float".
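As a rough sketch of the steps around the hash, the code below builds the string that gets hashed and stores a 16-byte digest in a MunGuid. It deliberately stops short of computing MD5, which would come from whatever MD5 implementation is at hand. The helper names mun_type_hash_input and mun_guid_from_digest are hypothetical, and storing the digest bytes in their original order is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct { uint8_t b[16]; } MunGuid;

/* Hypothetical helper: writes "<module_path>::<type_name>" into out.
 * Returns 0 on success, -1 if the buffer is too small. */
int mun_type_hash_input(char *out, size_t out_size,
                        const char *module_path, const char *type_name)
{
    assert(out != NULL && module_path != NULL && type_name != NULL);
    int n = snprintf(out, out_size, "%s::%s", module_path, type_name);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}

/* Hypothetical helper: an MD5 digest is 128 bits, so it fills the
 * 16-byte GUID exactly. Assumption: bytes are kept in digest order. */
MunGuid mun_guid_from_digest(const uint8_t digest[16])
{
    MunGuid guid;
    memcpy(guid.b, digest, sizeof guid.b);
    return guid;
}
```

For the float type, mun_type_hash_input would be called with "@core" and "float", and the resulting string is what gets passed to the MD5 function.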