Jaccwabyt 🐇

Jaccwabyt: JavaScript ⇄ C Struct Communication via WASM Byte Arrays

Welcome to Jaccwabyt, a JavaScript API which creates bindings for WASM-compiled C structs, defining them in such a way that changes to their state in JS are visible in C/WASM, and vice versa, permitting two-way interchange of struct state with very little user-side friction.

(If that means nothing to you, neither will the rest of this page!)

Browser compatibility: this library requires a recent browser and makes no attempt whatsoever to accommodate "older" or lesser-capable ones, where "recent," very roughly, means released in mid-2018 or later, with late 2021 releases required for some optional features in some browsers (e.g. [BigInt64Array][] in Safari). It also relies on a couple of non-standard, but widespread, features, namely [TextEncoder][] and [TextDecoder][]. It is developed primarily on Firefox and Chrome on Linux and all claims of Safari compatibility are based solely on feature compatibility tables provided at [MDN][].

Formalities:

The license for both this documentation and the software it documents is the same as sqlite3, the project from which this spinoff project was spawned:


2022-06-30:

The author disclaims copyright to this source code. In place of a legal notice, here is a blessing:

May you do good and not evil. May you find forgiveness for yourself and forgive others. May you share freely, never taking more than you give.


Overview

Management summary: this JavaScript-only framework provides limited two-way bindings between C structs and JavaScript objects, such that changes to the struct in one environment are visible in the other.

Details...

It works by creating JavaScript proxies for C structs. Reads and writes of the JS-side members are marshaled through a flat byte array allocated from the WASM heap. As that heap is shared with the C-side code, and the memory block is written using the same approach C does, that byte array can be used to access and manipulate a given struct instance from both JS and C.
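The marshaling idea can be illustrated with a minimal hand-rolled sketch. This is not Jaccwabyt's implementation: the `proxy` object, the `structPtr` address, and the single `member1` accessor are assumptions made purely for illustration, using a standalone `WebAssembly.Memory` in place of a real module's heap.

```javascript
// Sketch only: a hand-rolled proxy for a single 4-byte int member.
// None of these names are part of Jaccwabyt.
const memory = new WebAssembly.Memory({ initial: 1 }); // one 64 KiB page
const structPtr = 64; // pretend an allocator returned this address

const proxy = {};
Object.defineProperty(proxy, 'member1', {
  enumerable: true,
  // The DataView is re-created on each access because memory.buffer
  // is replaced if the WASM heap grows.
  get() {
    return new DataView(memory.buffer).getInt32(structPtr + 0, true);
  },
  set(v) {
    new DataView(memory.buffer).setInt32(structPtr + 0, v, true);
  }
});

proxy.member1 = 12;
// The value now lives in the heap bytes, which is where C code would see it:
const rawBytes = new Uint8Array(memory.buffer, structPtr, 4);
```

Writing to `rawBytes` (standing in for C-side code here) changes what `proxy.member1` returns, and vice versa, because both read and write the same heap bytes.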

Motivating use case: this API was initially developed as an experiment to determine whether it would be feasible to implement, completely in JS, custom "VFS" and "virtual table" objects for the WASM build of sqlite3. Doing so was going to require some form of two-way binding of several structs. Once the proof of concept was demonstrated, a rabbit hole appeared and down we went... It has since grown beyond its humble proof-of-concept origins and is believed to be a useful (or at least interesting) tool for mixed JS/C applications.

Portability notes:

  • These docs sometimes use Emscripten as a point of reference because it is the most widespread WASM toolchain, but this code is specifically designed to be usable in arbitrary WASM environments. It abstracts away a few Emscripten-specific features into configurable options. Similarly, the build tree requires Emscripten but Jaccwabyt does not have any hard Emscripten dependencies.
  • This code is encapsulated into a single JavaScript function. It should be trivial to copy/paste into arbitrary WASM/JS-using projects.
  • The source tree includes C code, but only for testing and demonstration purposes. It is not part of the core distributable.

Architecture

BSBF: box rad 0.3*boxht "StructBinderFactory" fit fill lightblue
BSB: box same "StructBinder" fit at 0.75 e of 0.7 s of BSBF.c
BST: box same "StructType<T>" fit at 1.5 e of BSBF
BSC: box same "Struct<T>" "Ctor" fit at 1.5 s of BST
BSI: box same "Struct<T>" "Instances" fit at 1 right of BSB.e
BC: box same at 0.25 right of 1.6 e of BST "C Structs" fit fill lightgrey

arrow -> from BSBF.s to BSB.w "Generates" aligned above
arrow -> from BSB.n to BST.sw "Contains" aligned above
arrow -> from BSB.s to BSC.nw "Generates" aligned below
arrow -> from BSC.ne to BSI.s "Constructs" aligned below
arrow <- from BST.se to BSI.n "Inherits" aligned above
arrow <-> from BSI.e to BC.s dotted "Shared" aligned above "Memory" aligned below
arrow -> from BST.e to BC.w dotted "Mirrors Struct" aligned above "Model From" aligned below
arrow -> from BST.s to BSC.n "Prototype of" aligned above

Its major classes and functions are:

  • StructBinderFactory is a factory function which accepts a configuration object to customize it for a given WASM environment. A client will typically call this only one time, with an appropriate configuration, to generate a single...
  • StructBinder is a factory function which converts an arbitrary number of struct descriptions into...
  • StructTypes are constructors, one per struct description, which inherit from StructBinder.StructType and are used to instantiate...
  • Struct instances are objects representing individual instances of generated struct types.

An app may have any number of StructBinders, but will typically need only one. Each StructBinder is effectively a separate namespace for struct creation.

Creating and Binding Structs

From the amount of documentation provided, it may seem that creating and using struct bindings is a daunting task, but it essentially boils down to:

  1. Configure Jaccwabyt for your WASM environment. This is a one-time task per project and results in a factory function which can create new struct bindings.
  2. Create a JSON-format description of your C structs. This is required once for each struct and requires updating if the C structs change.
  3. Feed (2) to the function generated by (1) to create JS constructor functions for each struct. This is done at runtime, as opposed to during a build-process step, and can be set up in such a way that it does not require any maintenance after its initial setup.
  4. Create and use instances of those structs.

Detailed instructions for each of those steps follow...

Step 1: Configure Jaccwabyt for the Environment

Jaccwabyt's highest-level API is a single function. It creates a factory for processing struct descriptions, but does not process any descriptions itself. This level of abstraction exists primarily so that the struct-specific factories can be configured for a given WASM environment. Its usage looks like:

const MyBinder = StructBinderFactory({
  // These config options are all required:
  heap: WebAssembly.Memory instance or a function which returns
        a Uint8Array or Int8Array view of the WASM memory,
  alloc:   function(howMuchMemory){...},
  dealloc: function(pointerToFree){...}
});

It also offers a number of other settings, but all are optional except for the ones shown above. Those three config options abstract away details which are specific to a given WASM environment. They provide the WASM "heap" memory (a byte array), the memory allocator, and the deallocator. In a conventional Emscripten setup, that config might simply look like:

{
    heap:    Module['asm']['memory'],
    //Or:
    // heap: ()=>Module['HEAP8'],
    alloc:   (n)=>Module['_malloc'](n),
    dealloc: (m)=>Module['_free'](m)
}

The StructBinderFactory function returns a function, called a StructBinder in these docs, which can then be used to create bindings for our structs.
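For environments without Emscripten, the three required options might be sketched as follows. The bump allocator here is a hypothetical stand-in used only so that the example runs on its own. A real project would normally forward `alloc` and `dealloc` to the allocator functions exported by its WASM module.

```javascript
// Hypothetical stand-in allocator, for illustration only.
const memory = new WebAssembly.Memory({ initial: 1 }); // one 64 KiB page
let nextFree = 8; // skip address 0 so that 0 can signal "allocation failed"

const binderConfig = {
  heap: memory,
  alloc(n) {
    const ptr = nextFree;
    const end = ptr + ((n + 7) & ~7); // round sizes up to keep 8-byte alignment
    if (end > memory.buffer.byteLength) return 0; // out of memory
    nextFree = end;
    return ptr;
  },
  dealloc(ptr) {
    // A bump allocator never reuses memory, so this is a no-op for every
    // value, including the 0/null case which dealloc must tolerate.
  }
};
// The config would then be handed to the factory:
// const MyBinder = StructBinderFactory(binderConfig);
```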

Step 2: Create a Struct Description

The primary input for this framework is a JSON-compatible construct which describes a struct we want to bind. For example, given this C struct:

// C-side:
struct Foo {
  int member1;
  void * member2;
  int64_t member3;
};

Its JSON description looks like:

{
  "name": "Foo",
  "sizeof": 16,
  "members": {
    "member1": {"offset": 0,"sizeof": 4,"signature": "i"},
    "member2": {"offset": 4,"sizeof": 4,"signature": "p"},
    "member3": {"offset": 8,"sizeof": 8,"signature": "j"}
  }
}

These data must match up with the C-side definition of the struct (if any). See Appendix G for one way to easily generate these from C code.

Each entry in the members object maps the member's name to its low-level layout:

  • offset: the byte offset from the start of the struct, as reported by C's offsetof() feature.
  • sizeof: as reported by C's sizeof().
  • signature: described below.
  • readOnly: optional. If set to true, the binding layer will throw if JS code tries to set that property.

The order of the members entries is not important: their memory layout is determined by their offset and sizeof members. The name property is technically optional, but one of the steps in the binding process requires that either it be passed an explicit name or there be one in the struct description. The names of the members entries need not match their C counterparts. Project conventions may call for giving them different names in the JS side and the StructBinderFactory can be configured to automatically add a prefix and/or suffix to their names.
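Because the layout is defined entirely by those numbers, a quick sanity check can catch some hand-editing mistakes. The helper below is a hypothetical sketch, not part of Jaccwabyt: it only verifies that each member fits inside the struct and flags overlapping members as suspicious. It cannot confirm that the numbers match what the C compiler actually produced.

```javascript
// Hypothetical helper, not part of Jaccwabyt: returns a list of problems
// found in a struct description (an empty list means none were found).
function checkStructDescription(desc) {
  const problems = [];
  const members = Object.entries(desc.members)
    .map(([name, m]) => ({ name, ...m }))
    .sort((a, b) => a.offset - b.offset);
  let prevEnd = 0, prevName = null;
  for (const m of members) {
    const end = m.offset + m.sizeof;
    if (end > desc.sizeof) {
      problems.push(`${m.name} extends past sizeof(${desc.name})`);
    }
    if (m.offset < prevEnd) {
      problems.push(`${m.name} overlaps ${prevName}`);
    }
    if (end > prevEnd) { prevEnd = end; prevName = m.name; }
  }
  return problems;
}

// The Foo description from above:
const fooDescription = {
  name: 'Foo',
  sizeof: 16,
  members: {
    member1: { offset: 0, sizeof: 4, signature: 'i' },
    member2: { offset: 4, sizeof: 4, signature: 'p' },
    member3: { offset: 8, sizeof: 8, signature: 'j' }
  }
};
const fooProblems = checkStructDescription(fooDescription); // empty array
```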

Nested structs are as-yet unsupported by this tool.

Struct member "signatures" describe the data types of the members and are an extended variant of the format used by Emscripten's addFunction(). A signature for a non-function-pointer member, or function pointer member which is to be modelled as an opaque pointer, is a single letter. A signature for a function pointer may also be modelled as a series of letters describing the call signature. The supported letters are:

  • v = void (only used as return type for function pointer members)
  • i = int32 (4 bytes)
  • j = int64 (8 bytes) is only really usable if this code is built with BigInt support (e.g. using the Emscripten -sWASM_BIGINT build flag). Without that, this API may throw when encountering the j signature entry.
  • f = float (4 bytes)
  • d = double (8 bytes)
  • c = int8 (1 byte) char - see notes below!
  • C = uint8 (1 byte) unsigned char - see notes below!
  • p = int32 (see notes below!)
  • P = Like p but with extra handling. Described below.
  • s = like int32 but is a hint that it's a pointer to a string so that some (very limited) contexts may treat it as such, noting that such algorithms must, for lack of information to the contrary, assume both that the encoding is UTF-8 and that the pointer's member is NUL-terminated. If that is not the case for a given string member, do not use s: use i or p instead and do any string handling yourself.

Noting that:

  • All of these types are numeric. Attempting to set any struct-bound property to a non-numeric value will trigger an exception except in cases explicitly noted otherwise.
  • "Char" types: WASM does not define an int8 type, nor does it distinguish between signed and unsigned. This API treats c as int8 and C as uint8 for purposes of getting and setting values when using the DataView class. It is not recommended that client code use these types in new WASM-capable code, but they were added for the sake of binding some immutable legacy code to WASM.

Sidebar: Emscripten's public docs do not mention p, but their generated code includes p as an alias for i, presumably to mean "pointer". Though i is legal for pointer types in the signature, p is more descriptive, so this framework encourages the use of p for pointer-type members. Using p for pointers also helps future-proof the signatures against the possibility that WASM eventually supports 64-bit pointers. Note that sometimes p really means pointer-to-pointer, but the Emscripten JS/WASM glue does not offer that level of expressiveness in these signatures. We simply have to be aware of when we need to deal with pointers and pointers-to-pointers in JS code.

Trivia: this API treats p as distinctly different from i in some contexts, so its use is encouraged for pointer types.

Signatures in the form x(...) denote function-pointer members and x denotes non-function members. Functions with no arguments use the form x(). For function-type signatures, the strings are formulated such that they can be passed to Emscripten's addFunction() after stripping out the ( and ) characters. For good measure, to match the public Emscripten docs, p, c, and C, should also be replaced with i. In JavaScript that might look like:

signature.replace(/[^vipPsjfdcC]/g,'').replace(/[pPscC]/g,'i');
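Wrapped in a function, that conversion can be tried against a few signatures. The function name below is hypothetical, chosen only for this sketch. The body is the expression shown above.

```javascript
// Hypothetical wrapper around the conversion expression shown above.
function toEmscriptenSignature(signature) {
  return signature
    .replace(/[^vipPsjfdcC]/g, '') // strip "(", ")", and anything unrecognized
    .replace(/[pPscC]/g, 'i');     // pointers, strings, and chars become "i"
}

const sigA = toEmscriptenSignature('v(p)');   // 'vi'
const sigB = toEmscriptenSignature('i(ppj)'); // 'iiij'
const sigC = toEmscriptenSignature('i()');    // 'i'
const sigD = toEmscriptenSignature('P');      // 'i'
```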

P vs p in Method Signatures

This support is experimental and subject to change.

The method signature letter p means "pointer," which, in WASM, means "integer." p is treated as an integer for most contexts, while still also being a separate type (analogous to how pointers in C are just a special use of unsigned numbers). A capital P changes the semantics of plain member pointers (but not, as of this writing, function pointer members) as follows:

  • When a P-type member is set via myStruct.x=y, if (y instanceof StructType) then the value of y.pointer is stored in myStruct.x. If y is neither a number nor a StructType, an exception is triggered (regardless of whether p or P is used).

Step 3: Binding the Struct

We can now use the results of steps 1 and 2:

const MyStruct = MyBinder(myStructDescription);

That creates a new constructor function, MyStruct, which can be used to instantiate new instances. The binder will throw if it encounters any problems.

That's all there is to it.

Sidebar: that function may modify the struct description object and/or its sub-objects, or may even replace sub-objects, in order to simplify certain later operations. If that is not desired, then feed it a copy of the original, e.g. by passing it JSON.parse(JSON.stringify(structDefinition)).

Step 4: Creating, Using, and Destroying Struct Instances

Now that we have our constructor...

const my = new MyStruct();

It is important to understand that creating a new instance allocates memory on the WASM heap. We must not simply rely on garbage collection to clean up the instances because doing so will not free up the WASM heap memory. The correct way to free up that memory is to use the object's dispose() method.

The following usage pattern offers one way to easily ensure proper cleanup of struct instances:

const my = new MyStruct();
try {
  console.log(my.member1, my.member2, my.member3);
  my.member1 = 12;
  assert(12 === my.member1);
  /* ^^^ it may seem silly to test that, but recall that assigning that
     property encodes the value into a byte array in heap memory, not
     a normal JS property. Similarly, fetching the property decodes it
     from the byte array. */
  // Pass the struct to C code which takes a MyStruct pointer:
  aCFunction( my.pointer );
} finally {
  my.dispose();
}

Sidebar: the finally block will be run no matter how the try exits, whether it runs to completion, propagates an exception, or uses flow-control keywords like return or break. It is perfectly legal to use try/finally without a catch, and doing so is an ideal match for the memory management requirements of Jaccwabyt-bound struct instances.
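If that pattern is repeated often, it could be folded into a small helper. The `withStruct` function below is a hypothetical convenience wrapper, not something Jaccwabyt provides, and `FakeStruct` is a stand-in for a bound struct constructor so that the sketch runs on its own.

```javascript
// Hypothetical convenience wrapper: constructs an instance, hands it to
// the callback, and always disposes of it afterwards.
function withStruct(StructCtor, callback) {
  const instance = new StructCtor();
  try {
    return callback(instance);
  } finally {
    instance.dispose();
  }
}

// Stand-in for a StructBinder-generated constructor.
class FakeStruct {
  constructor() { this.disposed = false; this.member1 = 0; }
  dispose() { this.disposed = true; }
}

const twelve = withStruct(FakeStruct, (my) => {
  my.member1 = 12;
  return my.member1;
});
```

Note that this shape only suits synchronous callbacks: with an async callback the instance would be disposed of before the returned promise settles.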

It is often useful to wrap an existing instance of a C-side struct without taking over ownership of its memory. That can be achieved by simply passing a pointer to the constructor. For example:

const m = new MyStruct( functionReturningASharedPtr() );
// calling m.dispose() will _not_ free the wrapped C-side instance
// but will trigger any ondispose handler.

Now that we have struct instances, there are a number of things we can do with them, as covered in the rest of this document.

API Reference

API: Binder Factory

This is the top-most function of the API, from which all other functions and types are generated. The binder factory's signature is:

Function StructBinderFactory(object configOptions);

It returns a function which these docs refer to as a StructBinder (covered in the next section). It throws on error.

The binder factory supports the following options in its configuration object argument:

  • heap
    Must be either a WebAssembly.Memory instance representing the WASM heap memory OR a function which returns an Int8Array or Uint8Array view of the WASM heap. In the latter case the function should, if appropriate for the environment, account for the heap being able to grow. Jaccwabyt uses this property in such a way that it "should" be okay for the WASM heap to grow at runtime (that case is, however, untested).

  • alloc
    Must be a function semantically compatible with Emscripten's Module._malloc(). That is, it is passed the number of bytes to allocate and it returns a pointer. On allocation failure it may either return 0 or throw an exception. This API will throw an exception if allocation fails or will propagate whatever exception the allocator throws. The allocator must use the same heap as the heap config option.

  • dealloc
    Must be a function semantically compatible with Emscripten's Module._free(). That is, it takes a pointer returned from alloc() and releases that memory. It must never throw and must accept a value of 0/null to mean "do nothing" (noting that 0 is technically a legal memory address in WASM, but that seems like a design flaw).

  • bigIntEnabled (bool=true if BigInt64Array is available, else false)
    If true, the WASM bits this code is used with must have been compiled with int64 support (e.g. using Emscripten's -sWASM_BIGINT flag). If that's not the case, this flag should be set to false. If it's enabled, BigInt support is assumed to work and certain extra features are enabled. Trying to use features which require BigInt when it is disabled (e.g. using 64-bit integer types) will trigger an exception.

  • memberPrefix and memberSuffix (string="")
    If set, struct-defined properties get bound to JS with this string as a prefix or suffix, respectively. This can be used to avoid symbol name collisions between the struct-side members and the JS-side ones and/or to make more explicit which object-level properties belong to the struct mapping and which to the JS side. This does not modify the values in the struct description objects, just the property names through which they are accessed via property access operations and the various StructInstance APIs (noting that the latter tend to permit both the original names and the names as modified by these settings).

  • log
    Optional function used for debugging output. By default console.log is used but by default no debug output is generated. This API assumes that the function will space-separate each argument (like console.log does). See Appendix D for info about enabling debugging output.
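When the heap option is given as a function, one way to account for heap growth is to rebuild the view whenever the underlying buffer has been replaced. The following is a sketch under the assumption of a plain `WebAssembly.Memory`. The names `memory`, `cachedView`, and `heap` are illustrative only.

```javascript
const memory = new WebAssembly.Memory({ initial: 1, maximum: 4 });
let cachedView = new Uint8Array(memory.buffer);

// Returns a view of the current heap, rebuilding it if the heap has grown.
// Growing a WebAssembly.Memory detaches the old ArrayBuffer, so any view
// created before the growth must not be reused.
function heap() {
  if (cachedView.buffer !== memory.buffer) {
    cachedView = new Uint8Array(memory.buffer);
  }
  return cachedView;
}

const sizeBefore = heap().length; // one page
memory.grow(1);
const sizeAfter = heap().length;  // two pages
```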

API: Struct Binder

Struct Binders are factories which are created by the StructBinderFactory. A given Struct Binder can process any number of distinct structs. In a typical setup, an app will have only one shared Binder Factory and one Struct Binder. Struct Binders which are created via different StructBinderFactory calls are unrelated to each other, sharing no state except, perhaps, indirectly via StructBinderFactory configuration (e.g. the memory heap).

These factories have two call signatures:

Function StructBinder([string structName,] object structDescription)

If the struct description argument has a name property then the name argument is optional, otherwise it is required.

The returned object is a constructor for instances of the struct described by its argument(s), each of which derives from a separate StructType instance.

The Struct Binder has the following members:

  • allocCString(str)
    Allocates a new UTF-8-encoded, NUL-terminated copy of the given JS string and returns its address relative to config.heap(). If allocation returns 0 this function throws. Ownership of the memory is transferred to the caller, who must eventually pass it to the configured config.dealloc() function.

  • config
    The configuration object passed to the StructBinderFactory, primarily for accessing the memory (de)allocator and memory. Modifying any of its "significant" configuration values may have undefined results.
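What an allocCString()-style function has to do can be sketched with the standard TextEncoder and TextDecoder classes. This is an illustration, not the library's code: the toy `alloc` function, `allocCStringSketch`, and `cStringToJs` are all hypothetical names, and the toy allocator performs no bounds checking.

```javascript
const memory = new WebAssembly.Memory({ initial: 1 });
let nextFree = 8;
// Toy allocator for this sketch only (no bounds checks, never frees).
const alloc = (n) => { const p = nextFree; nextFree += n; return p; };

// Copies a JS string into the heap as NUL-terminated UTF-8.
function allocCStringSketch(str) {
  const bytes = new TextEncoder().encode(str);
  const ptr = alloc(bytes.length + 1);
  if (!ptr) throw new Error('Allocation failed');
  const heap = new Uint8Array(memory.buffer);
  heap.set(bytes, ptr);
  heap[ptr + bytes.length] = 0; // NUL terminator
  return ptr;
}

// Decodes a NUL-terminated UTF-8 string from the heap. Returns null for
// address 0.
function cStringToJs(ptr) {
  if (!ptr) return null;
  const heap = new Uint8Array(memory.buffer);
  let end = ptr;
  while (heap[end] !== 0) ++end;
  return new TextDecoder('utf-8').decode(heap.subarray(ptr, end));
}

const strPtr = allocCStringSketch('héllo');
const roundTrip = cStringToJs(strPtr);
```

The byte length, not the JS string length, determines the allocation size: `'héllo'` has five characters but occupies six bytes in UTF-8, plus one for the terminator.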

API: Struct Type

The StructType class is a property of the StructBinder function.

Each constructor created by a StructBinder inherits from its own instance of the StructType class, which contains state specific to that struct type (e.g. the struct name and description metadata). StructTypes which are created via different StructBinder instances are unrelated to each other, sharing no state except StructBinderFactory config options.

The StructType constructor cannot be called from client code. It is only called by the StructBinder-generated constructors. The StructBinder.StructType object has the following "static" properties (which are accessible from individual instances via theInstance.constructor):

  • addOnDispose(...value)
    If this object has no ondispose property, this function creates it as an array and pushes the given value(s) onto it. If the object has a function-typed ondispose property, this call replaces it with an array and moves that function into the array. In all other cases, ondispose is assumed to be an array and the argument(s) is/are appended to it. Returns this.

  • allocCString(str)
    Identical to the StructBinder method of the same name.

  • hasExternalPointer(object)
    Returns true if the given object's pointer member refers to an "external" object. That is the case when a pointer is passed to a struct's constructor. If true, the memory is owned by someone other than the object and must outlive the object.

  • isA(value)
    Returns true if its argument is a StructType instance from the same StructBinder as this StructType.

  • memberKey(string)
    Returns the given string wrapped in the configured memberPrefix and memberSuffix values. e.g. if passed "x" and memberPrefix is "$" then it returns "$x". This does not verify that the property is actually a struct member, it simply transforms the given string. TODO(?): add a 2nd parameter indicating whether it should validate that it's a known member name.
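The transformation that memberKey() is described as performing is simple enough to sketch directly. The `makeMemberKey` factory below is a hypothetical illustration of that behavior, not the library's own function.

```javascript
// Hypothetical sketch of the described prefix/suffix wrapping.
function makeMemberKey(memberPrefix = '', memberSuffix = '') {
  // Like the documented function, this performs no validation that the
  // name refers to an actual struct member.
  return (name) => memberPrefix + name + memberSuffix;
}

const dollarKey = makeMemberKey('$');
const wrappedKey = makeMemberKey('m_', '_');
const keyA = dollarKey('x');         // '$x'
const keyB = wrappedKey('member1');  // 'm_member1_'
const keyC = makeMemberKey()('x');   // 'x'
```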

The base StructType prototype has the following members, all of which are inherited by struct instances and may only legally be called on concrete struct instances unless noted otherwise:

  • dispose()
    Frees, if appropriate, the WASM-allocated memory which is allocated by the constructor. If this is not called before the JS engine cleans up the object, a leak in the WASM heap memory pool will result.
    When dispose() is called, if the object has a property named ondispose then it is treated as follows:

    • If it is a function, it is called with the struct object as its this. That method must not throw - if it does, the exception will be ignored.
    • If it is an array, it may contain functions, pointers, other StructType instances, and/or JS strings. If an entry is a function, it is called as described above. If it's a number, it's assumed to be a pointer and is passed to the dealloc() function configured for the parent StructBinder. If it's a StructType instance then its dispose() method is called. If it's a JS string, it's assumed to be a helpful description of the next entry in the list and is simply ignored. Strings are supported primarily for use as debugging information.
    • Some struct APIs will manipulate the ondispose member, creating it as an array or converting it from a function to array as needed.
  • lookupMember(memberName,throwIfNotFound=true)
    Given the name of a mapped struct member, it returns the member description object. If not found, it either throws (if the 2nd argument is true) or returns undefined (if the second argument is false). The first argument may be either the member name as it is mapped in the struct description or that same name with the configured memberPrefix and memberSuffix applied, noting that the lookup in the former case is faster.
    This method may be called directly on the prototype, without a struct instance.

  • memberToJsString(memberName)
    Uses this.lookupMember(memberName,true) to look up the given member. If its signature is s then it is assumed to refer to a NUL-terminated, UTF-8-encoded string and its memory is decoded as such. If its signature is not s then an exception is thrown. If its address is 0, null is returned. See also: setMemberCString().

  • memberIsString(memberName [,throwIfNotFound=true])
    Uses this.lookupMember(memberName,throwIfNotFound) to look up the given member. Returns the member description object if the member has a signature of s, else returns false. If the given member is not found, it throws if the 2nd argument is true, else it returns false.

  • memberKey(string)
    Works identically to StructBinder.StructType.memberKey().

  • memberKeys()
    Returns an array of the names of the properties of this object which refer to C-side struct counterparts.

  • memberSignature(memberName [,emscriptenFormat=false])
    Returns the signature for a given member property, either in this framework's format or, if passed a truthy 2nd argument, in a format suitable for the 2nd argument to Emscripten's addFunction(). Throws if the first argument does not resolve to a struct-bound member name. The member name is resolved using this.lookupMember() and throws if the member is not found.

  • memoryDump()
    Returns a Uint8Array which contains the current state of this object's raw memory buffer. Potentially useful for debugging, but not much else. Note that the memory is necessarily, for compatibility with C, written in the host platform's endianness and is thus not useful as a persistent/portable serialization format.

  • setMemberCString(memberName,str)
    Uses StructType.allocCString() to allocate a new C-style string, assign it to the given member, and add the new string to this object's ondispose list for cleanup when this.dispose() is called. This function throws if lookupMember() fails for the given member name, if allocation of the string fails, or if the member has a signature value of anything other than s. Returns this.
    Achtung: calling this repeatedly will not immediately free the previous values because this code cannot know whether they are in use in other places, namely C. Instead, each time this is called, the prior value is retained in the ondispose list for cleanup when the struct is disposed of. Because of the complexities and general uncertainties of memory ownership and lifetime in such constellations, it is recommended that the use of C-string members from JS be kept to a minimum or that the relationship be one-way: let C manage the strings and only fetch them from JS using, e.g., memberToJsString().
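The ondispose handling described under dispose() above can be sketched as a standalone function. This is an illustration of the documented rules, not the library's code. The `dealloc` and `isStruct` parameters are assumptions of the sketch, standing in for the configured deallocator and a StructType.isA()-style check.

```javascript
// Sketch of the documented ondispose rules; not the library's code.
function runOnDispose(structObj, dealloc, isStruct) {
  const handlers = structObj.ondispose;
  if (handlers === undefined) return;
  // A lone function is treated like a one-element list.
  const list = Array.isArray(handlers) ? handlers : [handlers];
  for (const entry of list) {
    try {
      if (typeof entry === 'function') entry.call(structObj);
      else if (typeof entry === 'number') dealloc(entry); // assumed to be a pointer
      else if (isStruct(entry)) entry.dispose();
      // Strings are descriptive labels only and are skipped.
    } catch (e) {
      // Handlers must not throw. If one does, the exception is ignored.
    }
  }
}

// Usage with stand-in objects:
const freed = [];
const calls = [];
const child = { dispose() { calls.push('child'); } };
const owner = {
  ondispose: [
    'name buffer', 1024,
    function () { calls.push(this === owner ? 'fn' : 'wrong this'); },
    child,
    () => { throw new Error('ignored'); }
  ]
};
runOnDispose(owner, (ptr) => freed.push(ptr), (v) => v === child);
```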

API: Struct Constructors

Struct constructors (the functions returned from StructBinder) are used for, intuitively enough, creating new instances of a given struct type:

const x = new MyStruct;

Normally they should be passed no arguments, but they optionally accept a single argument: a WASM heap pointer address of memory which the object will use for storage. It does not take over ownership of that memory and that memory must be valid for at least as long as this struct instance. This is used, for example, to proxy static/shared C-side instances:

const x = new MyStruct( someCFuncWhichReturnsAMyStructPointer() );
...
x.dispose(); // does NOT free the memory

The JS-side construct does not own the memory in that case and has no way of knowing when the C-side struct is destroyed. Results are specifically undefined if the JS-side struct is used after the C-side struct's memory is freed.

Potential TODO: add a way of passing ownership of the C-side struct to the JS-side object. e.g. maybe simply pass true as the second argument to tell the constructor to take over ownership. Currently the pointer can be taken over using something like myStruct.ondispose=[myStruct.pointer] immediately after creation.

These constructors have the following "static" members:

  • isA(value)
    Returns true if its argument was created by this constructor.

  • memberKey(string)
    Works exactly as documented for StructType.

  • memberKeys()
    Works exactly as documented for StructType.

  • structInfo
    The structure description passed to StructBinder when this constructor was generated.

  • structName
    The structure name passed to StructBinder when this constructor was generated.

API: Struct Prototypes

The prototypes of structs created via the constructors described in the previous section are each a struct-type-specific instance of StructType and add the following struct-type-specific properties to the mix:

  • structInfo
    The struct description metadata, as it was given to the StructBinder which created this class.

  • structName
    The name of the struct, as it was given to the StructBinder which created this class.

API: Struct Instances

Instances of structs created via the constructors described above each have the following instance-specific state in common:

  • pointer
    A read-only numeric property which is the "pointer" returned by the configured allocator when this object is constructed. After dispose() (inherited from StructType) is called, this property has the undefined value. When calling C-side code which takes a pointer to a struct of this type, simply pass it myStruct.pointer.

Appendices

Appendix A: Limitations, TODOs, and Non-TODOs

  • This library only supports the basic set of member types supported by WASM: numbers (which includes pointers). Nested structs are not handled except that a member may be a pointer to such a struct. Whether or not it ever will depends entirely on whether its developer ever needs that support. Conversion of strings between JS and C requires infrastructure specific to each WASM environment and is not directly supported by this library.

  • Binding functions to struct instances, such that C can see and call JS-defined functions, is not as transparent as it really could be, due to shortcomings in the Emscripten addFunction()/removeFunction() interfaces. Until a replacement for that API can be written, this support will be quite limited. It is possible to bind a JS-defined function to a C-side function pointer and call that function from C. What's missing is easier-to-use/more transparent support for doing so.

    • In the meantime, a standalone subproject of Jaccwabyt provides such a binding mechanism, but integrating it directly with Jaccwabyt would not only more than double its size but somehow feels inappropriate, so experimentation is in order for how to offer that capability via completely optional StructBinderFactory config options.
  • It "might be interesting" to move access of the C-bound members into a sub-object. e.g., from JS they might be accessed via myStructInstance.s.structMember. The main advantage is that it would eliminate any potential confusion about which members are part of the C struct and which exist purely in JS. "The problem" with that is that it requires internally mapping the s member back to the object which contains it, which makes the whole thing more costly and adds one more moving part which can break. Even so, it's something to try out one rainy day. Maybe even make it optional and make the s name configurable via the StructBinderFactory options. (Over-engineering is an arguably bad habit of mine.)

  • It "might be interesting" to offer (de)serialization support. It would be very limited, e.g. we can't serialize arbitrary pointers in any meaningful way, but "might" be useful for structs which contain only numeric or C-string state. As it is, it's easy enough for client code to write wrappers for that and handle the members in ways appropriate to their apps. Any impl provided in this library would have the shortcoming that it may inadvertently serialize pointers (since they're just integers), resulting in potential chaos after deserialization. Perhaps the struct description can be extended to tag specific members as serializable and how to serialize them.

Appendix D: Debug Info

The StructBinderFactory, StructBinder, and StructType classes all have the following "unsupported" method intended primarily to assist in their own development, as opposed to being for use in client code:

  • debugFlags(flags) (integer)
    An "unsupported" debugging option which may change or be removed at any time. Its argument is a set of flags to enable/disable certain debug/tracing output for property accessors: 0x01 for getters, 0x02 for setters, 0x04 for allocations, 0x08 for deallocations. Pass 0 to disable all flags and pass a negative value to completely clear all flags. The latter has the side effect of telling the flags to be inherited from the next-higher-up class in the hierarchy, with StructBinderFactory being top-most, followed by StructBinder, then StructType.
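Since the flags are plain bits, they can be combined with bitwise OR. The constant names below are hypothetical labels for the documented values, and the final call is shown only as a comment because it needs a real binder object.

```javascript
// Hypothetical names for the documented flag bits.
const DebugFlag = Object.freeze({
  GETTER: 0x01,
  SETTER: 0x02,
  ALLOC: 0x04,
  DEALLOC: 0x08
});

// Trace only allocations and deallocations:
const traceMemory = DebugFlag.ALLOC | DebugFlag.DEALLOC; // 0x0c
// Trace everything:
const traceAll = DebugFlag.GETTER | DebugFlag.SETTER
               | DebugFlag.ALLOC | DebugFlag.DEALLOC;    // 0x0f

// With a real StructBinder in hand, e.g.:
// MyBinder.debugFlags(traceMemory);
```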

Appendix G: Generating Struct Descriptions From C

Struct definitions are ideally generated from WASM-compiled C, as opposed to simply guessing the sizeofs and offsets, so that the sizeof and offset information can be collected using C's sizeof() and offsetof() features (noting that struct padding may impact offsets in ways which might not be immediately obvious, so writing them by hand is most certainly not recommended).

How exactly the description is generated is necessarily project-dependent. It's tempting to say, "oh, that's easy! We'll just write it by hand!" but that would be folly. The struct sizes and byte offsets into the struct must be precisely how C-side code sees the struct or the runtime results are completely undefined.
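
To see why hand-written offsets go wrong so easily, consider the following sketch of a simplified layout model. It assumes that every scalar member is aligned to its own size and that the struct is padded out to its largest member alignment, which is typical but by no means guaranteed for any given compiler or ABI. The `naiveLayout()` function is hypothetical and exists only for this illustration; real descriptions should still come from sizeof() and offsetof().

```javascript
// Simplified model of C struct layout. ASSUMPTION: each scalar member is
// aligned to its own size and the struct is padded to its largest alignment.
function naiveLayout(members) {
  let offset = 0;
  let maxAlign = 1;
  const offsets = {};
  for (const [name, size] of members) {
    offset = Math.ceil(offset / size) * size; // padding before the member
    offsets[name] = offset;
    offset += size;
    maxAlign = Math.max(maxAlign, size);
  }
  const sizeof = Math.ceil(offset / maxAlign) * maxAlign; // tail padding
  return { sizeof, offsets };
}

// Models: struct { char c; int64_t v8; int v4; }
const result = naiveLayout([["c", 1], ["v8", 8], ["v4", 4]]);
console.log(result);
```

The member sizes add up to 13 bytes, yet this model places v8 at offset 8 and reports a sizeof of 24. Guessing either value by summing member sizes would silently corrupt memory.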

The approach used in developing and testing this software is...

Below is a complete copy/pastable example of how we can use a small set of macros to generate struct descriptions from C99 or later into static string memory. Simply add such a file to your WASM build, arrange for its function to be exported¹, and call it from JS (noting that it requires environment-specific JS glue to convert the returned pointer to a JS-side string). Use JSON.parse() to process it, then feed the included struct descriptions into the binder factory at your leisure.


#include <string.h> /* memset() */
#include <stddef.h> /* offsetof() */
#include <stdio.h>  /* snprintf() */
#include <stdint.h> /* int64_t */
#include <assert.h>

struct ExampleStruct {
  int v4;
  void * ppV;
  int64_t v8;
  void (*xFunc)(void*);
};
typedef struct ExampleStruct ExampleStruct;

const char * wasm__ctype_json(void){
  static char strBuf[512 * 8] = {0}
    /* Static buffer which must be sized large enough for
       our JSON. The string-generation macros try very
       hard to assert() if this buffer is too small. */;
  int n = 0, structCount = 0 /* counters for the macros */;
  char * pos = &strBuf[1]
    /* Write-position cursor. Skip the first byte for now to help
       protect against a small race condition */;
  char const * const zEnd = strBuf + sizeof(strBuf)
    /* one-past-the-end cursor (virtual EOF) */;
  if(strBuf[0]) return strBuf; // Was set up in a previous call.

  ////////////////////////////////////////////////////////////////////
  // First we need to build up our macro framework...

  ////////////////////////////////////////////////////////////////////
  // Core output-generating macros...
#define lenCheck assert(pos < zEnd - 100)
#define outf(format,...) \
  pos += snprintf(pos, ((size_t)(zEnd - pos)), format, __VA_ARGS__); \
  lenCheck
#define out(TXT) outf("%s",TXT)
#define CloseBrace(LEVEL) \
  assert(LEVEL<5); memset(pos, '}', LEVEL); pos+=LEVEL; lenCheck

  ////////////////////////////////////////////////////////////////////
  // Macros for emitting StructBinders...
#define StructBinder__(TYPE)                 \
  n = 0;                                     \
  outf("%s{", (structCount++ ? ", " : ""));  \
  out("\"name\": \"" # TYPE "\",");          \
  outf("\"sizeof\": %d", (int)sizeof(TYPE)); \
  out(",\"members\": {");
#define StructBinder_(T) StructBinder__(T)
// ^^^ extra indirection needed to expand CurrentStruct
#define StructBinder StructBinder_(CurrentStruct)
#define _StructBinder CloseBrace(2)
#define M(MEMBER,SIG)                                         \
  outf("%s\"%s\": "                                           \
       "{\"offset\":%d,\"sizeof\": %d,\"signature\":\"%s\"}", \
       (n++ ? ", " : ""), #MEMBER,                            \
       (int)offsetof(CurrentStruct,MEMBER),                   \
       (int)sizeof(((CurrentStruct*)0)->MEMBER),              \
       SIG)
  // End of macros.
  ////////////////////////////////////////////////////////////////////

  ////////////////////////////////////////////////////////////////////
  // With that out of the way, we can do what we came here to do.
  out("\"structs\": ["); {

// For each struct description, do...
#define CurrentStruct ExampleStruct
    StructBinder {
      M(v4,"i");
      M(ppV,"p");
      M(v8,"j");
      M(xFunc,"v(p)");
    } _StructBinder;
#undef CurrentStruct

  } out( "]"/*structs*/);
  ////////////////////////////////////////////////////////////////////
  // Done! Finalize the output...
  out("}"/*top-level wrapper*/);
  *pos = 0;
  strBuf[0] = '{'/*end of the race-condition workaround*/;
  return strBuf;

// If this file will ever be concatenated or #included with others,
// it's good practice to clean up our macros:
#undef StructBinder
#undef StructBinder_
#undef StructBinder__
#undef M
#undef _StructBinder
#undef CloseBrace
#undef out
#undef outf
#undef lenCheck
}
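
On the JS side, consuming that output might look something like the following sketch. The `sampleJson` string is a stand-in for what the environment-specific glue would read from the pointer returned by wasm__ctype_json(). Its numbers are what one would expect from a 32-bit WASM build (4-byte pointers, 8-byte-aligned int64_t) but are written out by hand for illustration, not captured from a real build. The `structsByName` map is likewise just a convenience for this example.

```javascript
// Stand-in for the string produced by wasm__ctype_json().
// ASSUMPTION: wasm32 layout; these values are illustrative only.
const sampleJson =
  '{"structs": [{"name": "ExampleStruct","sizeof": 24,"members": {' +
  '"v4": {"offset":0,"sizeof": 4,"signature":"i"}, ' +
  '"ppV": {"offset":4,"sizeof": 4,"signature":"p"}, ' +
  '"v8": {"offset":8,"sizeof": 8,"signature":"j"}, ' +
  '"xFunc": {"offset":16,"sizeof": 4,"signature":"v(p)"}}}]}';

const parsed = JSON.parse(sampleJson);
const structsByName = Object.create(null);
for (const desc of parsed.structs) {
  // Cheap sanity check: every member must lie within the struct's bounds.
  for (const [name, member] of Object.entries(desc.members)) {
    if (member.offset + member.sizeof > desc.sizeof) {
      throw new Error(`${desc.name}.${name} extends past sizeof`);
    }
  }
  structsByName[desc.name] = desc;
  // With a live WASM module, `desc` is what would be handed to the
  // StructBinder obtained from StructBinderFactory.
}
console.log(Object.keys(structsByName));
```

The bounds check catches only gross errors, such as a buffer-truncated description. It cannot confirm that the offsets match what the C compiler actually produced.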