Introduction
In my previous posts on data frames we looked at how to:
- Create an expressive syntax using operator overloading and proxies
- Use WebAssembly for efficient array calculations
- Use libc with WebAssembly
The main problem with that implementation was the marshalling layer which was
hand written. It required all arrays to be copied in and out of the WebAssembly
instance, with manual allocation and freeing of memory.
This implementation introduces a marshalling layer built on
FinalizationRegistry,
which runs a cleanup callback after a registered object has been garbage collected.
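To give a feel for the mechanism, here is a minimal sketch. It is not the marshalling layer from the source code: `FakeHeap` and `WasmArray` are names invented for this example, and the heap simply pretends to be the `malloc` and `free` exports of a WebAssembly instance. Only `FinalizationRegistry` itself is a real built-in.

```javascript
// Hypothetical stand-in for the malloc/free exports of a WebAssembly instance.
class FakeHeap {
  constructor () {
    this.next = 8
    this.live = new Set()
  }

  malloc (bytes) {
    const ptr = this.next
    this.next += bytes
    this.live.add(ptr)
    return ptr
  }

  free (ptr) {
    this.live.delete(ptr)
  }
}

const heap = new FakeHeap()

// The callback receives the "held value" (here the pointer), never the
// collected object itself.
const registry = new FinalizationRegistry(ptr => heap.free(ptr))

// Hypothetical wrapper: allocates on construction and asks the registry to
// free the memory once the wrapper has been garbage collected.
class WasmArray {
  constructor (length) {
    this.length = length
    this.ptr = heap.malloc(length * Float64Array.BYTES_PER_ELEMENT)
    // The third argument is an unregister token; the wrapper itself will do.
    registry.register(this, this.ptr, this)
  }

  // Optional eager release; unregistering avoids a second free later.
  dispose () {
    registry.unregister(this)
    heap.free(this.ptr)
  }
}

const a = new WasmArray(4)
const b = new WasmArray(2)
a.dispose()
```

When the callback runs is up to the garbage collector, and it may never run at all, so this suits memory cleanup but not anything that must happen promptly.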
You can find the source code here. To run the examples you will need a version of
node which supports the --harmony-weak-refs
flag (I’m using 14.5.0), the
wasi-sdk 11
(with its bin directory on your path) and the
wabt 1.0.16
toolkit (with its bin directory on your path).
Usage
You can find more information about the implementation of the marshalling layer in
this post.
What I want to talk about here is usability.
Rather than bundle a fixed set of functions with the data frame I wanted it to be a structural object where the functions are
provided by an extensible repository. This way the user
isn’t limited to the operations defined by the data frame implementation.
A C
function for dividing two arrays might be written as follows -
__attribute__((used)) double* divideFloat64Arrays (double* array1, double* array2, unsigned int length)
The functions are registered in JavaScript as follows -
DataFrame.registerFunction(
which allows the following -
const df = DataFrame.fromObject(
Functions can also be added for performing calculations directly in JavaScript.
For example -
Series.registerFunction(
There is a restriction on function prototypes. The first argument is always the
array from the first series, and the last argument is always the length of the
first array.
Although we have used Symbol.for
to name the functions that map to
an operation, a plain string can be used for arbitrary calculations -
Series.registerFunction(
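The idea of a repository keyed by either symbols or strings can be sketched in a few lines of plain JavaScript. This is an illustration only, not the actual `registerFunction` implementation: the free-standing `registerFunction` and `invoke` helpers, the `'/'` and `'sum'` keys and the sample data are all assumptions made for the example. It follows the calling convention described above, with the first series' array passed first and its length passed last.

```javascript
// Hypothetical repository: keys are symbols or strings, values are functions.
const functions = new Map()

function registerFunction (key, fn) {
  functions.set(key, fn)
}

// Calls a registered function using the convention described above: the first
// series' array goes first and the length of that array goes last.
function invoke (key, first, ...rest) {
  const fn = functions.get(key)
  if (fn === undefined) {
    throw new Error(`No function registered for ${String(key)}`)
  }
  return fn(first, ...rest, first.length)
}

// An operation is keyed by a symbol from the global symbol registry...
registerFunction(Symbol.for('/'), (lhs, rhs, length) => {
  const result = new Float64Array(length)
  for (let i = 0; i < length; ++i) {
    result[i] = lhs[i] / rhs[i]
  }
  return result
})

// ...while an arbitrary calculation is keyed by a plain string.
registerFunction('sum', (values, length) => {
  let total = 0
  for (let i = 0; i < length; ++i) {
    total += values[i]
  }
  return total
})

const height = Float64Array.of(1.82, 1.72, 1.64)
const weight = Float64Array.of(81.4, 72.3, 69.9)

const ratio = invoke(Symbol.for('/'), weight, height)
const total = invoke('sum', weight)
```

Because `Symbol.for` always returns the same symbol for the same key, the code that registers a function and the code that looks it up do not need to share a reference to the symbol.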
The WASI Marshaller
In order for the data frame to be aware of the WebAssembly instance and the
associated marshalling support, it must be initialized.
// Read the wasm file.
Thoughts
Efficiency has improved enormously. As typed arrays are visible both
within the JavaScript interpreter and the WebAssembly instance, we only need to
pass the reference to the arrays when performing WebAssembly calculations. The
finalizer support means we don’t need to manually allocate, copy and free memory.
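The point about shared visibility can be demonstrated without a compiled module. The sketch below creates a standalone `WebAssembly.Memory` and places a typed array directly over it; the byte offset is a made-up value that would normally come from the instance's allocator.

```javascript
// One 64 KiB page of linear memory, as a WebAssembly instance would export.
const memory = new WebAssembly.Memory({ initial: 1 })

// Hypothetical offset; in practice it would come from the instance's allocator.
const byteOffset = 16
const length = 4

// A view over the linear memory, not a copy of it.
const series = new Float64Array(memory.buffer, byteOffset, length)
series.set([1.5, 2.5, 3.5, 4.5])

// Any other view over the same bytes sees the values immediately, which is
// what code running inside the instance would see at the same address.
const sameBytes = new DataView(memory.buffer)
const third = sameBytes.getFloat64(byteOffset + 2 * Float64Array.BYTES_PER_ELEMENT, true)
```

One caveat: growing the memory detaches the old `memory.buffer`, so any views over it have to be recreated after a call to `memory.grow`.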
We’ve done all this without adding much clutter beyond specifying the series
type. This could be improved by adding some heuristics to guess the types.
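One possible heuristic, purely as a sketch of what such guessing might look like: choose `Int32Array` when every value is a 32-bit integer, `Float64Array` when every value is a number, and fall back to a plain `Array` otherwise. The `guessArrayType` and `toTypedArray` names are hypothetical and not part of the data frame.

```javascript
// Hypothetical heuristic: pick an array constructor for a plain array.
function guessArrayType (values) {
  if (values.length === 0) {
    return Float64Array
  }
  // (value | 0) === value only holds for integers in the signed 32-bit range.
  if (values.every(value => Number.isInteger(value) && (value | 0) === value)) {
    return Int32Array
  }
  if (values.every(value => typeof value === 'number')) {
    return Float64Array
  }
  return Array
}

function toTypedArray (values) {
  const ArrayType = guessArrayType(values)
  return ArrayType.from(values)
}

const ages = toTypedArray([32, 41, 27])
const heights = toTypedArray([1.82, 1.72, 1.64])
const names = toTypedArray(['mary', 'sally', 'jane'])
```

A scan like this costs one or two passes over the data, so an explicit type would probably still be worth keeping as an option for large series.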
All this means it’s easier to use the power of WebAssembly without adding too
much burden to the data scientist.
While there is still much left to do (indices, selecting, support for more types
such as timestamps), this feels like a good step forward.