How to Pass Strings Between JavaScript and WebAssembly

In my
first post
on WebAssembly I dodged the classic “Hello, World!” example as I thought it might
be a bit tricky. Little did I realise how hard it was going to be!

In this post I’m going to look at how to solve it. The code for the project
can be found here.

The Problem

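JavaScript strings are sequences of UTF-16 code units, whereas the C code in
this post works with utf-8, which encodes characters that cannot be represented
in the ascii character set with more than one byte. To cross the boundary a
string must be converted to, or from, a utf-8 byte array in the WebAssembly
memory.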
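
To make the mismatch concrete, here is a minimal sketch (not taken from the project code) comparing the length JavaScript reports for a string with the number of bytes its utf-8 encoding needs. The variable names are mine, chosen for illustration.

```javascript
// A string containing one character outside the ascii range.
const text = '犬 means dog'

// JavaScript reports the length in UTF-16 code units.
const codeUnits = text.length

// TextEncoder always produces utf-8.
const utf8Bytes = new TextEncoder().encode(text)

// Decoding the bytes gives back the original string.
const roundTrip = new TextDecoder().decode(utf8Bytes)

console.log(codeUnits, utf8Bytes.byteLength, roundTrip === text)
```

The Chinese character occupies a single code unit in the JavaScript string but three bytes once encoded, so the two lengths differ.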

The JavaScript Side

On the JavaScript side we can use TextEncoder and TextDecoder in the following
manner.

class WasiMemoryManager {
constructor (memory, malloc, free) {
this.memory = memory
this.malloc = malloc
this.free = free
}

// Convert a pointer from the wasm module to JavaScript string.
convertToString (ptr, length) {
try {
// The pointer is a multi byte character array encoded with utf-8.
const array = new Uint8Array(this.memory.buffer, ptr, length)
const decoder = new TextDecoder()
const string = decoder.decode(array)
return string
} finally {
// Free the memory sent to us from the WebAssembly instance.
this.free(ptr)
}
}

// Convert a JavaScript string to a pointer to multi byte character array
convertFromString(string) {
// Encode the string in utf-8.
const encoder = new TextEncoder()
const bytes = encoder.encode(string)
// Copy the string into memory allocated in the WebAssembly,
// leaving room for the null terminator that C expects.
const ptr = this.malloc(bytes.byteLength + 1)
const buffer = new Uint8Array(this.memory.buffer, ptr, bytes.byteLength + 1)
buffer.set(bytes)
buffer[bytes.byteLength] = 0
return buffer
}
}

module.exports = WasiMemoryManager

Note how malloc and free are used to manage the memory in the
WebAssembly module.
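
The round trip can be tried without a compiled module. The sketch below is an illustration under assumptions: the memory is a plain WebAssembly.Memory, and malloc and free are stand-ins I wrote (a simple bump allocator), not the functions exported by wasi-libc. The helper names writeString and readString are also made up.

```javascript
// Stand-ins for the module's exports (assumptions for illustration only).
const memory = new WebAssembly.Memory({ initial: 1 }) // one 64KiB page
let nextFree = 8 // keep address 0 unused so it can mean "null"
const malloc = (size) => {
  const ptr = nextFree
  nextFree += (size + 7) & ~7 // keep allocations 8-byte aligned
  return ptr
}
const free = (ptr) => {} // a bump allocator never reuses memory

// Encode a JavaScript string as a null terminated utf-8 array in the memory.
function writeString (text) {
  const bytes = new TextEncoder().encode(text)
  const ptr = malloc(bytes.byteLength + 1)
  const buffer = new Uint8Array(memory.buffer, ptr, bytes.byteLength + 1)
  buffer.set(bytes)
  buffer[bytes.byteLength] = 0
  return buffer
}

// Decode length bytes starting at ptr, then release the memory.
function readString (ptr, length) {
  try {
    return new TextDecoder().decode(new Uint8Array(memory.buffer, ptr, length))
  } finally {
    free(ptr)
  }
}

const buffer = writeString('你好')
const text = readString(buffer.byteOffset, buffer.byteLength - 1)
console.log(text, buffer.byteLength)
```

The pointer handed to the WebAssembly side is simply the byteOffset of the typed array, which is why the allocator must never hand out address 0 for a valid allocation.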

The C Side

The C standard library provides support for multi byte characters [1],
so I coded up a function to count the letters (not the bytes) of a string, and
a function to log “Hello, World!” in English and Mandarin.

This was a complete failure!

Looking harder at the examples I noticed I was missing a call to setlocale,
so I added that to my C code. The final code looks as follows.

#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <locale.h>

// Import console.log from JavaScript.
extern void consoleLog(char* ptr, int length);

int is_locale_initialised = 0;

static void initLocale()
{
// The locale must be initialised before using
// multi byte characters.
is_locale_initialised = 1;
setlocale(LC_ALL, "");
}

// Count the letters in a utf-8 encoded string.
__attribute__((used)) size_t countLetters(char* ptr)
{
if (is_locale_initialised == 0)
initLocale();

size_t letters = 0;
const char* end = ptr + strlen(ptr);
mblen(NULL, 0); // reset the conversion state
while(ptr < end) {
int next = mblen(ptr, end - ptr);
if(next == -1) {
return -1;
}
ptr += next;
++letters;
}
return letters;
}

// Say hello in English
__attribute__((used)) void sayHelloWorld()
{
if (is_locale_initialised == 0)
initLocale();

const char* s1 = "Hello World";
size_t len = strlen(s1);
char* s2 = malloc(len + 1);
consoleLog(strcpy(s2, s1), (int) len);
}

// Say hello in Mandarin
__attribute__((used)) void sayHelloWorldInMandarin()
{
if (is_locale_initialised == 0)
initLocale();

const wchar_t* wstr = L"你好,世界!";
mbstate_t state;
memset(&state, 0, sizeof(state));
size_t len = wcsrtombs(NULL, &wstr, 0, &state);
char* mbstr = malloc(len + 1);
wcsrtombs(mbstr, &wstr, len + 1, &state);
consoleLog(mbstr, (int) len);
}
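
For comparison, the same count can be made on the JavaScript side without libc. This is a sketch of a different technique from the mblen loop above: in utf-8 every continuation byte has the bit pattern 10xxxxxx, so counting the bytes that are not continuation bytes gives the number of encoded characters. It assumes the input is valid utf-8, and countCodePoints is a name I made up for the example.

```javascript
// Count the characters in a utf-8 byte array by skipping continuation bytes.
function countCodePoints (utf8Bytes) {
  let count = 0
  for (const byte of utf8Bytes) {
    // Continuation bytes look like 10xxxxxx; everything else starts a character.
    if ((byte & 0xC0) !== 0x80) {
      count += 1
    }
  }
  return count
}

const bytes = new TextEncoder().encode('犬 means dog')
const letters = countCodePoints(bytes)
console.log(`${letters} letters in ${bytes.byteLength} bytes`)
```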

Then I ran the program and BOOM I get an error
complaining that wasi_snapshot_preview1 is not defined. I knew what this meant
(the libc implementation needed a WASI module to implement
system calls), but I didn’t think I needed one, as I’m only using strings.

It turns out that the first thing setlocale does is check some environment
variables. To do that it needs some wasi functions to call out of the wasm
sandbox.
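
One way to see in advance which functions a module expects from the host is WebAssembly.Module.imports, which lists every import a compiled module declares. The sketch below builds a tiny hand-assembled module (my own stand-in, not the project's wasm file) that imports a function called environ_get from wasi_snapshot_preview1, and then lists its imports.

```javascript
// Encode a name as a length byte followed by its utf-8 bytes.
const name = (text) => {
  const bytes = [...new TextEncoder().encode(text)]
  return [bytes.length, ...bytes]
}

// Import section body: one import, a function using type index 0.
const importBody = [
  0x01,
  ...name('wasi_snapshot_preview1'),
  ...name('environ_get'),
  0x00, // import kind: function
  0x00 // type index
]

const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic "\0asm"
  0x01, 0x00, 0x00, 0x00, // version 1
  // Type section: one function type (i32, i32) -> i32
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,
  // Import section
  0x02, importBody.length, ...importBody
])

const wasmModule = new WebAssembly.Module(wasmBytes)
const imports = WebAssembly.Module.imports(wasmModule)
imports.forEach(i => console.log(`${i.module}.${i.name} (${i.kind})`))
```

Running the same call against a real module compiled with wasi-libc should show every WASI function the host has to supply before instantiation can succeed.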

Implementing a Minimal WASI

After consulting the WASI API documentation
I decided I needed to implement
environ_get
and its companion
environ_sizes_get. I put these in a class called Wasi
and passed them to the module on instantiation using the wasi_snapshot_preview1
key as follows:

const res = await WebAssembly.instantiate(buf, {
wasi_snapshot_preview1: wasi,
})

Now when I ran the program I got no errors about wasi_snapshot_preview1, but I
got a new error saying that proc_exit was missing. I found that in the docs;
it gets called to terminate the process. As I’ve got nothing to terminate
I created a stub for it. My Wasi class looks as follows.

const WasiMemoryManager = require('./wasi-memory-manager')

// An implementation of WASI which supports the minimum
// required to use multi byte characters.
class Wasi {
constructor (env) {
this.env = env
this.instance = null
this.wasiMemoryManager = null
}

// Initialise the instance from the WebAssembly.
init = (instance) => {
this.instance = instance
this.wasiMemoryManager = new WasiMemoryManager(
instance.exports.memory,
instance.exports.malloc,
instance.exports.free
)
}

static WASI_ESUCCESS = 0

// Get the environment variables.
environ_get = (environ, environBuf) => {
const encoder = new TextEncoder()
const view = new DataView(this.wasiMemoryManager.memory.buffer)

Object.entries(this.env).map(
([key, value]) => `${key}=${value}`
).forEach(envVar => {
view.setUint32(environ, environBuf, true)
environ += 4

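const bytes = encoder.encode(envVar)
const buf = new Uint8Array(this.wasiMemoryManager.memory.buffer, environBuf, bytes.length + 1)
// Copy the variable into the buffer and null terminate it.
buf.set(bytes)
buf[bytes.length] = 0
environBuf += buf.byteLength
})
// WASI_ESUCCESS is a static field, so it lives on the class, not the instance.
return Wasi.WASI_ESUCCESS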
}

// Get the size required to store the environment variables.
environ_sizes_get = (environCount, environBufSize) => {
const encoder = new TextEncoder()
const view = new DataView(this.wasiMemoryManager.memory.buffer)

const envVars = Object.entries(this.env).map(
([key, value]) => `${key}=${value}`
)
const size = envVars.reduce(
(acc, envVar) => acc + encoder.encode(envVar).byteLength + 1,
0
)
view.setUint32(environCount, envVars.length, true)
view.setUint32(environBufSize, size, true)

return Wasi.WASI_ESUCCESS
}

// This gets called on exit to stop the running program.
// We don't have anything to stop!
proc_exit = (rval) => {
return Wasi.WASI_ESUCCESS
}
}

module.exports = Wasi
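
The layout the two environment functions deal with can be sketched on its own: a table of 32 bit little endian pointers, each pointing at a null terminated "KEY=value" string in a separate buffer. The example below is an illustration under assumptions; the addresses 16 and 64 are arbitrary values I picked (in real use the module supplies them), and readCString is a helper made up for the check at the end.

```javascript
const env = { LANG: 'en_GB.UTF-8', TERM: 'xterm' }
const entries = Object.entries(env).map(([key, value]) => `${key}=${value}`)

const encoder = new TextEncoder()
const memory = new WebAssembly.Memory({ initial: 1 })
const view = new DataView(memory.buffer)

// What environ_sizes_get reports: the bytes needed, including terminators.
const bufSize = entries.reduce(
  (acc, entry) => acc + encoder.encode(entry).byteLength + 1,
  0
)

// What environ_get fills in. Both addresses are hypothetical.
const environStart = 16 // start of the pointer table
let environ = environStart
let environBuf = 64 // start of the string data
const pointers = []
for (const entry of entries) {
  // Store the address of this string in the pointer table.
  view.setUint32(environ, environBuf, true)
  pointers.push(environBuf)
  environ += 4

  // Copy the string and null terminate it.
  const bytes = encoder.encode(entry)
  const target = new Uint8Array(memory.buffer, environBuf, bytes.length + 1)
  target.set(bytes)
  target[bytes.length] = 0
  environBuf += target.byteLength
}

// Read a null terminated string back out of the memory.
function readCString (ptr) {
  const all = new Uint8Array(memory.buffer)
  let end = ptr
  while (all[end] !== 0) end += 1
  return new TextDecoder().decode(all.subarray(ptr, end))
}

const second = readCString(view.getUint32(environStart + 4, true))
console.log(bufSize, pointers, second)
```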

And to hook it all up I did the following.

const fs = require('fs')
const Wasi = require('./wasi')

async function setupWasi(fileName) {
// Read the wasm file.
const buf = fs.readFileSync(fileName)

// Create the Wasi instance passing in some environment variables.
const wasi = new Wasi({
"LANG": "en_GB.UTF-8",
"TERM": "xterm"
})

// Instantiate the wasm module.
const res = await WebAssembly.instantiate(buf, {
wasi_snapshot_preview1: wasi,
env: {
// This function is imported by the WebAssembly module.
consoleLog: function(ptr, length) {
// This converts the pointer to a string and frees the memory.
const string = wasi.wasiMemoryManager.convertToString(ptr, length)
console.log(string)
}
}
})

// Initialise the wasi instance
wasi.init(res.instance)

return wasi
}

module.exports = setupWasi

The Example

The last thing to do was write the example. Here it is:

const setupWasi = require('./setup-wasi')

async function main() {
// Setup the WASI instance.
const wasi = await setupWasi('./string-example.wasm')

// Get the functions exported from the WebAssembly
const {
countLetters,
sayHelloWorld,
sayHelloWorldInMandarin
} = wasi.instance.exports

let buf1 = null
try {
const s1 = '犬 means dog'
// Convert the JavaScript string into a pointer in the WebAssembly
buf1 = wasi.wasiMemoryManager.convertFromString(s1)
// The Chinese character will take up more than one byte.
const l1 = countLetters(buf1.byteOffset)
console.log(`expected length ${s1.length} got ${l1} byte length was ${buf1.byteLength}`)
} finally {
// Free the pointer we created in WebAssembly.
if (buf1 !== null) {
wasi.wasiMemoryManager.free(buf1.byteOffset)
}
}

// Should log "Hello, World!" to the console
sayHelloWorld()
sayHelloWorldInMandarin()
}

main().then(() => console.log('Done'))

And it works! Here’s the output.

expected length 11 got 11 byte length was 14
Hello World
你好,世界!
Done

Thoughts

I realise node already has a WASI module, but I wanted to be able to
run the code in a browser. I would have preferred to use wasmer-js,
but I had problems getting it to work, and it felt like a bit of a
sledgehammer for the problem I was trying to solve.

Using libc with WebAssembly enabled DataFrames


This post describes how to use the C standard library with WebAssembly enabled
DataFrames. You can find the source code
here.

Introduction

In my previous post I avoided solving the problem of linking to a standard
library to provide memory allocation for my WebAssembly project by writing my
own memory allocator. In order to use external C packages I will need a
standard library.

wasi-libc

The wasi-libc project does what it
says on the tin by providing a standard library for wasm.

I have changed my setup as my laptop required reinstallation. I’m now running
Ubuntu 20.04 LTS.

I installed the latest (version 10) LLVM tool chain as a Debian package which
I got from this page. I did the following.

wget https://apt.llvm.org/llvm.sh
chmod +x llvm.sh
sudo ./llvm.sh

That installed clang et al with the suffix “-10”, such that clang could be
found as /usr/bin/clang-10.

I installed wabt in /opt/wabt.

I cloned the wasi-libc library and built it as follows.

git clone git@github.com:WebAssembly/wasi-libc.git
cd wasi-libc
sudo WASM_CC=clang-10 WASM_NM=llvm-nm-10 WASM_AR=llvm-ar-10 INSTALL_DIR=/opt/wasi-libc make install

The C Source Code

In my previous post I had two source files: memory-allocation.c and array-methods.c.
We can delete memory-allocation.c as we’ll use malloc and free from the
standard library. I split the array methods into two files: array-methods-int.c
and array-methods-double.c, included stdlib.h and changed the memory allocation
to use the standard library.

Here is the start of array-methods-int.c.

#include <stdlib.h>

__attribute__((used)) int* addInt32Arrays (int *array1, int* array2, int length)
{
int* result = (int*) malloc(length * sizeof(int));
if (result == 0)
return 0;

for (int i = 0; i < length; ++i) {
result[i] = array1[i] + array2[i];
}

return result;
}

Then I changed the makefile as follows.

SYSROOT=/opt/wasi-libc

LLVM_VERSION=10
CC=clang-$(LLVM_VERSION)
LD=wasm-ld-$(LLVM_VERSION)
CFLAGS=--target=wasm32-unknown-unknown-wasm --optimize=3 --sysroot=$(SYSROOT)
LDFLAGS=--export-all --no-entry --allow-undefined -L$(SYSROOT)/lib/wasm32-wasi -lc -lm

WASM2WAT=/opt/wabt/bin/wasm2wat

sources = array-methods-int.c array-methods-double.c
objects = $(sources:%.c=%.bc)
target = data-frame

.PHONY: all

all: $(target).wat

$(target).wat: $(target).wasm
	$(WASM2WAT) $(target).wasm -o $(target).wat

$(target).wasm: $(objects)
	$(LD) $(objects) $(LDFLAGS) -o $(target).wasm

%.bc: %.c
	$(CC) -c -emit-llvm $(CFLAGS) $< -o $@

clean:
	rm -f $(target).wasm
	rm -f $(target).wat
	rm -f $(objects)

The compilation stage now sets --sysroot. This appears to set up the include
paths correctly.

The link stage adds the path to the libraries -L$(SYSROOT)/lib/wasm32-wasi and
includes libc and libm -lc -lm.

The JavaScript Code

All I needed to do was to change the memory allocation imports to use malloc and free.

Everything just worked!

As a test I added a unary function logFloat64Array. This required some minor
changes to the JavaScript to allow ‘int’ type series to fall through to the double
methods. Again it all just worked.

Things To Do

I need to understand what the crt1.o file is all about. I tried to link, but
that gives me other errors.

The future of Data Science is JavaScript and WebAssembly

In my previous posts [1] [2] [3] [4] I demonstrated how to write a simple
pandas-style data frame in JavaScript using WebAssembly (a near native speed
virtual machine available in every modern browser or server-side engine such
as nodejs). This demonstrated the following features:

  • Elegant Syntax

    Expressions can be written in the following manner.

    const df = DataFrame.fromObject(
    [
    { col0: 'a', col1: 5, col2: 8.1 },
    { col0: 'b', col1: 6, col2: 3.2 }
    ],
    { col0: 'object', col1: 'int', col2: 'double'}
    )

    df['col3'] = df['col1'] + df['col2']
  • Near native speed.

    All operations are written in C and are performed in a compiled WebAssembly
    module.

  • Portable.

    The code runs in any browser or server side JavaScript engine (e.g.
    chrome, nodejs, etc.).

  • Uses standard tooling.

    The C code was compiled with the current standard
    clang
    compiler which supports WebAssembly
    out of the box.

The Future of Data Science?

Clearly this is a bold claim!

There are currently two clear platform leaders in the data science eco-system:
R, and Python using the scipy toolkit, and also a number of attractive
outsiders. The choice of platform typically depends on language preference,
but more compellingly on what packages a particular language/version provides.
It is not unusual for a project to be constrained to a legacy version of a
platform simply because a key package has not been ported to a current version.

So how can JavaScript and WebAssembly help?

Elegant Syntax

Surprisingly, yes! The posts referenced above clearly demonstrate how JavaScript
can be written in a manner which elegantly represents vector-arithmetic.

Speed

With WebAssembly there is no significant compromise in speed over natively
compiled languages.

Bindings

With other languages a binding must be written to integrate a package
into the language. In many cases this is completely unnecessary for
WebAssembly.

Fortran

In my opinion this is the killer advantage. There are a huge number of key data
science libraries implemented in many different flavours of Fortran. Binding to
these libraries is an ongoing and fraught task.

Happily LLVM (a compiler infrastructure supported by the majority
of operating systems) is currently incorporating
flang
(a Fortran compiler) into its toolchain, supporting all versions of Fortran
up to and including 2018. This provides the possibility of compiling any
Fortran package directly to WebAssembly.

The Catch

The first catch is operator overloading. I’m not sure if “catch” is the right
word. My solution using the babel transpiler works fine,
and transpiling is how React (one of the most popular web
frameworks, which uses custom syntax in JavaScript) does and will always work.
There is a proposal
to incorporate operator overloading into JavaScript (currently at stage 1 where
3 means acceptance), which leads me to believe it is a feature which may become
native to the language.

The second catch is the standard library. You’ll note if you’ve read the
previous posts that I had to provide a memory allocator as this is not available
in the WebAssembly environment. WebAssembly is just a virtual machine, and at
present it doesn’t come with the usual support code which most libraries depend
on. This is all being built out at the present in the form of
wasi, but it’s not here yet.

Lastly flang isn’t in the current stable release of LLVM. It was
incorporated in the repo in November 2019,
and I would expect it to be in the next major release.

Do I Want to Write in JavaScript?

Personally I don’t care what language I write in, as long as it doesn’t get in
the way of the problem I’m trying to solve. Because of the low cost of entry,
JavaScript has become ubiquitous, and so its level of documentation and support
is huge, which I like.

If it provides seamless access to the packages I need I’m a buyer.

Implementing DataFrames in JavaScript with Calculations in WebAssembly

This post describes a simple pandas style DataFrame implementation with the
array calculations performed in WebAssembly.

The code for this post can be found here.

The DataFrame and Series Implementations

The series and data frame implementations are taken from my
previous post
with minor modifications.

Type awareness

The series (and therefore the data frame) will need to be aware of the type of
data they contain. For simplicity I have limited the types to:

  • 'int'
  • 'double'
  • 'object'

A series is constructed as follows.

const s1 = new Series('height', [1.82, 1.73, 1.69, 1.92], 'double')

The code for the series now looks like this.

import arrayMethods from './array-methods'

export class Series {
constructor (name, array, type) {
this.name = name
this.array = array
this.type = type

return new Proxy(this, {
get: (obj, prop, receiver) => {
if (prop in obj) {
// This is a known property: i.e. name, array or type.
return Reflect.get(obj, prop, receiver)
} else if (arrayMethods.has(prop)) {
// The property is an operator.
return (...args) => new Series('', ...arrayMethods.get(prop)(obj, ...args))
} else {
// The property requested is passed to the array.
return Reflect.get(obj.array, prop, receiver.array)
}
},
set: (obj, prop, value, receiver) => {
if (prop in obj) {
return Reflect.set(obj, prop, value, receiver)
} else {
return Reflect.set(obj.array, prop, value, receiver.array)
}
},
apply: (target, thisArgument, argumentList) => {
return Reflect.apply(target, thisArgument, argumentList)
},
// construct: Reflect.construct,
defineProperty: Reflect.defineProperty,
getOwnPropertyDescriptor: Reflect.getOwnPropertyDescriptor,
deleteProperty: Reflect.deleteProperty,
getPrototypeOf: Reflect.getPrototypeOf,
setPrototypeOf: Reflect.setPrototypeOf,
isExtensible: Reflect.isExtensible,
preventExtensions: Reflect.preventExtensions,
has: Reflect.has,
ownKeys: Reflect.ownKeys
})
}

toString () {
return `(${this.name};${this.type}): ${this.array.join(', ')}`
}
}

Comparing this to the
previous implementation
we can see three changes (plus a minor change to toString).

The constructor takes in an extra type parameter.

The operations are no longer defined in the class; they are now provided by the array-methods module.

import arrayMethods from './array-methods'

The last change is to modify the way the proxy finds the operator methods with arrayMethods.has(prop).

return new Proxy(this, {
get: (obj, prop, receiver) => {
if (prop in obj) {
return Reflect.get(obj, prop, receiver)
} else if (arrayMethods.has(prop)) {
return (...args) => new Series('', ...arrayMethods.get(prop)(obj, ...args))
} else {
return Reflect.get(obj.array, prop, receiver.array)
}
},
...
}
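
The forwarding idea can be tried in isolation. The following is a cut down sketch rather than the Series class itself: NamedArray is a made up class that only handles get and set, passing anything it does not own through to the wrapped array.

```javascript
class NamedArray {
  constructor (name, array) {
    this.name = name
    this.array = array

    return new Proxy(this, {
      // Known properties come from the object, everything else from the array.
      get: (obj, prop, receiver) => prop in obj
        ? Reflect.get(obj, prop, receiver)
        : Reflect.get(obj.array, prop),
      set: (obj, prop, value, receiver) => prop in obj
        ? Reflect.set(obj, prop, value, receiver)
        : Reflect.set(obj.array, prop, value)
    })
  }
}

const heights = new NamedArray('height', [1.82, 1.73])
heights[1] = 1.75
console.log(heights.name, heights[0], heights[1], heights.length)
```

Note that both traps return the result of the Reflect call. A set trap that returns nothing is treated as a failed assignment, which throws a TypeError in strict mode code.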

Here is the implementation of the DataFrame.

import { Series } from './Series'

export class DataFrame {
constructor (series) {
this.series = {}
for (const item of series) {
this.series[item.name] = item
}

return new Proxy(this, {
get: (obj, prop, receiver) => {
return prop in obj ? Reflect.get(obj, prop, receiver) : Reflect.get(obj.series, prop, receiver.series)
},
set: (obj, prop, value, receiver) => {
if (prop in obj) {
return Reflect.set(obj, prop, value, receiver)
} else {
value.name = prop
return Reflect.set(obj.series, prop, value, receiver.series)
}
},
apply: (target, thisArgument, argumentList) => {
return target in thisArgument ? Reflect.apply(target, thisArgument, argumentList) : Reflect.apply(target, thisArgument.array, argumentList)
},
defineProperty: Reflect.defineProperty,
getOwnPropertyDescriptor: Reflect.getOwnPropertyDescriptor,
deleteProperty: Reflect.deleteProperty,
getPrototypeOf: Reflect.getPrototypeOf,
setPrototypeOf: Reflect.setPrototypeOf,
isExtensible: Reflect.isExtensible,
preventExtensions: Reflect.preventExtensions,
has: Reflect.has,
ownKeys: Reflect.ownKeys
})
}

static fromObject (data, types) {
const series = {}
for (let i = 0; i < data.length; i++) {
for (const column in data[i]) {
if (!(column in series)) {
series[column] = new Series(column, new Array(data.length), types[column])
}
series[column][i] = data[i][column]
}
}
const seriesList = Object.values(series)
return new DataFrame(seriesList)
}

toString () {
const columns = Object.getOwnPropertyNames(this.series)
let s = columns.join(', ') + '\n'
const maxLength = Object.values(this.series)
.map(x => x.length)
.reduce((accumulator, currentValue) => Math.max(accumulator, currentValue), 0)
for (let i = 0; i < maxLength; i++) {
const row = []
for (const column of columns) {
if (i < this.series[column].length) {
row.push(this.series[column][i])
} else {
row.push(null)
}
}
s += row.join(', ') + '\n'
}
s += columns.map(column => this.series[column].type).join(', ') + '\n'
return s
}
}

The only changes are in the fromObject helper method which takes an extra
argument for the types and the toString to print the types. A DataFrame is now constructed as follows.

const df = DataFrame.fromObject(
[
{ col0: 'a', col1: 5, col2: 8.1 },
{ col0: 'b', col1: 6, col2: 3.2 }
],
{ col0: 'object', col1: 'int', col2: 'double'}
)

Array Methods

A new file has been added to create an ArrayMethods singleton. This is used
as a central store of array methods.

class ArrayMethods {
constructor () {
if (!ArrayMethods.instance) {
this._methods = {}
ArrayMethods.instance = this
}

return ArrayMethods.instance
}

set (name, method) {
this._methods[name] = method
}

has (name) {
return name in this._methods
}

get (name) {
return this._methods[name]
}
}

const instance = new ArrayMethods()
Object.freeze(instance)

export default instance
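
To illustrate how such a store can be keyed, here is a small self-contained sketch. It uses a Map rather than the class above (my substitution, to keep the example short), and registers a method under Symbol.for('+'), the key the addition operator is registered under later in this post. Symbol.for returns the same symbol for the same string wherever it is called, so the code that registers a method and the code that looks it up do not need to share a variable.

```javascript
// A stand-in registry; the real store is the ArrayMethods singleton.
const methods = new Map()

// Register element-wise addition under the global symbol for '+'.
methods.set(
  Symbol.for('+'),
  (lhs, rhs) => lhs.map((value, index) => value + rhs[index])
)

// Elsewhere, the same key finds the same method.
const found = methods.has(Symbol.for('+'))
const add = methods.get(Symbol.for('+'))
const total = add([1, 2], [3, 4])
console.log(found, total)
```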

The WebAssembly Calculations

The calculations have been written in C following the methods described in
this post,
and this post.

Here is the code for addition.

__attribute__((used)) int* addInt32Arrays (int *array1, int* array2, int length)
{
int* result = (int*) allocateMemory(length * sizeof(int));
if (result == 0)
return 0;

for (int i = 0; i < length; ++i) {
result[i] = array1[i] + array2[i];
}

return result;
}

__attribute__((used)) double* addFloat64Arrays (double* array1, double* array2, int length)
{
double* result = (double*) allocateMemory(length * sizeof(double));
if (result == 0)
return 0;

for (int i = 0; i < length; ++i) {
result[i] = array1[i] + array2[i];
}

return result;
}

Note how we need one function for integers and another for doubles.

There is a makefile to build the wasm.

Marshalling JavaScript to WebAssembly

I have tidied up the marshalling between JavaScript and WebAssembly. I created
a class to manage the wasm functions.

export class WasmFunctionManager {
constructor (memory, allocateMemory, freeMemory) {
this.memory = memory
this.allocateMemory = allocateMemory
this.freeMemory = freeMemory
}

createTypedArray (typedArrayType, length) {
const typedArray = new typedArrayType(
this.memory.buffer,
this.allocateMemory(length * typedArrayType.BYTES_PER_ELEMENT),
length)

if (typedArray.byteOffset === 0) {
throw new RangeError('Unable to allocate memory for typed array')
}

return typedArray
}

invokeUnaryFunction(func, array, typedArrayType) {
let input = null
let output = null

try {
input = this.createTypedArray(typedArrayType, array.length)
input.set(array)

output = new typedArrayType(
this.memory.buffer,
func(input.byteOffset, array.length),
array.length
)

if (output.byteOffset === 0) {
throw new RangeError('Failed to allocate memory')
}

const result = Array.from(output)

return result
} finally {
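// Ensure the memory gets freed, skipping anything that was never allocated.
if (input !== null) this.freeMemory(input.byteOffset)
if (output !== null && output.byteOffset !== 0) this.freeMemory(output.byteOffset)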
}
}

invokeBinaryFunction(func, lhs, rhs, typedArrayType) {
if (lhs.length !== rhs.length) {
throw new RangeError('Arrays must be the same length')
}
const length = lhs.length

let input1 = null
let input2 = null
let output = null

try {
input1 = this.createTypedArray(typedArrayType, length)
input2 = this.createTypedArray(typedArrayType, length)

input1.set(lhs)
input2.set(rhs)

output = new typedArrayType(
this.memory.buffer,
func(input1.byteOffset, input2.byteOffset, length),
length
)

if (output.byteOffset === 0) {
throw new RangeError('Failed to allocate memory')
}

const result = Array.from(output)

return result
} finally {
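// Ensure the memory gets freed, skipping anything that was never allocated.
if (input1 !== null) this.freeMemory(input1.byteOffset)
if (input2 !== null) this.freeMemory(input2.byteOffset)
if (output !== null && output.byteOffset !== 0) this.freeMemory(output.byteOffset)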
}
}
}

This follows the ideas presented in the previous posts, but adds some error
checking, and a try ... finally clause to prevent memory leaks.
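
The copy in, call, copy out, free pattern can be exercised without a compiled module. In the sketch below everything on the "wasm" side is a stand-in I wrote for illustration: the allocator is a simple bump allocator that tracks live pointers, and addFloat64Arrays is a JavaScript function working directly on the memory in place of the exported C function.

```javascript
const memory = new WebAssembly.Memory({ initial: 1 })

// Stand-in allocator: never reuses memory, but records what is still live.
let nextFree = 8
const live = new Set()
const allocateMemory = (size) => {
  const ptr = nextFree
  nextFree += (size + 7) & ~7 // keep doubles 8-byte aligned
  live.add(ptr)
  return ptr
}
const freeMemory = (ptr) => { live.delete(ptr) }

// Stand-in for the exported C function: takes pointers, returns a pointer.
function addFloat64Arrays (ptr1, ptr2, length) {
  const a = new Float64Array(memory.buffer, ptr1, length)
  const b = new Float64Array(memory.buffer, ptr2, length)
  const resultPtr = allocateMemory(length * Float64Array.BYTES_PER_ELEMENT)
  const result = new Float64Array(memory.buffer, resultPtr, length)
  for (let i = 0; i < length; i++) result[i] = a[i] + b[i]
  return resultPtr
}

function add (lhs, rhs) {
  const length = lhs.length
  const pointers = []
  try {
    // Copy a JavaScript array into freshly allocated memory.
    const copyIn = (values) => {
      const ptr = allocateMemory(length * Float64Array.BYTES_PER_ELEMENT)
      pointers.push(ptr)
      new Float64Array(memory.buffer, ptr, length).set(values)
      return ptr
    }
    const resultPtr = addFloat64Arrays(copyIn(lhs), copyIn(rhs), length)
    pointers.push(resultPtr)
    // Copy the result out before the memory is released.
    return Array.from(new Float64Array(memory.buffer, resultPtr, length))
  } finally {
    // Only pointers that were actually allocated are freed.
    pointers.forEach(freeMemory)
  }
}

const sum = add([1, 2, 3], [10, 20, 30])
console.log(sum, live.size)
```

Collecting the pointers as they are allocated means the finally block never has to guess which allocations succeeded.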

Setting up the operators

To set up the operators we first need to decide whether to use the 'int',
'double' or 'object' functions.

function chooseBestType(lhsType, rhsType) {
  if (lhsType === 'int' && rhsType === 'int') {
    return 'int'
  } else if (
    (lhsType === 'int' || lhsType === 'double') &&
    (rhsType === 'int' || rhsType === 'double')) {
    // Any other mix of numeric types, including two doubles, is a double.
    return 'double'
  } else {
    return 'object'
  }
}
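
The same choice can also be expressed as a ranking of the numeric types. This is an alternative sketch rather than the code used in the project; the names RANK and promote are made up for the example. Anything that is not a known numeric type falls back to 'object', and otherwise the higher ranked of the two types wins.

```javascript
// Numeric types ordered from narrowest to widest.
const RANK = new Map([['int', 0], ['double', 1]])

function promote (lhsType, rhsType) {
  // Non-numeric types cannot be handled by the WebAssembly functions.
  if (!RANK.has(lhsType) || !RANK.has(rhsType)) {
    return 'object'
  }
  // Otherwise pick the wider of the two types.
  return RANK.get(lhsType) >= RANK.get(rhsType) ? lhsType : rhsType
}

const results = [
  promote('int', 'int'),
  promote('int', 'double'),
  promote('double', 'double'),
  promote('object', 'int')
]
console.log(results)
```

Adding another numeric type then only needs a new entry in the ranking rather than another branch.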

Once we have chosen the type we need to build a wrapper function to invoke the
appropriate method. Here is the helper that makes a binary operation.

function makeBinaryOperation(wasmFunctionManager, intFunc, doubleFunc, defaultFunc) {
return (lhs, rhs) => {
const bestType = chooseBestType(lhs.type, rhs.type)

if (bestType === 'int') {
const result = wasmFunctionManager.invokeBinaryFunction(
intFunc,
lhs.array,
rhs.array,
Int32Array
)
return [result, bestType]
} else if (bestType === 'double') {
const result = wasmFunctionManager.invokeBinaryFunction(
doubleFunc,
lhs.array,
rhs.array,
Float64Array
)
return [result, bestType]
} else {
const result = defaultFunc(lhs, rhs)
return [result, bestType]
}
}
}

Lastly we create the wasm instance and register the types. Here is an edited version which demonstrates registering the addition operator.

export async function setupWasm () {
// Read the wasm file.
const buf = fs.readFileSync('./src-wasm/data-frame.wasm')

// Instantiate the wasm module.
const res = await WebAssembly.instantiate(buf, {})

// Get the memory exports from the wasm instance.
const {
memory,
allocateMemory,
freeMemory,

addInt32Arrays,
addFloat64Arrays,

// .. import the rest of the methods
} = res.instance.exports

const wasmFunctionManager = new WasmFunctionManager(memory, allocateMemory, freeMemory)

// Register the add method.
arrayMethods.set(
Symbol.for('+'),
makeBinaryOperation(
wasmFunctionManager,
addInt32Arrays,
addFloat64Arrays,
(lhs, rhs) => lhs.array.map((value, index) => value + rhs.array[index])
)
)

// register the rest of the methods ...
}

Running the code

We can run the code as follows.

import { DataFrame } from './DataFrame'
import { setupWasm } from './setup-wasm'

function example () {
'operator-overloading enabled'

const df = DataFrame.fromObject(
[
{ col0: 'a', col1: 5, col2: 8.1 },
{ col0: 'b', col1: 6, col2: 3.2 }
],
{ col0: 'object', col1: 'int', col2: 'double'}
)
console.log(df.toString())
df['col3'] = df['col1'] + df['col2']
console.log(df.toString())
}

async function main () {
await setupWasm()

example()
}

// Run the async main function.
main().then(() => console.log('Done')).catch(error => console.error(error))

Thoughts

Clearly the DataFrame needs a lot of work. There is no concept of grouping,
indices, dates and times, etc. However we have a working proof of concept that
provides a syntactically elegant and efficient implementation.

Simplifying WebAssembly C Memory Management by Using Clang Builtins

Following on from my previous post regarding passing arrays between JavaScript
and WebAssembly, I have found out how to remove the need to call back into
JavaScript.

Prerequisites

These examples have been tested with nodejs v12.9.1,
clang 10.0.0,
and wabt 1.0.16 on Ubuntu 18.04.

I installed clang in /opt/clang, wabt in /opt/wabt, and I use
nodenv to manage my nodejs environment.

Update your path.

export PATH=/opt/clang/bin:/opt/wabt/bin:$PATH

What was the problem?

In order to find the size of the memory and grow it, I was having to import
functions from JavaScript. I now find there are two built in functions that
do exactly what I want:

  • __builtin_wasm_memory_size(0)
  • __builtin_wasm_memory_grow(0, blocks)

This means my getMoreMemory function now looks like this.

#define BLKSIZ 65536

static header_t* getMoreMemory(unsigned bytes_required)
{
// We need to add the header to the bytes required.
bytes_required += sizeof(header_t);

// The memory gets delivered in blocks. Ensure we get enough.
unsigned int blocks = bytes_required / BLKSIZ;
if (blocks * BLKSIZ < bytes_required)
blocks += 1;
unsigned int start_of_new_memory = __builtin_wasm_memory_size(0) * BLKSIZ;

if (__builtin_wasm_memory_grow(0, blocks) == -1)
return NULL;

long end_of_new_memory = __builtin_wasm_memory_size(0) * BLKSIZ;

// Create the block to insert.
header_t* block_to_insert = (header_t *) start_of_new_memory;
block_to_insert->value.size = end_of_new_memory - start_of_new_memory - sizeof(header_t);
block_to_insert->value.next = NULL;

// add to the free list
freeMemory((void *) (block_to_insert + 1));

return free_list;
}
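
For reference, the same two operations are visible from the JavaScript side through WebAssembly.Memory: the size is the byte length of the buffer, and grow takes a number of 64KiB pages and returns the previous size in pages. Here is a minimal sketch of the block arithmetic used above, with a made up growBy helper.

```javascript
const PAGE_SIZE = 65536

// Start with a single page.
const memory = new WebAssembly.Memory({ initial: 1 })

function growBy (bytesRequired) {
  // Memory is delivered in whole pages, so round up.
  const pages = Math.ceil(bytesRequired / PAGE_SIZE)

  // grow returns the size, in pages, before the call.
  const previousPages = memory.grow(pages)

  return {
    start: previousPages * PAGE_SIZE,
    // Re-read memory.buffer: the old buffer is detached by grow.
    end: memory.buffer.byteLength
  }
}

const region = growBy(70000)
console.log(region)
```

Any typed array created over the old buffer stops being usable after a grow, so views should be created after the memory has reached its final size.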

The code

You can find the code here.

An example of a pandas style DataFrame in JavaScript

This post describes an example implementation of a pandas style DataFrame. See
here for the source
code.

The goal is to be able to do the following.

const df = DataFrame.fromObject(
[
{ 'col0': 'a', 'col1': 5, 'col2': 8.1 },
{ 'col0': 'b', 'col1': 6, 'col2': 3.2 }
]
)

df['col3'] = df['col1'] + df['col2']

In plain old JavaScript this is not possible. Let’s see how we can implement it.

The goal

So what is a DataFrame? To answer this we first need to look at a Series.

So What is a Series?

A Series is the building block of a DataFrame. I take a Series to be a
named vector (array), which will be initialized as follows.

let s1 = new Series('height', [1.82, 1.76, 1.72, 1.89])

The series should be indexable.

s1[0] === 1.82

The series should support vector arithmetic.

let height = new Series('height', [1.82, 1.76, 1.72, 1.89])
let weight = new Series('weight', [81.3, 73.2, 68.9, 92.1])
let score = height / weight

So What is a DataFrame?

I take a DataFrame to be a collection of Series.

let df = new DataFrame([
height,
weight
])

The problems to solve

There are two problems to solve in order to implement this in JavaScript:

  • Operator overloading
  • Property accessing

Operator Overloading

We want to support vector arithmetic (multiplying two arrays). Unfortunately
JavaScript does not support operator overloading so we will have to pre-process
the code. We can do this with the babel transpiler
and a plugin. I’m using the
@jetblack/operator-overloading
plugin, which is a bag-of-bolts, but I wrote it so I know how it works!

Property accessing

In order for a series to have both a name and be indexable we need control over
the property accessing. We can do that with a
Proxy
object. The Proxy object provides a layer of indirection between requests on
the object, and the actions performed.
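
As a minimal sketch of that indirection, here is a Proxy with only a get trap. The makeNamedArray helper and its field names are made up for illustration and are not part of the final implementation.

```javascript
// Hypothetical helper: wrap a name and an array behind a single Proxy.
function makeNamedArray (name, array) {
  const target = { name, array }
  return new Proxy(target, {
    get: (obj, prop) => {
      // Properties the wrapper owns (name, array) are served directly.
      if (prop in obj) {
        return Reflect.get(obj, prop)
      }
      // Everything else (indices, length, ...) is forwarded to the array.
      return Reflect.get(obj.array, prop)
    }
  })
}

const heights = makeNamedArray('height', [1.82, 1.76, 1.72, 1.89])
console.log(heights.name)   // height
console.log(heights[0])     // 1.82
console.log(heights.length) // 4
```

The wrapper answers for its own properties first and only then falls back to the array, which is the same ordering the Series class uses later on.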

Setting up your environment

Let’s write some code!

First install the node modules. I’m using babel, the operator overloading
plugin, and
standardjs as a linter and formatter.

# Initialise the package.json
npm init -y
# Install the babel tool chain.
npm install --save-dev @babel/core@7.10.1 @babel/preset-env@7.10.2 @babel/cli@7.10.1 @babel/node@7.10.1
# Install the operator overloading plugin.
npm install --save-dev git+https://github.com/rob-blackbourn/jetblack-operator-overloading.git#0.1.0
# Install standardjs for linting and formatting
npm install --save-dev babel-eslint@10.1.0 standard@14.3.4

We configure standardjs by editing the package.json and adding the following.

{
...
"standard": {
"parser": "babel-eslint"
}
}

If you are using vscode, create the .vscode/settings.json file as follows,
then restart vscode to start the standardjs server.

{
"javascript.validate.enable": false,
"standard.enable": true,
"standard.autoFixOnSave": true,
"[javascript]": {
"editor.formatOnSave": false
}
}

Configure babel by creating the .babelrc file with the usual preset and the
operator overloading plugin. The operator overloading plugin requires arrow
functions. Targeting node (or any modern browser) achieves this.

{
"presets": [
["@babel/preset-env", {
"targets": {
"node": "current"
}
}]
],
"plugins": ["module:@jetblack/operator-overloading"]
}

Edit the package.json and add some scripts for building the transpiled code
and running it.

{
...
"scripts": {
"build": "babel src --out-dir=dist --source-maps",
"start": "node dist/main.js"
},
...
}

The Implementation

Put the code in a subdirectory called src.

Series Code

In the src directory create a file called Series.js with the following
content.

export class Series {
constructor (name, array) {
this.name = name
this.array = array

return new Proxy(this, {
get: (obj, prop, receiver) => {
if (prop in obj) {
return Reflect.get(obj, prop, receiver)
} else {
return Reflect.get(obj.array, prop, receiver.array)
}
},
set: (obj, prop, value, receiver) => {
if (prop in obj) {
return Reflect.set(obj, prop, value, receiver)
} else {
return Reflect.set(obj.array, prop, value, receiver.array)
}
},
apply: (target, thisArgument, argumentList) => {
return Reflect.apply(target, thisArgument, argumentList)
},
defineProperty: Reflect.defineProperty,
getOwnPropertyDescriptor: Reflect.getOwnPropertyDescriptor,
deleteProperty: Reflect.deleteProperty,
getPrototypeOf: Reflect.getPrototypeOf,
setPrototypeOf: Reflect.setPrototypeOf,
isExtensible: Reflect.isExtensible,
preventExtensions: Reflect.preventExtensions,
has: Reflect.has,
ownKeys: Reflect.ownKeys
})
}

[Symbol.for('+')] (other) {
return new Series(
`${this.name}+${other.name}`,
this.array.map((value, index) => value + other.array[index])
)
}

[Symbol.for('-')] (other) {
return new Series(
`${this.name}-${other.name}`,
this.array.map((value, index) => value - other.array[index])
)
}

[Symbol.for('*')] (other) {
return new Series(
`${this.name}*${other.name}`,
this.array.map((value, index) => value * other.array[index])
)
}

[Symbol.for('/')] (other) {
return new Series(
`${this.name}/${other.name}`,
this.array.map((value, index) => value / other.array[index])
)
}

toString () {
return `(${this.name}): ${this.array.join(', ')}`
}
}

The code can be split into two parts.

The constructor returns a Proxy
object, which intercepts calls to the Series. It first checks if the property
or function is provided by the Series itself. If not it delegates the action
to the array. Note how the Proxy is returned from the constructor; this is a
poorly documented feature.

The operators are provided by the [Symbol.for('+')] methods.
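
To see what these methods do without the transpiler, you can call one directly. This is a sketch under the assumption that the plugin turns `a + b` into a call to the method keyed by Symbol.for('+') on the left operand; the code the plugin actually generates may differ. The Vec class is a hypothetical cut-down stand-in for Series.

```javascript
// Hypothetical cut-down stand-in for Series, without the Proxy.
class Vec {
  constructor (name, array) {
    this.name = name
    this.array = array
  }

  // Element-wise addition, stored under the registered symbol for '+'.
  [Symbol.for('+')] (other) {
    return new Vec(
      `${this.name}+${other.name}`,
      this.array.map((value, index) => value + other.array[index])
    )
  }
}

const a = new Vec('a', [1, 2, 3])
const b = new Vec('b', [10, 20, 30])

// Plain JavaScript: look the method up by its symbol and call it.
const sum = a[Symbol.for('+')](b)
console.log(sum.name, sum.array) // a+b [ 11, 22, 33 ]
```

Because Symbol.for returns the same symbol for the same key everywhere, the class and the calling code agree on the method name without sharing any imports.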

DataFrame Code

In the src directory create a file called DataFrame.js with the following
content.

import { Series } from './Series'

export class DataFrame {
constructor (series) {
this.series = {}
for (const item of series) {
this.series[item.name] = item
}

return new Proxy(this, {
get: (obj, prop, receiver) => {
return prop in obj ? Reflect.get(obj, prop, receiver) : Reflect.get(obj.series, prop, receiver.series)
},
set: (obj, prop, value, receiver) => {
return prop in obj ? Reflect.set(obj, prop, value, receiver) : Reflect.set(obj.series, prop, value, receiver.series)
},
apply: (target, thisArgument, argumentList) => {
return target in thisArgument ? Reflect.apply(target, thisArgument, argumentList) : Reflect.apply(target, thisArgument.array, argumentList)
},
defineProperty: Reflect.defineProperty,
getOwnPropertyDescriptor: Reflect.getOwnPropertyDescriptor,
deleteProperty: Reflect.deleteProperty,
getPrototypeOf: Reflect.getPrototypeOf,
setPrototypeOf: Reflect.setPrototypeOf,
isExtensible: Reflect.isExtensible,
preventExtensions: Reflect.preventExtensions,
has: Reflect.has,
ownKeys: Reflect.ownKeys
})
}

static fromObject (data) {
const series = {}
for (let i = 0; i < data.length; i++) {
for (const column in data[i]) {
if (!(column in series)) {
series[column] = new Series(column, new Array(data.length))
}
series[column][i] = data[i][column]
}
}
const seriesList = Object.values(series)
return new DataFrame(seriesList)
}

toString () {
const columns = Object.getOwnPropertyNames(this.series)
let s = columns.join(', ') + '\n'
const maxLength = Object.values(this.series)
.map(x => x.length)
.reduce((accumulator, currentValue) => Math.max(accumulator, currentValue), 0)
for (let i = 0; i < maxLength; i++) {
const row = []
for (const column of columns) {
if (i < this.series[column].length) {
row.push(this.series[column][i])
} else {
row.push(null)
}
}
s += row.join(', ') + '\n'
}
return s
}
}

This would be pretty short without the toString!

As with the Series class we use a Proxy object to control property
accessing.

I decided to keep the constructor clean; it just takes an array
of Series. However, in the real world we want a variety of constructors. The
convenience class method DataFrame.fromObject provides a way of building the series from
a list of objects.
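
The row-to-column pivot inside fromObject can be sketched on plain arrays, without Series or Proxy. The rowsToColumns name is made up for this illustration.

```javascript
// Hypothetical helper: pivot a list of row objects into column arrays.
function rowsToColumns (rows) {
  const columns = {}
  rows.forEach((row, i) => {
    for (const name of Object.keys(row)) {
      // Create each column the first time it is seen, sized for every row.
      if (!Object.prototype.hasOwnProperty.call(columns, name)) {
        columns[name] = new Array(rows.length)
      }
      columns[name][i] = row[name]
    }
  })
  return columns
}

const columns = rowsToColumns([
  { col0: 'a', col1: 5, col2: 8.1 },
  { col0: 'b', col1: 6, col2: 3.2 }
])
console.log(columns.col1) // [ 5, 6 ]
```

Sizing each column up front means a row that lacks a column simply leaves a hole at that index rather than shifting later values.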

Try it out

Let’s start with a series.

'operator-overloading enabled'

import { Series } from './Series'

const height = new Series('height', [1.82, 1.72, 1.64, 1.88])
const weight = new Series('weight', [81.4, 72.3, 69.9, 79.5])
const ratio = weight / height
console.log(ratio.toString())

> (weight/height): 44.72527472527473, 42.03488372093023, 42.6219512195122, 42.287234042553195

And let’s try using array methods to see if the request gets forwarded by the
proxy to the array.

import { Series } from './Series'

const s1 = new Series('numbers', [1, 2, 3, 4])
s1.push(5)
console.log(s1.toString())

> (numbers): 1, 2, 3, 4, 5

Finally let’s test the DataFrame.

'operator-overloading enabled'

import { DataFrame } from './DataFrame'

const df = DataFrame.fromObject(
[
{ col0: 'a', col1: 5, col2: 8.1 },
{ col0: 'b', col1: 6, col2: 3.2 }
]
)
console.log(df.toString())
> col0, col1, col2
> a, 5, 8.1
> b, 6, 3.2

df['col3'] = df['col1'] + df['col2']
console.log(df.toString())
> col0, col1, col2, col3
> a, 5, 8.1, 13.1
> b, 6, 3.2, 9.2

Thoughts

Obviously there’s a lot in a pandas DataFrame not covered here, but I think
this demonstrates how the basic syntax could be achieved.

Passing arrays between wasm and JavaScript

In this post I’ll demonstrate some ways of passing arrays between JavaScript
and WebAssembly. The source code is here.

Prerequisites

These examples have been tested with nodejs v12.9.1,
clang 10.0.0,
and wabt 1.0.16 on Ubuntu 18.04.

I installed clang in /opt/clang, wabt in /opt/wabt, and I use
nodenv to manage my nodejs environment.

Update your path.

export PATH=/opt/clang/bin:/opt/wabt/bin:$PATH

No Hello, World?

Sadly the typical hello world example is tricky for WebAssembly; so we’ll do
a traditional “add two numbers” instead.

Write the C code

Start by writing the C code in the file example1.c

__attribute__((used)) int addTwoInts (int a, int b) {
return a + b;
}

This looks pretty vanilla apart from __attribute__((used)). This appears to
mark the function as something that can be exported from the module.

Build the wasm module.

Now let’s compile.

clang example1.c \
--target=wasm32-unknown-unknown-wasm \
--optimize=3 \
-nostdlib \
-Wl,--export-all \
-Wl,--no-entry \
-Wl,--allow-undefined \
--output example1.wasm

So many flags! Let’s go through them.

  • --target=wasm32-unknown-unknown-wasm tells the compiler/linker to produce
    wasm.
  • --optimize=3 seems to be necessary to produce valid wasm. I don’t know why,
    and I might be wrong.
  • -nostdlib tells the compiler/linker that we don’t have a standard library,
    which is very sad.
  • -Wl,--export-all tells the linker to export anything it can.
  • -Wl,--no-entry tells the linker that there’s no main; this is just a library.
  • -Wl,--allow-undefined tells the linker to allow the code to access functions
    and variables that have not been defined. We’ll need to provide them when we
    instantiate the WebAssembly instance. This won’t be used in this example, but
    we’ll need it later.
  • --output example1.wasm does what it says on the tin.

If all went well you now have an example1.wasm file.

Write the JavaScript

Write a JavaScript file example1.js with the following content.

const fs = require('fs')

async function main() {
// Read the wasm file.
const buf = fs.readFileSync('./example1.wasm')

// Create a WebAssembly instance from the wasm.
const res = await WebAssembly.instantiate(buf, {})

// Get the function to call.
const { addTwoInts } = res.instance.exports

// Call the function.
const a = 38, b = 4
const result = addTwoInts(a, b)
console.log(`${a} + ${b} = ${result}`)
}

main().then(() => console.log('Done'))

The code comments should make it pretty clear what’s happening here. We read the
compiled wasm into a buffer, instantiate the WebAssembly, get the function, run
it, and print out the result.

The WebAssembly.instantiate function returns a promise, and I have chosen to use
the await syntax to make the code more readable, so I make an async main
function which I call as a promise on the last line.

Let’s run it.

node example1.js

If all went well you should see the following.

38 + 4 = 42
Done

What just happened?

The bit I want to talk about here is how the wasm got instantiated. The first
argument to WebAssembly.instantiate was the buf holding the wasm. The second
was an empty importObject. The importObject can configure the properties
of the instance it creates. This might include the memory it uses, a table of
function references, imported and exported functions, etc. So why does an empty
object work?

We can look at the WebAssembly text (or wat) with the wabt tool wasm2wat.
To produce this we need to do the following.

wasm2wat example1.wasm -o example1.wat

The example1.wat file should look like this (edited for readability).

(module
(type (;0;) (func))
(type (;1;) (func (param i32 i32) (result i32)))
(func $__wasm_call_ctors (type 0))
(func $addTwoInts (type 1) (param i32 i32) (result i32)
local.get 1
local.get 0
i32.add)
(table (;0;) 1 1 funcref)
(memory (;0;) 2)
(global (;0;) (mut i32) (i32.const 66560))
(global (;1;) i32 (i32.const 1024))
(export "memory" (memory 0))
(export "addTwoInts" (func $addTwoInts)))

Near the top you can see our addTwoInts function with a satisfyingly small
amount of code which is almost understandable. Then there are mentions of
table and memory. At the end we can see memory exported along with our
function.

What has happened is that the clang tool-chain has generated a bunch of stuff
that we would otherwise need to put in the importObject. You can switch this
off (see here) and control everything
from the JavaScript side, but I quite like it.
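
One way to check what a module expects from the importObject is to ask the module itself, using WebAssembly.Module.imports and WebAssembly.Module.exports. The sketch below uses a tiny hand-assembled module (an assumption made to keep the example self-contained; it is not the output of clang) that exports a single add function and imports nothing, which is why an empty importObject is enough for it.

```javascript
// A hand-assembled wasm module exporting add(i32, i32) -> i32.
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic and version
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // one function of type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add"
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b // body
])

// Note: not called `module`, which is already taken in a CommonJS file.
const wasmModule = new WebAssembly.Module(bytes)

// Nothing is imported, so there is nothing to supply.
const imports = WebAssembly.Module.imports(wasmModule)
const exportNames = WebAssembly.Module.exports(wasmModule).map(e => e.name)
console.log(imports.length, exportNames)

const instance = new WebAssembly.Instance(wasmModule, {})
console.log(instance.exports.add(38, 4)) // 42
```

Running the same two inspection calls against example1.wasm should list the memory and addTwoInts exports seen in the wat output.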

Passing arrays to wasm

Now that we’ve established our tools work, and seen some of the features clang
provides, we can get to arrays.

Write the C code

Write a file example2.c with the following contents.

__attribute__((used)) int sumArrayInt32 (int *array, int length) {
int total = 0;

for (int i = 0; i < length; ++i) {
total += array[i];
}

return total;
}

Compile it to wasm as we did before.

Write the JavaScript

Write a JavaScript file called example2.js with the following contents.

const fs = require('fs')

async function main() {
// Load the wasm into a buffer.
const buf = fs.readFileSync('./example2.wasm')

// Instantiate the wasm.
const res = await WebAssembly.instantiate(buf, {})

// Get the function out of the exports.
const { sumArrayInt32, memory } = res.instance.exports

// Create an array that can be passed to the WebAssembly instance.
const array = new Int32Array(memory.buffer, 0, 5)
array.set([3, 15, 18, 4, 2])

// Call the function and display the results.
const result = sumArrayInt32(array.byteOffset, array.length)
console.log(`sum([${array.join(',')}]) = ${result}`)

// This does the same thing!
if (result == sumArrayInt32(0, 5)) {
console.log(`Memory is an integer array starting at 0`)
}
}

main().then(() => console.log('Done'))

The first part is the same as the previous example.

Then we create the array.

const array = new Int32Array(memory.buffer, 0, 5)
array.set([3, 15, 18, 4, 2])

The wasm instance has a memory buffer which is exposed to JavaScript as
an ArrayBuffer in memory.buffer. We create our Int32Array using the
first available “memory location” 0, and specify that it is 5
integers in length. Note the memory location is in terms of bytes, and
the length in terms of integers. The set method copies the JavaScript
array into the memory buffer.
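
The distinction between byte offsets and element counts is easy to check with a WebAssembly.Memory created directly in JavaScript. This sketch assumes a standalone one-page memory rather than one exported by a module.

```javascript
// A standalone memory of one 64KiB page.
const memory = new WebAssembly.Memory({ initial: 1 })

// Five 32-bit integers starting at byte offset 0.
const array = new Int32Array(memory.buffer, 0, 5)
array.set([3, 15, 18, 4, 2])

console.log(array.length)     // 5 (elements)
console.log(array.byteLength) // 20 (bytes)

// A second view starting at byte 8 sees the third integer onwards.
const tail = new Int32Array(memory.buffer, 2 * Int32Array.BYTES_PER_ELEMENT, 3)
console.log(Array.from(tail)) // [ 18, 4, 2 ]
```

Both views share the same underlying bytes, so a write through one is visible through the other.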

Then we call the function.

const result = sumArrayInt32(array.byteOffset, array.length)

We pass in the offset in bytes to the array in the memory buffer and
the length (in integers) of the array. This gets passed to the function
we wrote in C.

int sumArrayInt32 (int *array, int length)

The result gets passed back as an integer.

Passing output arrays to wasm

This gives us some of what we want, but what if we want to get an array
back from wasm? The first approach is to pass an output array.

Write the C code

Write a file example3.c with the following contents.

__attribute__((used)) void addArraysInt32 (int *array1, int* array2, int* result, int length)
{
for (int i = 0; i < length; ++i) {
result[i] = array1[i] + array2[i];
}
}

The function takes in three arrays, adding each element of the two
input arrays and storing the output in the result array. Nothing need be
returned.

Write the JavaScript

Write a file called example3.js with the following content.

const fs = require('fs')

async function main() {
// Load the wasm into a buffer.
const buf = fs.readFileSync('./example3.wasm')

// Make an instance.
const res = await WebAssembly.instantiate(buf, {})

// Get function.
const { addArraysInt32, memory } = res.instance.exports

// Create the arrays.
const length = 5

let offset = 0
const array1 = new Int32Array(memory.buffer, offset, length)
array1.set([1, 2, 3, 4, 5])

offset += length * Int32Array.BYTES_PER_ELEMENT
const array2 = new Int32Array(memory.buffer, offset, length)
array2.set([6, 7, 8, 9, 10])

offset += length * Int32Array.BYTES_PER_ELEMENT
const result = new Int32Array(memory.buffer, offset, length)

// Call the function.
addArraysInt32(
array1.byteOffset,
array2.byteOffset,
result.byteOffset,
length)

// Show the results.
console.log(`[${array1.join(", ")}] + [${array2.join(", ")}] = [${result.join(", ")}]`)
}

main().then(() => console.log('Done'))

We can see some differences here in the way we create the arrays. The
code to create the first array looks the same, but what’s going on for
the others?

offset += length * Int32Array.BYTES_PER_ELEMENT
const array2 = new Int32Array(memory.buffer, offset, length)

Our first example was easy, as we only had one array. When we create
subsequent arrays we need to lay them out in memory such that they don’t
overlap each other. To do this we calculate the number of bytes required
with length * Int32Array.BYTES_PER_ELEMENT and add it to the previous
offset to ensure each array has its own space. Note again how the offset
is in bytes, but the length is in units of integers.
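
The offset bookkeeping can be pulled into a small helper. This is a sketch: the layoutInt32Arrays name is hypothetical, and it uses a standalone WebAssembly.Memory rather than the one exported by the module.

```javascript
// Hypothetical helper: carve `count` non-overlapping Int32Arrays of
// `length` elements out of a buffer, starting at byte offset `start`.
function layoutInt32Arrays (buffer, count, length, start = 0) {
  const views = []
  let offset = start
  for (let i = 0; i < count; i++) {
    views.push(new Int32Array(buffer, offset, length))
    // Advance by the size of the view in bytes, not elements.
    offset += length * Int32Array.BYTES_PER_ELEMENT
  }
  return views
}

const memory = new WebAssembly.Memory({ initial: 1 })
const [array1, array2, result] = layoutInt32Arrays(memory.buffer, 3, 5)
console.log(array1.byteOffset, array2.byteOffset, result.byteOffset) // 0 20 40
```

The byteOffset of each view is exactly the pointer value that would be passed to the C function.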

Next we call the function.

addArraysInt32(array1.byteOffset, array2.byteOffset, result.byteOffset, length)

This maps on to the prototype of our C implementation.

void addArraysInt32 (int *array1, int* array2, int* result, int length)

Now we can get array data out of wasm!

But wait, something is missing. What if we want to create an array
inside the wasm module and pass it out? What if our array calculation
needs to create its own arrays in order to support the calculation?

To do that we need to provide some memory management.

Understanding wasm memory management

As we have seen, wasm memory is managed in JavaScript through the
WebAssembly.Memory
object. This has a buffer which is an ArrayBuffer. The buffer has a byteLength property, which gives us the
upper bound of the memory.

It also has a grow method which can be used to get more memory.
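
One thing to watch when growing: for ordinary (non-shared) memory, grow detaches the old ArrayBuffer, so any typed arrays built on it stop working and must be recreated from the new memory.buffer. A minimal sketch with a standalone memory:

```javascript
const memory = new WebAssembly.Memory({ initial: 1 })
const before = new Int32Array(memory.buffer, 0, 4)
before.set([1, 2, 3, 4])

// Grow by one 64KiB page. The return value is the previous size in pages.
const previousPages = memory.grow(1)
console.log(previousPages)            // 1
console.log(before.length)            // 0, the old buffer was detached
console.log(memory.buffer.byteLength) // 131072

// The contents are preserved; we just need a fresh view.
const after = new Int32Array(memory.buffer, 0, 4)
console.log(Array.from(after))        // [ 1, 2, 3, 4 ]
```

This matters for any code that keeps a typed array around while the wasm side may be allocating.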

I couldn’t find any documentation on how to get to this information from
inside the wasm instance, but we can solve this problem by calling back
to the JavaScript from wasm.

Calling JavaScript from wasm

We can call JavaScript from wasm by importing a function when we
instantiate the wasm module.

Write the C code

Write a file example4.c with the following contents.

extern void someFunction(int i);

__attribute__((used)) void callSomeFunction (int i)
{
someFunction(i);
}

Here we declare an external function someFunction that will be
provided by JavaScript, and then export callSomeFunction that, when
invoked, will run the imported function.

Write the JavaScript

Write a file called example4.js with the following content.

const fs = require('fs')

async function main() {
// Load the wasm into a buffer.
const buf = fs.readFileSync('./example4.wasm')

// Make the instance, importing a function called someFunction which
// logs its arguments to the console.
const res = await WebAssembly.instantiate(buf, {
env: {
someFunction: function (i) {
console.log(`someFunction called with ${i}`)
}
}
})

// Get the exported function callSomeFunction from the wasm instance.
const { callSomeFunction } = res.instance.exports

// Calling the function should call back into the function we imported.
callSomeFunction(42)
}

main().then(() => console.log('Done'))

Rather than passing in an empty object to WebAssembly.instantiate as we
did previously, we now pass an object with an env property. This
contains a property someFunction with its implementation (it just
logs its argument to the console).

Calling the function callSomeFunction calls back to the function we
provided to the wasm module someFunction.
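
The same round trip can be tried without compiling any C. The sketch below hand-assembles a tiny module that imports a function and calls it; the import name log and the export name run are assumptions chosen to keep the bytes short, and are not what clang produces for example4.c.

```javascript
// A hand-assembled module that imports env.log(i32) and exports run(i32),
// where run simply calls log with its argument.
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic and version
  0x01, 0x05, 0x01, 0x60, 0x01, 0x7f, 0x00, // type: (i32) -> ()
  0x02, 0x0b, 0x01, 0x03, 0x65, 0x6e, 0x76, // import from module "env"
  0x03, 0x6c, 0x6f, 0x67, 0x00, 0x00, // ... the function "log" of type 0
  0x03, 0x02, 0x01, 0x00, // one defined function of type 0
  0x07, 0x07, 0x01, 0x03, 0x72, 0x75, 0x6e, 0x00, 0x01, // export "run" (function 1)
  0x0a, 0x08, 0x01, 0x06, 0x00, 0x20, 0x00, 0x10, 0x00, 0x0b // body: call log
])

const wasmModule = new WebAssembly.Module(bytes)

// The module describes the import it needs.
console.log(WebAssembly.Module.imports(wasmModule))

// Supply the import under the same module and field names.
const seen = []
const instance = new WebAssembly.Instance(wasmModule, {
  env: { log: (i) => seen.push(i) }
})
instance.exports.run(42)
console.log(seen) // [ 42 ]
```

The keys of the import object have to match the module and field names recorded in the wasm file, which is why the earlier example uses an env property.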

Now we have a way of the wasm module finding out how much memory it has
and asking for more.

Passing arrays from wasm to JavaScript

First we need some C code to perform memory management. I wrote a trivial
implementation of malloc which can be found here.
Copy this file to the work folder as memory-allocation.c. The code provides
three functions.

void *allocateMemory (unsigned bytes_required);
void freeMemory(void *memory_to_free);
double reportFreeMemory();

Obviously it’s stupid to write your own malloc. Rob bad.

Write the C code

Write a file example5.c with the following contents.

extern void *allocateMemory(unsigned bytes_required);

__attribute__((used)) int* addArrays (int *array1, int* array2, int length)
{
int* result = allocateMemory(length * sizeof(int));

for (int i = 0; i < length; ++i) {
result[i] = array1[i] + array2[i];
}

return result;
}

The code is similar to the previous example, but instead of taking in a result
array, it creates and returns the array itself.

Write the JavaScript

Write a file called example5.js containing the following code.

const fs = require('fs')

/*
* A simple memory manager.
*/
class MemoryManager {
// The WebAssembly.Memory object for the instance.
memory = null

// Return the buffer length in bytes.
memoryBytesLength() {
return this.memory.buffer.byteLength
}

// Grow the memory by the requested blocks, returning the new buffer length
// in bytes.
grow(blocks) {
this.memory.grow(blocks)
return this.memory.buffer.byteLength
}
}

async function main() {
// Read the wasm file.
const buf = fs.readFileSync('./example5.wasm')

// Create an object to manage the memory.
const memoryManager = new MemoryManager()

// Instantiate the wasm module.
const res = await WebAssembly.instantiate(buf, {
env: {
// The wasm module calls this function to grow the memory
grow: function(blocks) {
return memoryManager.grow(blocks)
},
// The wasm module calls this function to get the current memory size.
memoryBytesLength: function() {
return memoryManager.memoryBytesLength()
}
}
})

// Get the memory exports from the wasm instance.
const {
memory,
allocateMemory,
freeMemory,
reportFreeMemory
} = res.instance.exports

// Give the memory manager access to the instance's memory.
memoryManager.memory = memory

// How many free bytes are there?
const startFreeMemoryBytes = reportFreeMemory()
console.log(`There are ${startFreeMemoryBytes} bytes of free memory`)

// Get the exported array function.
const {
addArrays
} = res.instance.exports

// Make the arrays to pass into the wasm function using allocateMemory.
const length = 5
const bytesLength = length * Int32Array.BYTES_PER_ELEMENT

const array1 = new Int32Array(memory.buffer, allocateMemory(bytesLength), length)
array1.set([1, 2, 3, 4, 5])

const array2 = new Int32Array(memory.buffer, allocateMemory(bytesLength), length)
array2.set([6, 7, 8, 9, 10])

// Add the arrays. The result is the memory pointer to the result array.
const result = new Int32Array(
memory.buffer,
addArrays(array1.byteOffset, array2.byteOffset, length),
length)

console.log(`[${array1.join(", ")}] + [${array2.join(", ")}] = [${result.join(", ")}]`)

// Show that some memory has been used.
let pctFree = 100 * reportFreeMemory() / startFreeMemoryBytes
console.log(`Free memory ${pctFree}%`)

// Free the memory.
freeMemory(array1.byteOffset)
freeMemory(array2.byteOffset)
freeMemory(result.byteOffset)

// Show that all the memory has been released.
pctFree = 100 * reportFreeMemory() / startFreeMemoryBytes
console.log(`Free memory ${pctFree}%`)
}

main().then(() => console.log('Done'))

That’s a lot of code!

Let’s get started with the memory management. The script starts by declaring a
MemoryManager class.

class MemoryManager {
// The WebAssembly.Memory object for the instance.
memory = null

// Return the buffer length in bytes.
memoryBytesLength() {
return this.memory.buffer.byteLength
}

// Grow the memory by the requested blocks, returning the new buffer length
// in bytes.
grow(blocks) {
this.memory.grow(blocks)
return this.memory.buffer.byteLength
}
}

Once the memory property has been set it can provide the length in bytes of
the memory, and it can grow the memory for a given number of blocks returning
the new memory length in bytes.

We pass the functions when creating the wasm instance, and assign the memory
object once the instance is created.

const memoryManager = new MemoryManager()

const res = await WebAssembly.instantiate(buf, {
env: {
grow: function(blocks) {
return memoryManager.grow(blocks)
},
memoryBytesLength: function() {
return memoryManager.memoryBytesLength()
}
}
})

memoryManager.memory = res.instance.exports.memory

When we create the arrays we use the memory allocator.

const array1 = new Int32Array(memory.buffer, allocateMemory(bytesLength), length)

The memory gets freed with the complementary function near the end of the
script.

freeMemory(array1.byteOffset)

Finally we can call our function and get the results.

const result = new Int32Array(
memory.buffer,
addArrays(array1.byteOffset, array2.byteOffset, length),
length)

Note that what gets returned is the point in memory where the array has been
stored, so we need to wrap it in an Int32Array object. And don’t forget to
free it!
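
Because the returned pointer refers to memory owned by the wasm allocator, one pattern is to copy the values into an ordinary JavaScript array and free the pointer straight away. The sketch below shows the shape of such a helper. To keep it runnable on its own it uses a standalone WebAssembly.Memory and a stand-in for freeMemory that only records what was freed; both are assumptions for illustration, and in example5.js you would pass the real memory and freeMemory exports.

```javascript
// Hypothetical helper: copy `length` ints at `ptr` out of wasm memory,
// then release the wasm-side allocation.
function takeInt32Array (memory, free, ptr, length) {
  try {
    // Build the view at the last moment, as memory.buffer changes on grow.
    const view = new Int32Array(memory.buffer, ptr, length)
    // Array.from copies, so the result survives the free below.
    return Array.from(view)
  } finally {
    free(ptr)
  }
}

// Stand-ins so the sketch runs without a wasm module.
const memory = new WebAssembly.Memory({ initial: 1 })
const freed = []
const fakeFree = (ptr) => freed.push(ptr)

// Pretend the wasm function wrote its result at byte offset 64.
new Int32Array(memory.buffer, 64, 3).set([7, 9, 11])

const values = takeInt32Array(memory, fakeFree, 64, 3)
console.log(values, freed) // [ 7, 9, 11 ] [ 64 ]
```

The try/finally means the pointer is released even if building the view throws, at the cost of one extra copy of the data.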

Outro

Well that was a lot of code.

Obviously we can wrap this all up in some library code to hide the complexity,
but what have we gained? I can’t say for sure, but if the array calculations
run at near native speed this could provide enormous performance gains.

I could (and probably should) have used a libc implementation instead of
implementing malloc. I decided not to because I think it kept the focus on
the array passing problem, and not on integrating with libraries, which is a
whole other discussion.

Good luck with your coding.