In plain old JavaScript this is not possible. Let’s see how we can implement it.
The goal
So what is a DataFrame? To answer this we first need to look at a Series.
So What is a Series?
A Series is the building block of a DataFrame. I take a Series to be a named vector (array), which will be initialized as follows.
let s1 = new Series('height', [1.82, 1.76, 1.72, 1.89])
The series should be indexable.
s1[0] === 1.82
The series should support vector arithmetic.
let height = new Series('height', [1.82, 1.76, 1.72, 1.89])
let weight = new Series('weight', [81.3, 73.2, 68.9, 92.1])
let score = height / weight
So What is a DataFrame?
I take a DataFrame to be a collection of Series.
let df = new DataFrame([
  height,
  weight
])
The problems to solve
There are two problems to solve in order to implement this in JavaScript:
Operator overloading
Property accessing
Operator Overloading
We want to support vector arithmetic (for example, multiplying or dividing two arrays element by element). Unfortunately, JavaScript does not support operator overloading, so we will have to pre-process the code. We can do this with the Babel transpiler and a plugin. I’m using the @jetblack/operator-overloading plugin, which is a bag-of-bolts, but I wrote it so I know how it works!
Property accessing
In order for a series to have both a name and be indexable we need control over the property accessing. We can do that with a Proxy object. The Proxy object provides a layer of indirection between requests on the object, and the actions performed.
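To make the indirection concrete, here is a minimal sketch of a Proxy that serves a name from the wrapped object and everything else from an inner array. The `target` and `handler` names and the `values` property are illustrative assumptions, not part of the final Series class.

```javascript
// A plain object wrapping a name and an array of values (illustrative only).
const target = { name: 'height', values: [1.82, 1.76, 1.72, 1.89] }

const handler = {
  get (obj, prop) {
    // Properties the object itself provides (such as "name") are returned directly.
    if (prop in obj) {
      return obj[prop]
    }
    // Anything else (indices, "length", array methods) is looked up on the array.
    const value = obj.values[prop]
    return typeof value === 'function' ? value.bind(obj.values) : value
  }
}

const s = new Proxy(target, handler)

console.log(s.name) // height
console.log(s[0]) // 1.82
console.log(s.length) // 4
```

The caller only ever sees `s`, so the same object appears to have both a name and array-like indexing.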
Setting up your environment
Let's write some code!
First install the node modules. I’m using babel, the operator overloading plugin, and standardjs as a linter and formatter.
# Initialise the package.json
npm init -y
# Install the babel tool chain.
npm install --save-dev @babel/core@7.10.1 @babel/preset-env@7.10.2 @babel/cli@7.10.1 @babel/node@7.10.1
# Install the operator overloading plugin.
npm install --save-dev git+https://github.com/rob-blackbourn/jetblack-operator-overloading.git#0.1.0
# Install standardjs for linting and formatting
npm install --save-dev babel-eslint@10.1.0 standard@14.3.4
We configure standardjs by editing the package.json and adding the following.
{
  ...
  "standard": {
    "parser": "babel-eslint"
  }
}
If you are using vscode, create the .vscode/settings.json file as follows, then restart vscode to start the standardjs server.
Configure babel by creating the .babelrc file with the usual preset and the operator overloading plugin. The operator overloading plugin requires arrow functions. Targeting node (or any modern browser) achieves this.
The constructor returns a Proxy object, which intercepts calls to the Series. It first checks if the property or function is provided by the Series itself. If not it delegates the action to the array. Note how the Proxy is returned from the constructor; this is a poorly documented feature.
The operators are provided by the [Symbol.for('+')] methods.
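The two ideas above can be sketched together as follows. This is a simplified illustration rather than the full Series class: the `array` property, the `set` trap, and the name given to the result of a division are assumptions made for the example. The operator method is called explicitly here, so the sketch runs without the babel plugin.

```javascript
class Series {
  constructor (name, array) {
    this.name = name
    this.array = array

    // Returning an object from a constructor replaces the instance that
    // "new" would otherwise produce, so callers receive the Proxy.
    return new Proxy(this, {
      get: (target, prop) => {
        // Prefer properties and methods provided by the Series itself.
        if (prop in target) {
          return Reflect.get(target, prop)
        }
        // Otherwise delegate to the underlying array.
        const value = target.array[prop]
        return typeof value === 'function' ? value.bind(target.array) : value
      },
      set: (target, prop, value) => {
        if (prop in target) {
          target[prop] = value
        } else {
          target.array[prop] = value
        }
        return true
      }
    })
  }

  // Element-wise division. The result name is a hypothetical choice.
  [Symbol.for('/')] (other) {
    return new Series(
      `${this.name}/${other.name}`,
      this.array.map((value, i) => value / other.array[i])
    )
  }
}

const height = new Series('height', [1.82, 1.76, 1.72, 1.89])
const weight = new Series('weight', [81.3, 73.2, 68.9, 92.1])

// Calling the operator method by hand; with the plugin enabled the intent
// is that "height / weight" ends up invoking a method like this one.
const score = height[Symbol.for('/')](weight)

console.log(height[0]) // 1.82
console.log(score.name) // height/weight
console.log(score.length) // 4
```

Because the method is keyed by a registered symbol, it cannot collide with a numeric index or an array method name that the Proxy forwards to the array.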
DataFrame Code
In the src directory create a file called DataFrame.js with the following content.
  // Build a DataFrame from a list of row objects, one Series per key.
  static fromObject (data) {
    const series = {}
    for (let i = 0; i < data.length; i++) {
      for (const column in data[i]) {
        if (!(column in series)) {
          series[column] = new Series(column, new Array(data.length))
        }
        series[column][i] = data[i][column]
      }
    }
    const seriesList = Object.values(series)
    return new DataFrame(seriesList)
  }

  // Render the column names followed by one comma-separated line per row.
  toString () {
    const columns = Object.getOwnPropertyNames(this.series)
    let s = columns.join(', ') + '\n'
    const maxLength = Object.values(this.series)
      .map(x => x.length)
      .reduce((accumulator, currentValue) => Math.max(accumulator, currentValue), 0)
    for (let i = 0; i < maxLength; i++) {
      const row = []
      for (const column of columns) {
        // Pad short columns with null so every row has the same width.
        if (i < this.series[column].length) {
          row.push(this.series[column][i])
        } else {
          row.push(null)
        }
      }
      s += row.join(', ') + '\n'
    }
    return s
  }
}
This would be pretty short without the toString!
As with the Series class we use a Proxy object to control property accessing.
I decided to keep the constructor clean; it just takes an array of Series. However, in the real world we want a variety of constructors. The convenience class method DataFrame.fromObject provides a way of building the series from a list of objects.
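The pivot that DataFrame.fromObject performs can be seen in isolation with plain arrays. The `columnsFromRecords` helper below is hypothetical and exists only for this illustration; it mirrors the loop in fromObject but skips the Series and DataFrame wrappers.

```javascript
// Hypothetical helper: pivots a list of row objects into named columns.
function columnsFromRecords (data) {
  const columns = {}
  for (let i = 0; i < data.length; i++) {
    for (const column in data[i]) {
      if (!(column in columns)) {
        // Pre-size the column so rows missing this key are left unset.
        columns[column] = new Array(data.length)
      }
      columns[column][i] = data[i][column]
    }
  }
  return columns
}

const columns = columnsFromRecords([
  { height: 1.82, weight: 81.3 },
  { height: 1.76, weight: 73.2 },
  { height: 1.72 }
])

console.log(Object.keys(columns)) // [ 'height', 'weight' ]
console.log(columns.height[2]) // 1.72
console.log(columns.weight[2]) // undefined
```

Every column gets the same length as the input list, so a key missing from one row leaves a hole rather than shifting later values.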