Current issue

Vol.26 No.4

Articles in press
Full index

Volumes

26
25
24
23
22
21
20
19
18
17
16
15
14
13
12
11
10
9
8
7
6
5
4
3
2
1

© 1984-2024
British APL Association
All rights reserved.

Archive articles posted online on request: ask the archivist.

archive/26/1

Volume 26, No.1

Compiling APL to JavaScript

Nick Nickolov (nick.nickolov@gmail.com)

This article is about github.com/ngn/apl[1], an APL to JavaScript compiler written in CoffeeScript. It gives an overview of the project and its dialect of APL, then proceeds to describe three major implementational obstacles and the design decisions taken to overcome them.

Introduction

The list of languages that compile to JavaScript is growing steadily for a good reason: almost every modern consumer device can run a browser with a JavaScript interpreter in it. And for the server there’s NodeJS. As a language, JavaScript has its subset of good parts and if you restrict your coding to them, it’s actually a very decent language. The execution model is reminiscent of Lisp with its lambdas and lexical closures, despite the superficially C-like syntax. As a matter of fact, getting rid of those curly braces, semicolons, and other noise is easy thanks to CoffeeScript, which gives a comfortable layer of syntactic sugar over raw JavaScript.

I started an open source project for an APL compiler targeting JavaScript. Though incomplete and of poor performance, it’s good enough for experimenting with short programs, such as the Game of Life or the N Queens problem.

I made a conscious decision to deviate from the APL tradition in several ways.

Function definition

Lambdas are supported (function literals enclosed in curly braces, a.k.a. dfns) but the del (∇) syntax for function definition is not. So, instead of

    ∇R←A f B
        R←...
    ∇

one is forced to write

    f←{...}

Variable scoping

Scoping is always lexical. This means that the two occurrences of a below

    f←{a←123}
    g←{a←456}
    h←{...}

are different variables and neither of them is accessible from within h, regardless of the order and nesting of f, g, h invocations.

A variable is considered to belong to the outermost ancestor scope where it is mentioned. So, the first two occurrences of a below are the same variable and the third a is distinct from them:

    f←{
        a←123
        g←{a←456}
    }
    h←{a←789}

Note that assignment doesn’t necessarily create a local variable. If the variable is already present in any enclosing scope, assignment will set that variable.

A variable must not be used before it is assigned a value. The compiler can detect most violations of that:

    ⎕←⍳ a      ⍝ compiler error
    a←5

Phrasal forms

Expressions consisting of a sequence of two or three functions are said to form a hook or a fork respectively and the result is a new function defined as follows:

     (fg)⍵    ←→   ⍵fg⍵
    ⍺(fg)⍵    ←→   ⍺fg⍵
     (fgh)⍵   ←→   (f⍵)g(h⍵)
    ⍺(fgh)⍵   ←→   (⍺f⍵)g(⍺h⍵)

Examples usages of phrasal forms are the arithmetic mean written as +/ ÷ ⍴ and continued fraction evaluation as (+÷)\.

Non-privileged primitives

Primitives are just like any other variable, only as if implicitly defined in the root scope. As a side effect, they can be overridden:

    3⍟4          ⍝ returns 1.2618595071429148
    ⍟←{⍺+2×⍵}
    3⍟4          ⍝ returns 11

One use of this fact might be to introduce comparison tolerance, which isn’t supported by default:

    ⎕CT←1e¯13
    =←{(|⍺-⍵)≤⎕CT×(|⍺)⌈|⍵}

It’s okay to use single non-alphabetic characters as variable names, so the language can be augmented as one sees fit:

    π←○1
    √←{⍵*.5}
    ∑←+/
    ½←{⍵÷2}
    ±←+,-
    €←{exchangeRate×⍵}
    ≈←{(|⍺-⍵)≤⎕CT×(|⍺)⌈|⍵}

Index origin fixed to 0

I took a side in this Holy War.

    ⍳3      ⍝ returns 0 1 2
    ⎕IO←0   ⍝ ok, no effect
    ⎕IO←1   ⍝ error

Line terminator ambiguity

A line terminator is treated as a statement separator if it occurs within {} or at the topmost level, but is considered whitespace if it’s inside () or [].

Example:

Absence of control structures

Control structures such as :If ... :Else ... :EndIf are not supported but can be emulated using lambdas, guards, and APL primitives. For instance

    :If x
        y
    :Else
        z
    :EndIf

can be replaced with

    {x:y⋄z}0

and

    :While x
        y
    :EndWhile

with

    {~x:1⋄y⋄0}⍣⊢ 0

The rest of this article describes three challenging problems which I came across during the implementation of APL. Two of them I solved and the third one is yet to be taken care of.

Parsing APL expressions

Consider the APL expression

    a b c

Depending on context, it could mean different things:

    a←0  ⋄ b←{} ⋄ c←0  ⋄ a b c    ⍝ a dyadic application of b
    a←{} ⋄ b←{} ⋄ c←0  ⋄ a b c    ⍝ two nested monadic applications
    a←0  ⋄ b←0  ⋄ c←{} ⋄ a b c    ⍝ compiler error
    a←0  ⋄ b←0  ⋄ c←0  ⋄ a b c    ⍝ construction of a new vector
    a←{} ⋄ b←{} ⋄ c←{} ⋄ a b c    ⍝ a fork

So, it seems the semantics depend on what a, b, and c represent at runtime, verbs or nouns, and building a complete AST is impossible before we have that knowledge.

Fortunately, there exists a way around this at the cost of imposing a small restriction on variable usage. The restriction is as follows:

Once we assign a noun to a variable, we are only allowed to assign nouns to it later.
Once we assign a verb to a variable, we are only allowed to assign verbs to it later.

In other words, variables must preserve their lexical category.

Example:

    x ← 0
    x ← 1     ⍝ OK
    x ← {}    ⍝ Compiler error

    f ← {}
    f ← +/    ⍝ OK
    f ← 2 3   ⍝ Compiler error

This restriction separates the worlds of nouns and verbs at compile time. Every expression can then be inferred one of the two lexical categories so the compiler know what JavaScript code to render for it.

Representation of n-dimensional arrays

During the evolution of the project I went through a couple of different implementations of the basic APL data structure.

Nested representation

My first take on this was to use n nested levels of JavaScript arrays to represent an n-dimensional APL array. For instance:

    APL: 2 3⍴⍳6
    JS: [[0, 1, 2], [3, 4, 5]]

Thus, subscripting maps nicely between the two languages:

    APL: a[i;j;k]
    JS: a[i][j][k]

but many of the primitives’ implementations had to involve recursion or multiple loops, which apart from being inefficient was complicating matters unnecessarily, so I transitioned to a flat representation.

Flat representation

A flat representation is a single JavaScript array which contains all items of the APL array in lexicographic order of their indices (a.k.a. ravel order). Additional meta-information about the shape must be stored, too.

    APL: 2 3⍴⍳6
    JS:
        var a = [0, 1, 2, 3, 4, 5];
        a.shape = [2, 3];

This way, indexing is not as plain as before:

    APL: a[i;j;k]
    JS: a[
            i * a.shape[0] * a.shape[1] +
            j * a.shape[0] +
            k
        ]

but it’s still worth it, as the implementation of most primitives gets simpler and more efficient.

There turns out to be an even better data structure which I was lucky to come across.

Strided representation

In a strided representation, an APL array is a record of:

data: the elements, not necessarily in ravel order

shape: as usual, an integer array

stride: an integer for each axis, indicating how many items of data we have to skip over in order to move to an adjacent cell along that axis. Strides can be positive, negative, or zero.

offset: the position in data where element [0;0;...;0] is located

Example:

    APL: 2 3⍴⍳6
    JS:
        {
            data:   [0, 1, 2, 3, 4, 5],
            shape:  [2, 3],
            stride: [3, 1],
            offset: 0
        }

and this is another representation of the same APL array

    JS:
        {
            data:   [2, 1, 0, 5, 4, 3],
            shape:  [2, 3],
            stride: [3, -1],
            offset: 2
        }

Indexing is relatively straightforward:

    APL: a[i;j;k]
    JS:
        data[
            offset
            + i * stride[0]
            + j * stride[1]
            + k * stride[2]
        ]

The advantage of this strided representation becomes apparent when working with large volumes of data. Functions like transpose (⍉⍵), reverse (⌽⍵), or drop (⍺↓⍵) can reuse the data array and only care to give a new shape, stride, and offset to their result. A reshaped scalar, e.g. 1000000⍴0, can only occupy a constant amount of memory, exploiting the fact that strides can be 0.

Continuation-passing style

The third problem, one which I haven’t solved in code yet, arose from the way NodeJS does I/O. While in some cases NodeJS allows you to do

    var content = fs.readFileSync('a.txt');

it strongly encourages the so-called continuation-passing style (CPS), demanding a callback:

    fs.readFile('a.txt', function (content) {
        // ... Invoked later, after content becomes available
    });

It’s not only that. In a browser environment, if we want to be properly mimicking I/O, we again need CPS. Of course, we can use the blocking functions prompt() and alert(), but most users find those annoying because they prevent them from using other parts of the UI while entering text.

This is what happens when reading from ⎕, for instance:

APL code tries to get the value of variable ⎕.
The user is prompted to type something in an <input/>.
APL execution is suspended while the user is typing, looking up help information, using the on-screen keyboard, etc.
The user presses enter.
APL execution is resumed and the value obtained from ⎕ is processed.

The trouble is in step 3. How do we suspend execution to resume it later? We might as well try to make the compiler output CPS JavaScript code, but that appears get complicated rather quickly.

If only JavaScript had a call-with-current-continuation primitive like Scheme, we would be able to do just this:

    var content = callcc(fs.readFile.bind(fs, 'a.txt'));

but alas, it doesn’t. We can amend that through implementing a virtual machine (VM) in JavaScript with a callcc instruction that suspends execution, captures the current state with all its stack and variable bindings and presents it as a callable resume function. The compiler can then produce some intermediate bytecode for the VM instead of pure JavaScript.

What if we also expose callcc as an APL primitive? This would be so powerful that it could be used to implement an exception mechanism, break/continue, coroutines, generators, Prolog-style backtracking, etc in APL itself. Therefore, adding a VM looks like a lucrative prospect for ngn/apl.

ngnAPL https://github.com/ngn/apl