A First Look at the Causeway APL to C# Translator
Earlier this summer, I attended the first Causeway APL to C# translator seminar, where I learned a bit about the product and how to use it. As the product is at a rather early stage of its life, it would be futile, if not inappropriate, to document aspects of its usage. Rather, I've opted for a narrative description of its initial capabilities, as these will most certainly be extended in the near future.
The notion of compiling APL has been around almost as long as APL itself. In the mainframe days, APL was regarded by many, typically those not programming in it, as an excessive consumer of otherwise limited and extremely expensive computer resources. The dream at the time was to compile APL simply to achieve execution speeds comparable to those of Fortran. What that really meant, no one was sure, as it was possible to write both slow Fortran and fast APL. Nonetheless, the dream stuck. Over the years, starting in the early 1980s, mainframes became cheaper, and the need for severe reductions in computing costs became less apparent. The emphasis shifted to programmer productivity.
At the same time, the APL language itself was going through changes. APL2 became the evolving norm, and with its nested arrays, the topic of APL compilation had become less frequently visited. It was largely felt that the addition of nested arrays rendered the APL language truly uncompilable.
Nonetheless, some APL compilers, or more correctly translators, appeared, usually in the form of research projects. The APLc translator, by Professor Timothy Budd, successfully translated a subset of the APL language to C code (C++ was not widely available at the time). Although far from a commercial product, the APLc implementation was a clearly positive feasibility study for compiling APL.
Code generated by APLc was standalone, as it required no existing APL environment: the compiler itself was written in C, and the C code emitted by the product would be compiled to an executable form in the normal manner.
Another successful compiler, this one commercially available in the mid 1980s, was the one offered by STSC as part of the vendor's IBM VSAPL extensions. Here the compiler favoured scalar-oriented and iterative APL code. Through a batch process, it returned functions in the form of compiled modules. Only a subset of APL primitives was supported; calls to unimplemented functions were resolved by a call to the mainframe interpreter.
The STSC compiler depended on the APL environment: not only was the compiler itself written in APL, but the compiled functions were materialised as a sort of magic function in the workspace.
Although the IBM personal computers were available from the early 1980s, it was not until the advent of the 386-class machines, which came out around 1988, that it became possible to comfortably migrate a small-to-medium sized mainframe APL application to the PC environment. The speed issue was largely forgotten, as PC applications tended to run faster than their mainframe counterparts, and Moore's law was very much in session: processor speed tended to double every 1½ years. It had become possible to buy your way out of performance problems by upgrading systems or adding memory.
Enter Sun and Microsoft
By the late 1990s, with the various flavours of Windows, from Windows 3.1 to Windows NT, Microsoft development platforms had become immensely popular. Notably, Visual Basic skyrocketed to popularity, with its comfortable development environment, easy language, and emphasis on the visual aspect of Windows programming. Similarly, Sun Microsystems' Java language, developed as a general purpose Internet-oriented language, gained popularity in the non-Microsoft world. Internally, the Java language had been designed with the goal of being able to ship the compiled Java code, a form of P-code, to a Java language engine located in each machine. VBA was a hybrid language implementation in which true executable code would "trampoline" to parts of the VBA language engine as needed. In short, the Microsoft .Net platform evolved from the need to have a managed environment (MSIL, Microsoft's internal .Net P-code) similar in concept to Java.
In reality, both the Java and .Net run-time engines are interpreters, or virtual machines, as they're known in non-APL circles. Performance is typically less than that of a true compiled language such as C++. However, the language engine encapsulates many of the details of the underlying hardware, and thus may be able to apply optimizations to the P-code on the fly. More importantly, as hardware changes, a language engine based approach can insulate the user from compatibility issues, thus guaranteeing that code written a few years earlier runs on entirely different hardware.
.Net languages and Object Orientation
At this time, numerous .Net languages exist from Microsoft, the best known ones being VB.Net, J#, and C#. J# is Microsoft's implementation of Java; VB.Net is the successor to VB, as VB (and VBA, Visual Basic for Applications, found in the automation-enabled Microsoft products such as Word and Excel) is being "deprecated" by Microsoft. Both VB.Net and C# are true object-oriented languages. VB.Net has been characterized as simply C# without the semicolons; whether or not this is true, we will concentrate on C#.
As its name implies, C# resembles C and C++. Certainly C# is a simplification of C++, as many of the mechanisms central to program development, such as memory allocation and other memory administration tasks, are silently handled by the C# language or the .Net environment. Further, like C++, C# supports full object orientation. This combination of features makes C# a very APL-friendly target language for a compiler.
To recap, Object Orientation is about the creation and management of specialised and application-specific data types (classes) and functions to deal with these new data types (methods). Object oriented languages normally contain the following features. Some of them are Big Words:
- Inheritance
- Polymorphism
- Data Abstraction
- Extensibility
- Overloading
Inheritance: Suppose you have a number of related classes which describe similar data types which differ in slight but important details. One class can inherit some or most of its functions from its parent, thus avoiding code duplication and work.
Polymorphism: Suppose you have a number of classes which should return some common entity, such as the size of the object or some kind of count. Yet the actual code to do this differs wildly from one object to another. By keeping the name of the function the same, one achieves polymorphism, i.e. a function of the same name, doing the same sort of thing, exists in a number of places.
Data Abstraction: Suppose you want to implement complex numbers in your program. In reality, one complex number can be built using two floating point numbers, one for the real part, the other for the imaginary. Similarly, quaternions and octonions can be represented with vectors of floating point numbers. But from the perspective of the programmer actually using these data types, they look like one item.
Extensibility: With these features, it is possible to add extensions to at least the data types of the language.
Overloading: As the compiler keeps track of functions and their arguments, it is possible not only to use the same name in different contexts, but also to tell the compiler to allow the use of more familiar symbols as a substitute for functions. For example, the + symbol may be overloaded with the functionality of the plus() function.
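The terms above can be made concrete with a small C# sketch. All of the names here (Complex, Shape, Circle, Square) are invented for illustration and have nothing to do with the translator itself.

```csharp
using System;

// Data abstraction: two doubles presented to the caller as one value.
public struct Complex
{
    public readonly double Re;
    public readonly double Im;

    public Complex(double re, double im) { Re = re; Im = im; }

    // Overloading: the + symbol stands in for a plus() function.
    public static Complex operator +(Complex a, Complex b)
    {
        return new Complex(a.Re + b.Re, a.Im + b.Im);
    }
}

// Inheritance: Circle and Square both get Describe() from their parent.
public abstract class Shape
{
    // Polymorphism: one name, different code in each derived class.
    public abstract double Area();

    public string Describe()
    {
        return GetType().Name + " with area " + Area();
    }
}

public class Circle : Shape
{
    private readonly double radius;
    public Circle(double radius) { this.radius = radius; }
    public override double Area() { return Math.PI * radius * radius; }
}

public class Square : Shape
{
    private readonly double side;
    public Square(double side) { this.side = side; }
    public override double Area() { return side * side; }
}

public static class OoDemo
{
    public static void Main()
    {
        Complex sum = new Complex(1.0, 2.0) + new Complex(3.0, -1.0);
        Console.WriteLine(sum.Re + " " + sum.Im);

        Shape[] shapes = { new Circle(1.0), new Square(2.0) };
        foreach (Shape s in shapes)
        {
            Console.WriteLine(s.Describe());
        }
    }
}
```

Extensibility is the sum of the parts: Complex behaves like a new numeric type, and a further Shape can be added without touching the code that calls Describe().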
The APL to C# Translator
Why would anyone want to compile APL?
- Speed, speed, and speed
- One-way migration to C#
- C# as a delivery platform
Speed: The performance issue is no longer what it used to be in the days of timesharing, when a few extra functions added up to a huge bill. Nor is APL thought of as unreasonably expensive to run. Many applications are delivered in interpreted or semi-interpreted languages such as Visual Basic or Java, and complaints that something is much too slow are comparatively rare. Thus speed for speed's sake should not be the sole reason to consider this approach. Be that as it may, compilation to C#, which implicitly includes the healing effects of a cleanup, may result in some speedup, and it would be safe to say that APL functions compiled to C# would be faster overall. Code optimisation is a very fertile area in which to expect considerable improvements over time.
One-way migration: The output of the APL to C# translator is designed to be as readable as possible, with user maintainability of the final output an important consideration. APL comments are propagated to the compiled code, the number of classes used in the code is kept to a minimum, no code movement takes place, control structures are nearly identical in APL and C#, and the end result is code which looks at worst very familiar. Array and vector operations are handled by an array engine, and its use in application code is not only easy to understand, but documented. Thus it should be possible to turn compiled code over to a C# professional and let her take over.
C# delivery: A similar approach would be to continue development of a set of programs in APL, while delivering the end result in C#. As different parts of a product can be written in different .Net languages and linked together, this type of mixed language paradigm may be very efficient in certain circumstances or organisations. For example, GUI work may be handled by a different individual while the central computations of an application are done by the original APL developer. Clearly, the APL coding style may require discipline, but the code itself remains in APL. It is written in APL, tested and debugged in APL, then converted to C#. This module, combined with many others written in APL, C#, or VB.Net, contributes to the final application.
C# delivery is a convenient and compact alternative to application delivery using the run-time version of APL. Most importantly, the end result is comparatively small, which may make online or internet delivery of software that much quicker or more bearable. Interestingly, executable C# code in its initial form is relatively easy to reverse engineer, in that an operating likeness of the original program can be readily reconstructed. Third-party products exist which obfuscate the executable code, rendering attempts at reverse engineering ineffective.
Although APL may be the preferred tool for thought, C# can be viewed as a tool for deployment. APL and C# have some irreconcilable differences which require some care on the part of the APL programmer. The APL to C# translator does what it can to smooth out the differences between the two, but the APL developer is ultimately responsible for the quality of the compiled code.
Conceptually, the design of the translator is simply to take a line of APL code and convert it to C#. Line by line, the translator plods through the functions and translates them verbatim. For an ordinary addition, the translator inserts either A + B or AE.Plus(A, B), depending on whether arrays are involved. Code consisting of operations on scalars frequently compiles to standard C#. The actual parsing of the APL syntax is handled by a recursive descent parser using a subset of the J parse table, described in full detail in Vector 9.4 (page 92).
In order to support the multitude of APL primitive functions and operators, the translator draws upon an array engine to perform the operations. In essence, the array engine is a subroutine library with operations corresponding to the APL primitives. However, a complication arises, as variations of APL type, rank, and shape each map to a different C# class.
At this time, the C# translator supports 15 different classes, which are every combination of Boolean, character, string, integer, and floating point, by rank 1, 2, and 3. Rank 0, of course, is scalar, and is supported by base C# types. It is important to note that every variation is a distinct class: integer vectors are distinct from integer arrays, which are distinct from floating point arrays. Arrays of rank greater than 3 are not supported, on account of the exponential growth of function overloading possibilities. This leads to the following distinct C# classes:
| Scalar | Rank 1 | Rank 2 | Rank 3 |
It is precisely here that the beauty of the approach becomes apparent, where object orientation comes in and does all the difficult work. Given the APL code fragment
A + B
and given that both A and B are variables, the following possibilities exist:
| plus | bool 0 | bool 1 | bool 2 | bool 3 | int 0 | int 1 | int 2 | int 3 | flt 0 | flt 1 | flt 2 | flt 3 |
(Score: Native C#, 4; Array Engine, 140)
Needless to say, character data is outside the domain of addition and does not enter into the picture. If it did, we would have 16 squared possibilities, minus some of the scalar + scalar cases.
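The score can be checked with a simple count. The sketch below assumes that the four native cases are the scalar pairs drawn from integer and floating point (C# defines no + for two bool values), and that every other combination of the twelve type and rank pairs goes to the array engine. That rule is my reading of the score, not something taken from the translator's documentation.

```csharp
using System;

public static class ScoreDemo
{
    // Returns { native, arrayEngine } counts for the fragment A + B.
    public static int[] Count()
    {
        string[] types = { "bool", "int", "flt" };
        int native = 0;
        int engine = 0;

        foreach (string leftType in types)
        for (int leftRank = 0; leftRank <= 3; leftRank++)
        foreach (string rightType in types)
        for (int rightRank = 0; rightRank <= 3; rightRank++)
        {
            bool bothScalar = leftRank == 0 && rightRank == 0;
            bool bothNumeric = leftType != "bool" && rightType != "bool";

            if (bothScalar && bothNumeric) native++;
            else engine++;
        }
        return new int[] { native, engine };
    }

    public static void Main()
    {
        int[] score = Count();
        Console.WriteLine(score[0] + " " + score[1]);   // prints "4 140"
    }
}
```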
From the translator's perspective, it has to choose between A + B and AE.Plus(A, B).
This is one of the outstanding features of object orientation: the compiler keeps track of all the intermediate types and shapes (all implemented as different classes of objects) and knows exactly which one of the 140 versions of AE.Plus to deploy. C# has its own automatic type conversion rules, which apply only to the scalar data types; otherwise, a version of an Array Engine function for every combination of the scalar and array types must be present. Needless to say, the developers of the Array Engine used a template-based approach to construct the library.
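A hypothetical miniature of such a library might look as follows. Only the name AE.Plus comes from the product; the IntVector and FloatVector classes and the three overloads shown are assumptions made for illustration, not the real Array Engine classes.

```csharp
using System;

// Hypothetical stand-ins for two of the rank 1 classes.
public sealed class IntVector
{
    public readonly int[] Data;
    public IntVector(params int[] data) { Data = data; }
}

public sealed class FloatVector
{
    public readonly double[] Data;
    public FloatVector(params double[] data) { Data = data; }
}

// A miniature array engine: one overload of Plus per type combination.
public static class AE
{
    // int vector + int vector
    public static IntVector Plus(IntVector a, IntVector b)
    {
        if (a.Data.Length != b.Data.Length)
            throw new ArgumentException("LENGTH ERROR");
        int[] r = new int[a.Data.Length];
        for (int i = 0; i < r.Length; i++) r[i] = a.Data[i] + b.Data[i];
        return new IntVector(r);
    }

    // int scalar + int vector (scalar extension)
    public static IntVector Plus(int a, IntVector b)
    {
        int[] r = new int[b.Data.Length];
        for (int i = 0; i < r.Length; i++) r[i] = a + b.Data[i];
        return new IntVector(r);
    }

    // int vector + float vector: the result is floating point
    public static FloatVector Plus(IntVector a, FloatVector b)
    {
        if (a.Data.Length != b.Data.Length)
            throw new ArgumentException("LENGTH ERROR");
        double[] r = new double[a.Data.Length];
        for (int i = 0; i < r.Length; i++) r[i] = a.Data[i] + b.Data[i];
        return new FloatVector(r);
    }
}

public static class PlusDemo
{
    public static void Main()
    {
        IntVector v = new IntVector(1, 2, 3);
        FloatVector f = new FloatVector(0.5, 0.5, 0.5);

        int s = 2 + 3;                   // scalar + scalar: native C#
        IntVector a = AE.Plus(v, v);     // overload chosen at compile time
        IntVector b = AE.Plus(10, v);    // a different overload
        FloatVector c = AE.Plus(v, f);   // and another

        Console.WriteLine(s);
        Console.WriteLine(string.Join(" ", a.Data));
        Console.WriteLine(string.Join(" ", b.Data));
        Console.WriteLine(string.Join(" ", c.Data));
    }
}
```

The call sites all read AE.Plus(x, y); which body runs is settled by the declared types of the arguments, with no run-time type test in the generated code.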
Control structures, introduced to the language in the last 15 years and adopted by at least two of the APL vendors, simplify code translation considerably. Control structures come out largely unchanged in the translation process, and their use contributes to the overall readability of the converted C# code. Standard APL branching is allowed; however, some of the more complicated cases are not yet handled. Best to stick with simple branching.
In the compiling phase, the translator, where possible, translates the line of APL code into a line of C# verbatim. With scalar code, pure C# output is possible. With any APL arrays, regardless of rank, the array engine is invoked. Optimum readability is achieved with relatively short lines of APL code ("long and skinny" functions) and use of control structures. Long lines of APL code begin to resemble Lisp when translated to C#, owing to the large number of parentheses required when using the array engine with a long or complex expression.
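The effect on readability is easy to see in a sketch. The AE class here is a hypothetical stand-in working on plain double[] vectors; the Plus, Minus, and Times signatures are assumptions, and the C# shown is hand-written to suggest the flavour of the output rather than copied from the translator.

```csharp
using System;

// Hypothetical miniature array engine on plain double[] vectors.
public static class AE
{
    private static double[] Each(double[] a, double[] b,
                                 Func<double, double, double> f)
    {
        if (a.Length != b.Length) throw new ArgumentException("LENGTH ERROR");
        double[] r = new double[a.Length];
        for (int i = 0; i < r.Length; i++) r[i] = f(a[i], b[i]);
        return r;
    }

    public static double[] Plus(double[] a, double[] b)
    {
        return Each(a, b, (x, y) => x + y);
    }

    public static double[] Minus(double[] a, double[] b)
    {
        return Each(a, b, (x, y) => x - y);
    }

    public static double[] Times(double[] a, double[] b)
    {
        return Each(a, b, (x, y) => x * y);
    }
}

public static class ReadabilityDemo
{
    // One long APL line:  R←(A+B)×(C-D)×A+B
    public static double[] OneLine(double[] A, double[] B, double[] C, double[] D)
    {
        return AE.Times(AE.Plus(A, B), AE.Times(AE.Minus(C, D), AE.Plus(A, B)));
    }

    // Three short APL lines:  S←A+B   T←C-D   R←S×T×S
    public static double[] ShortLines(double[] A, double[] B, double[] C, double[] D)
    {
        double[] S = AE.Plus(A, B);
        double[] T = AE.Minus(C, D);
        return AE.Times(S, AE.Times(T, S));
    }

    public static void Main()
    {
        double[] A = { 1, 2 }, B = { 3, 4 }, C = { 5, 6 }, D = { 1, 1 };
        Console.WriteLine(string.Join(" ", OneLine(A, B, C, D)));
        Console.WriteLine(string.Join(" ", ShortLines(A, B, C, D)));
    }
}
```

Both methods compute the same result; the second simply gives the intermediate values names, which is what short APL lines do for the translated code.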
Also, in C#, the Boolean data type is not represented with the values 1 and 0, but rather true and false, which do not automatically map into 1 and 0 as one would expect from other languages, notably C++ and Visual Basic.
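A small illustration of the consequence, using the familiar APL idiom of summing a comparison. ToInt and CountGreater are hypothetical helper names; how the array engine itself performs the conversion is not shown here.

```csharp
using System;

public static class BoolDemo
{
    // Explicit mapping from a C# bool to the APL-style 1 or 0.
    public static int ToInt(bool b)
    {
        return b ? 1 : 0;
    }

    // APL idiom +/V>T: count the elements of v greater than threshold.
    public static int CountGreater(int[] v, int threshold)
    {
        int count = 0;
        foreach (int x in v)
        {
            // count += (x > threshold);   // does not compile: bool is not an int
            count += ToInt(x > threshold);
        }
        return count;
    }

    public static void Main()
    {
        Console.WriteLine(CountGreater(new int[] { 3, 8, 1, 9 }, 2));   // prints 3
    }
}
```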
Nested Arrays are partially handled in the form of intermediate results. One recent Dyalog extension, to be able to split positional nested array arguments directly in the function header to their constituent variables, has been implemented in the translator. This feature implements one important language construct as a consequence of nested arrays.
Lastly, the dreaded Execute function is not supported, as it is not appropriate in a compiled environment. ⎕VI and ⎕FI work as expected, though Dyalog's ⎕VFI function would need to be recoded to use ⎕VI and ⎕FI separately. Underscored variables (with the exception of delta-underbar) are not handled, and would need to be changed. User-defined traditional operators (but not in the form of dfns) are not supported.
One cannot reasonably expect to simply dump a bunch of functions into a sort of machine and have working C# code at the end. The APL code itself invariably will require some preparation and cleanup. C# is a strongly typed language, meaning that all variables in any program need to be declared to be of a given type. APL, of course, is anything but strongly typed, as it allows you to reuse variables, assigning them different types and/or shapes, not to mention functions and operators. This cannot happen in C# programs: they will not compile. Better programming practice would suggest a certain discipline in the usage of variables, such that variables never deviate from the types with which they were originally declared.
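The difference shows up as soon as a name is reused. The sketch below is ordinary hand-written C#, with invented variable names, showing what the C# compiler accepts and rejects.

```csharp
using System;

public static class TypingDemo
{
    // APL happily allows:   X←42   X←'hello'   X←1.5 2.5
    // In C# each of these values needs its own declared variable.
    public static string Describe()
    {
        int count = 42;
        // count = "hello";            // compile-time error: a string is not an int
        string greeting = "hello";
        double[] samples = { 1.5, 2.5 };

        return count + " " + greeting + " " + samples.Length;
    }

    public static void Main()
    {
        Console.WriteLine(Describe());   // prints "42 hello 2"
    }
}
```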
Another important design goal of the APL to C# translator is to structure language changes in such a way that the code continues to work in both the APL and C# language environments. All translator directives and code declarations are supplied in the form of APL comments. As mentioned, the Dyalog APL implementation of ⎕VFI, a fusion of ⎕VI and ⎕FI which returns a nested result, is inappropriate. Instead, ⎕VI and ⎕FI-like functions are supplied, which function correctly in APL and compile correctly to C#. Other changes are more subtle. The monadic plus function, defined in APL simply to return its argument, has been changed to promote its type. When executed in ordinary APL, the monadic plus has no effect on numeric arguments, and is thus harmless.
Variable declarations give the APL to C# translator a hint of what to do. Often, only a function's arguments and result need to be declared. By default, the translator treats arguments as integer scalars. With some data flow analysis, it is possible for the translator to deduce the type, shape, and rank of every variable at every point in a line of code. Thus for well-behaved programs, it may be sufficient to simply declare the arguments. Additional declarations may be supplied simply to force the result of a computation to be of an expected type.
Places to look out for, where type conversion may be happening, would be uses of the APL floor and ceiling functions. As implemented in ordinary APL, these functions do the double duty of rounding a number to the next lower or higher integer and, where possible, returning an integer result from a floating point argument. As implemented by the translator, floor and ceiling do work as in traditional APL. Conversely, there may be situations where a floating point argument is required, but only an integer argument is supplied. The implementation of the monadic plus is to promote the argument from integer to floating point, or from Boolean to integer. Again, C#'s implementation of the Boolean type is such that Boolean values are not interchangeable with 0 and 1. Scalar arguments are automatically promoted, at least from integer to floating point, by C#.
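For comparison, this is how the same conversions look in plain C#, outside the translator. FloorToInt, Half, and Promote are hypothetical helper names; Promote only imitates the idea behind the translator's monadic plus and is not its implementation.

```csharp
using System;

public static class PromotionDemo
{
    // Math.Floor returns a double, so an integer result needs a cast.
    public static int FloorToInt(double x)
    {
        return (int)Math.Floor(x);
    }

    // Scalars: an int argument is promoted to double automatically.
    public static double Half(double x)
    {
        return x / 2.0;
    }

    // Arrays are not promoted automatically; each element is converted.
    public static double[] Promote(int[] v)
    {
        double[] r = new double[v.Length];
        for (int i = 0; i < v.Length; i++) r[i] = v[i];
        return r;
    }

    public static void Main()
    {
        double x = -2.5;
        Console.WriteLine(FloorToInt(x));   // prints -3, as APL floor would
        Console.WriteLine((int)x);          // prints -2: a bare cast truncates

        double h = Half(7);                 // the int 7 is accepted as a double
        Console.WriteLine(h == 3.5);        // prints True

        // double[] d = new int[] { 1, 2, 3 };   // does not compile
        double[] d = Promote(new int[] { 1, 2, 3 });
        Console.WriteLine(d.Length);        // prints 3
    }
}
```

The bare cast is worth noting: for negative numbers it rounds toward zero, so it is not a substitute for floor.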
Another area where non-obvious differences between APL and C# exist is the implementation of characters vs. strings. Arrays of characters are no different than in APL. Strings, however, are immutable. That is, once created, they cannot be changed. C# favours strings for casual use of character data. As far as the translator is concerned, the use of strings vs. char is transparent to the user.
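In hand-written C# the distinction looks like this. ReplaceFirst is a hypothetical helper, shown once for each representation; it stands for an indexed assignment such as NAME[1]←'X' in APL with index origin 1.

```csharp
using System;

public static class StringDemo
{
    // Character array: the element is changed in place.
    public static char[] ReplaceFirst(char[] name, char c)
    {
        name[0] = c;
        return name;
    }

    // String: immutable, so a new string has to be built.
    public static string ReplaceFirst(string name, char c)
    {
        // name[0] = c;                 // does not compile: the indexer is read-only
        return c + name.Substring(1);
    }

    public static void Main()
    {
        char[] chars = "APL".ToCharArray();
        ReplaceFirst(chars, 'X');
        Console.WriteLine(new string(chars));   // prints "XPL"

        string s = "APL";
        string t = ReplaceFirst(s, 'X');
        Console.WriteLine(s + " " + t);         // prints "APL XPL"
    }
}
```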
Other translator directives, all in the form of comments, indicate to the compiler whether to produce Public (visible outside of the class) or Private (hidden inside the class) functions, allow a comment (often the first comment of a function) to appear in documentation, and indicate special treatment of multiple arguments.
Using the Visual Studio program development facility allows, among other things, summary documentation to be viewed at the time code is being entered or edited. Potentially, this drastically improves the quality of the documentation available to the programmer at development time. If several APL functions appeared as overloads, meaning different versions of the function which did the same thing but for different data types, Visual Studio would display the different possibilities.
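In C#, summary documentation of this kind is normally carried by XML documentation comments. The sketch below assumes the translator emits comments in that standard form, which I have not verified; the Stats class and its Mean overloads are invented for illustration.

```csharp
using System;

public static class Stats
{
    /// <summary>Arithmetic mean of an integer vector.</summary>
    /// <param name="values">A non-empty vector of integers.</param>
    /// <returns>The mean as a floating point number.</returns>
    public static double Mean(int[] values)
    {
        double total = 0;
        foreach (int v in values) total += v;
        return total / values.Length;
    }

    /// <summary>Arithmetic mean of a floating point vector.</summary>
    /// <param name="values">A non-empty vector of floating point numbers.</param>
    /// <returns>The mean as a floating point number.</returns>
    public static double Mean(double[] values)
    {
        double total = 0;
        foreach (double v in values) total += v;
        return total / values.Length;
    }

    public static void Main()
    {
        Console.WriteLine(Mean(new int[] { 1, 2, 3, 6 }) == 3.0);     // prints True
        Console.WriteLine(Mean(new double[] { 0.5, 1.5 }) == 1.0);    // prints True
    }
}
```

On typing Stats.Mean( in Visual Studio, both overloads are offered, each with its own summary text.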
At this time, I have used the APL to C# translator only on little things, meaning groups of 10 or fewer functions, and am truly impressed with the usability of the product. I've found little problems, which were promptly fixed. The approach of coding and testing in APL, then delivering in C#, is for me ideal. In any event, using this product has had the effect of breaking some very bad habits, such as variable name reuse, using variable names which are too short, allowing functions to become too long, and overloading utilities. I have a number of "big" things on my plate and they all require a substantial amount of cleanup.
The cleanup effort can be daunting. My only suggestion is to start early, cleaning up as you can. Get rid of those executes, split up those large functions, get rid of branching and replace it with control structures.
Cleanup is quite a lot less work than Rewrite, which is quite a lot less work than Re-engineer. I had recycled a substantial amount of code before, but never had I thought that the code or ideas would make their way out of APL and into a mainstream computer language.
The computer software industry is very volatile. Things come and go: today certain software may be the hottest in software fashion and competitive advantage, only to be forgotten tomorrow. At least a couple of things are here to stay, at least for a while: object orientation is one of them, APL is another.
Object orientation has been around in one form or another since 1970, catching on in the mainstream market in the early 1990s. APL has been available since the mid 1960s, on hardware ranging from prehistoric mainframes to pocket PCs. It is also interesting to note that in the mainstream computer language world, only two truly new ideas in computer languages have appeared since the introduction of Fortran in 1956, namely the notion of control structures and object orientation. These two features were also key in making the APL translator, and its readable output code, a possibility.
It is not obvious today whether .Net will have the staying power of some of the earlier technologies. One would reasonably expect that it will. However, APL to C# translation is a very portable concept which even today can be adapted to a variety of similar technologies. For example, Java would be a language which could be easily accommodated; doing this would substantially extend the reach of the product and, more importantly, extend that reach outside of the Microsoft world. C++ is another potential target language with even greater far-reaching potential. .Net may fade into the software sunset, but whatever takes its place could have support for APL. Time marches on, so does computer science, and most certainly tomorrow's platforms can be accommodated.
Modest speedups should be expected: with a combination of efficient algorithms, high-powered optimisations built into the .Net engine, and APL language activities such as type and rank checking now done at compile time, the overall performance of an application should be better. However, to expect "miraculous" speedups of a factor of 100 or more is not realistic. Isolated examples of stunning increases in speed for selected code fragments do exist, but largely, code performs the same or slightly better.
In closing, the APL to C# translator supplies an excellent migration path to a new world. The productivity of using APL, the tool for thought, and the ability to test an idea or program in an interpreted environment all remain intact. The resulting code can be shipped over to the .Net world, where it can now interact with everything else, in a mainstream environment.