# A Conversation with Manfred von Thun

Stevan Apter • sa@nsl.com

**SA**: *May I begin by asking you about the
origins of Joy? You come to programming language design by way of
philosophy. How exactly did that come about? Or are these
activities unrelated? *

**MvT**: The relation is mainly historical, and
certainly not logical. My first fifteen year period in academia,
first as a student and later as staff, was a comedy(?) of errors.
I enrolled in Psychology and Philosophy because "I wanted to
know how the human mind works". But rats and stats in
Psychology and logic and scientific method in Philosophy quickly
changed my directions. In particular, I became interested in the
justification of induction, and I wrote my PhD in the field of
inductive logic and logical probability. After my appointment
here at La Trobe I had to catch up with deductive logic and have
been teaching it at various levels. I have also taught courses in
philosophy of science, philosophy of psychology, cybernetics, and
of course various bits in first year philosophy. In 1978 two of
us logicians started using the university's new DEC10 computer.
My colleague soon settled on Snobol, and I on Pascal. We had no
help at all, and at that time for me there was just the Pascal
User Manual and Report and Wirth's *Algorithms +
Datastructures = Programs*. I was particularly fascinated by
the miniature compiler for the PL0 language. Most of my Symbolic
Programming in Pascal, which I wrote over quite a number of
years, was influenced by Wirth. I have given courses at all
levels for the Computer Science Department.

**SA**: *All your programming in Pascal was
procedural. So how did you get into functional programming? How
did Joy evolve? *

**MvT**: In the early 1980's I came across the
famous Backus paper "Can programming be liberated from the
von Neumann style," and I was immediately intrigued by the
higher level of programming in his FP. From my deductive logic I
also knew about Quine's predicate functors and about Tarski's
cylindric algebras. Like Schoenfinkel's and Curry's combinatory
logic, Backus elimiminates not only assignable variables but also
formal parameters, and Quine eliminates variables of
quantification. I designed a strange little logic programming
language based on some of these ideas, and implemented it in
Prolog. It had some rather useless features - among others a
"Wheatstone bridge" combinator which takes five binary
relations to produce a new binary relation.

Joy then evolved from this in an entirely haphazard way: First I restricted the binary relations to unary functions, and this of of course was a dramatic change. Second, to allow the usual arithmetic operations with their two arguments, I needed a place from which the arguments were to come and where the result was to be put - and the obvious place was a stack with a few shuffling combinators, originally the four inspired by Quine. Third, it became obvious that all these combinators could be replaced by unary functions, with only function composition remaining. Finally the very different distinctively Joy combinators emerged, which take one or more quoted programs from the stack and execute them in a specific way. Along the way of course, lists had already been seen as just special cases of quoted programs. This meant that programs could be constructed using list operations and then passed on to a Joy combinator.

**SA**: *So Joy emerged after a tortuous
history. But by now it is quite stable. How would you describe
its main features? *

**MvT**: The language Joy is a purely functional
programming language. Whereas all other functional programming
languages are based on the application of functions to arguments,
Joy is based on the composition of functions. All such functions
take just a stack as argument and produce a stack as value, but
there are a few which additionally take files and the file system
as argument and value. Because of the stack, much of Joy looks
like ordinary postfix notation. However, in Joy a function can
consume any number of parameters from the stack and leave any
number of results on the stack. Moreover, there are some stack
functions which shuffle the top few elements of the stack around.
So, although for example the program 2 3 + is an arithmetical
expression in postfix notation and also in Joy notation (leaving
5 on the stack), the Joy program 5 dup * (leaving 25) is not
arithmetic because dup is neither a number nor an arithmetic
operator. That is why I say that much of Joy "looks like
postfix". In reality this is what Billy Tanksley so
appropriately called "concatenative notation". The
semantics of this notation is summed up by: "The
concatenation of appropriate programs denotes the composition of
the functions which the programs denote". In such languages
the application operation of the lambda calculus is uniformly
replaced by composition.

What distinguishes Joy from (the functional subsets of) Forth and Postscript is the datatype of quoted programs. Many Joy functions expect quoted programs on top of the stack and execute them in different ways, effectively by dequoting. This is similar to Lisp's eval, but in Joy there are many of them, each performing the job of higher order functions. But whereas in other languages the higher order functions take abstractions as arguments, the combinators of Joy take quoted programs from the stack.

As a result, there are no named formal parameters, no substitution of actual for formal parameters, and no environment of name-value pairs. Consequently Joy has an exceptionally simple algebra, and its programs are easily manipulated by hand and by other programs. Many programs first construct another program which is then executed by one of the combinators.

**SA**: *In Joy, a primitive like + operates
on the stack. Should we think of this operation as replacing the
top two elements x y with their sum x+y, or as two successive
operations: first replace y with the unary function _+y and then
replace x with x+y? *

**MvT**: You spoke of "replacing" and
"and then", which are imperative or procedural notions
and hence have no place in a purely functional language. So the
official answer has to be: the program 2 3 + denotes the
composition of three functions, and the program 5 denotes one
function and it is identical with the composition of the other
three. How the addition is performed internally is a different
matter, it might be by pattern matching, or by two push
operations followed by an add (the obvious stack implementation),
or any other (invisible) method that gives the right result.

But you also said "Should we think...", and that requires a different answer. Indeed imperative or procedural thinking can be very useful for Joy, both for explaining the meaning of the Joy primitives and also when writing programs. Here one should be eclectic and pragmatic - use whatever works. I would tend to use your first mode of thinking, and only sometimes the second. Two other modes would use message passing: a) send the parameterless addition message to the pair 2 3, or b) send the addition message with parameter 3 to the number 2. Maybe Forth programmers know more about the psychology of stack programming.

**SA**: *You mention the combinators of Joy.
What exactly is a combinator, and how do those of Joy differ from
the combinators of Schonfinkel and Curry? *

**MvT**: Combinators are second (or even higher)
order functions which take first (or higher) order functions as
parameters. They may return another function or instead
immediately apply that function to some argument(s). When we say
"John loves Mary and conversely", this might be
analysed as "(loves AND CONVERSE(loves)) (John, Mary)".
Here CONVERSE is a unary combinator (corresponding to the passive
transform "John is loved by Mary"), and AND is a binary
combinator. Similarly, "Bob admires himself" might be
analysed as "SELF(admires) (Bob)", where SELF
transforms the binary admires predicate into a unary one. These
combinators are not restricted to predicates (functions which
yield a truth value). For example, the unary squaring function
might be defined with the binary multiplication function by SELF(*).
In Schonfinkel's and Curry's system combinators similar to these
serve to shuffle arguments of functions around, a job which in
the lambda calculus is done by variables.

**SA**: *K programmers should understand this.
We would define SELF as a function which takes a binary function
f and returns a unary function which applies f[x;x]:*

SELF:{[f]{f[x;x]}} g:SELF[*] g[3] 9

**MvT**: If a parenthesis free notation such as
postfix is used, then there are two possibilities: either use
SELF as a higher order function as before, or work on a stack,
using a stack function dup:

5 SELF(*) 5 dup *

Joy, like Forth uses the latter. Note that dup is not a second order function at all. The same two possibilities would arise for prefix notation. In the same way the effect of CONVERSE can be achieved by a stack function swap. All the functions involved now are strictly speaking unary functions from stacks to stacks. Even multiplication is unary, although one can of course continue to think mainly in terms of how many parameters a function expects on the stack.

**SA**: *Yes, I think a similar choice exists
for K. We can define 'sqr' as either SELF[*] or as the unary
function*

sqr:*. 2#

*which makes two copies of its argument, and then applies *
"dot-wise", mapping the first copy to *'s first
argument, and the second copy to *'s second argument. *

**MvT**: But there are combinators which do not
simply shuffle arguments around. One of these is the mapping
function which applies a function to all members of a list to
produce a list of results. In a fantasy notation:

MAP(SELF(*)) ([1 2 3 4]) = [1 4 9 16]

Or in postfix and in concatenative Joy notation:

[1 2 3 4] MAP(SELF(*)) [1 2 3 4] [dup *] map

Whereas MAP above is a second order function, Joy's map is just an ordinary unary first order function from stacks to stacks - it expects a list and what in Joy is called a quoted program on top of the stack, and it returns a stack with a single list on top, the list [1 4 9 16].

**SA**: *This is where Joy and K diverge. For
us, 'each' really is second-order: it takes a function and
returns a function which applies to each element of a list. We
are taught to understand*

sqr'2 3 4

*as having the structure*

(sqr')2 3 4

*Apply 'each' to 'sqr', and apply the resulting function to
2 3 4. It is even more obvious in classical APL, where functions
such as 'sqr' are not first class. You can operate on a function
only by feeding it as on operatand to one of a handful of "operators",
such as /, and then immediately applying the resulting function
to data. In K, +/ is a first class function, which can be
assigned and passed into other functions as an argument. *

**MvT**: Joy has a large number of such
combinators which are really first order functions that do the
work of second order functions because they expect one or more
quoted programs on top of the stack. Consider the conditional IF-Then-Else
combinator, where the if-part produces a truth value and the then-part
and the else-part are again functions (and not statements).

IFTE(ifpart,then-part,else-part)

In Joy there is a combinator ifte with the most common syntax

[if-part] [then-part] [else-part] ifte

which expects three quoted programs on top of the stack. But actually quoted programs which map and ifte expect on top of the stack do not have to be pushed just before the combinator - they can be the result of stack shuffling or construction. Similar to ifte is a combinator linrec for linear recursion which can eliminate the need for many recursive definitions that would otherwise clutter up the symbol table and the programmer's mind:

[if-part] [then-part] [rec1-part] [rec2-part] linrec

The implicit recursion occurs between the [rec1-part] and the [rec2-part] quoted programs. There is a similar combinator for binary (tree-) recursion that makes quicksort a one-liner:

[small] [] [uncons [>] split] [swapd cons concat] binrec

**SA**: *This seems quite powerful! Over the
years, APL developers have experimented with a variety of
recursion operators, hoping that, just as the classical adverbs
such as 'reduction', 'scan', and 'each' enabled us to factor out
explicit loops, so the new operators would help eliminate at
least some explicit recursion. For example, in K, we can define a
function 'apply_to_atoms' which takes a monadic function f and
returns a g which, applied to a list, recurses to the atoms and
applies f:*

apply_to_atoms:{[f]{:[@x;f x;_f'x]}} f:2*!: g:apply_to_atoms f A:(1 2;(3 2 4;(5 6))) A (1 2 (3 2 4 5 6)) g A ((,0 0 2) ((0 2 4 0 2 0 2 4 6) (0 2 4 6 8 0 2 4 6 8 10)))

*But it's never been clear what the "best" set of
K-recursors might be, partly because K functions can take any
number of arguments, and partly because recursion can involve
more than one function.*

*To what extent to do you think the recursive combinators
of Joy will replace explicit recursion? APLers are familiar with
the fact that it often takes time and one or two "aha's"
before a problem breaks apart into an "array" solution,
a program without loops and counters. And some problems are
stubbornly, perhaps essentially loopy. It seems to me that a
certain way of thinking needs to be acquired, analogous to "array
thinking" in APL, which would let the programmer factor out
the recursion pattern of problem such as quicksort. I'm thinking
here of the way the nested recursion combinator arose as you
contemplated Ackermann's function, which most programmers would
have said required explicit recursion. *

**MvT**: Yes, the recursion combinators are very
powerful indeed. To have just one linear recursion combinator
seems only possible in a language in which all functions have the
same arity (like in Joy, where all are unary). I do not have a
proof for this claim, except that I have not been able to define
one for Lisp or Scheme, and if it were possible it would surely
be in the literature. Maybe it can be done with clever macros.
The same is true for tail recusion as a special case of linear
recursion, and also for binary and nested recursion.

But probably it is not true that all recursion patterns can be captured by a small fixed menu of recursion combinators, especially the mutual recursion patterns. As an example, one might look at the intricate and highly specialised mutual recursion that occurs in the definition of the Lisp functions eval and apply. For those cases ordinary explicit recursive definitions seem unavoidable. On the other hand, I would estimate that at least 50% of recursions in Joy can be handled by the list combinators map, filter and fold, of the remainder at least 50% can be handled by the more general linear recursion combinator. What is left can often be handled by the binary and nested recursion combinators, and finally a small percentage does need explicit recursion.

As far as programming practice is concerned though, I have to confess that earlier on I sometimes caught myself first thinking recursively and only later using a combinator to remove the recursion. It does not happen much nowadays, the combinators do become second nature.

**SA**: *In APL we are used to thinking of
programs as special datatypes, distinct from lists, arrays, and
quotations. But in Joy, lists and programs are the same datatype.
*

[10 20 30]

*is a list with three elements, but it is also an example
of a Joy program. Can you spend a moment explaining how this
works? *

**MvT**: If you ask primary school children to do
the sum: "5 + 0 - 0 + 0" they will be perplexed and
find it hard. Indeed, some problems are difficult because they
really are so easy. In Joy, as in Lisp and some other list
processing languages, lists can be heterogeneous, the elements
can be of all sorts of types. So this is a perfectly acceptable
list:

[11 22 true London Peter Santa-Claus foo dup *]

This list might have been the result of concatenating three lists:

[11 22 true] [London Peter Santa-Claus foo] [dup *]

The first list would normally be described as containing two numbers and a truth value, because that is the ordinary meaning of the symbols. Now consider the second list - are its members a city, two persons ..? No, sorry, there is no Santa-Claus, just having a symbol or name does not create a thing to which the name refers. What about foo ? It is a standard symbol for nothing in particular. Finally in the third short list, what does it contain? two symbols or the two functions which they normally stand for? If the latter, then with an appropriate definition even foo from the other list might stand for something. So, it is not at all easy to say what a list really is. But it is straightforward to understand concatenation of lists.

Much the same with all the other operations on lists: taking its first members, deleting its first member, sticking something on the stack in front of a list, writing out a list or any of its members. But the stack is as problematic as any other list: what is the result of taking the first element of [11 22] or of [London Peter] or of [Santa-Claus foo] or of [dup *]? Taking the first of a list leaves something on the stack, but what is it? the number 11, the city London (too big!), Santa-Claus (sorry), the duplication function? I think that we have to say that strictly speaking all lists and all stacks just contain representations of something that may or may not refer to something in reality.

In Joy, as you remarked, some lists like [11 22 true] can be executed by a combinator, and this one in particular results in three things being pushed onto the stack. Another that can be executed is [dup *], and, as we say, "it expects a number on the stack and replaces it by its square". But in our pedantic mode we should rephrase that.

In describing Joy I have used the term quotation to describe all of the above, because I needed a word to describe the arguments to combinators which fulfill the same role in Joy as lambda abstractions (with variables) fulfill in the more familiar functional languages. I use the term list for those quotations whose members are what I call literals: numbers, characters, truth values, sets, strings and other quotations. All these I call literals because their occurrence in code results in them being pushed onto the stack. But I also call [London Paris] a list. So, [dup *] is a quotation but not a list.

**SA**: *It's
interesting how K and Joy partition these concepts in different
ways. In K, strings can be used to quote code: *

"2+3"

*As you would expect, it doesn't
evaluate unless you apply "eval": *

."2+3" 5

*But a list evaluates immediately, so
one can use K expressions to build lists "from within".
For example *

(!5;!3)

*builds a list whose first item is 0
1 2 3 4 and whose second item is 0 1 2. But in Joy, supposing
that we had the program *

3 enum [0 1 2]

*we couldn't use it in the following
context:*

[5 enum 3 enum]

*since this is a program. Instead, we
would construct the list out of its parts:*

5 enum 3 enum unit cons

*Of course, as you pointed out
earlier, it is this very feature which allows the Joy programmer
to construct programs out of simpler components. *

**MvT**: You want the infra combinator, I think.
It expects a list and above that a quotation. It then treats the
list as a temporary stack and executes the quotation on that.
More often than not the list that is supplied is actually an
empty list. Example:

[] [2 3 + 4 5 *] infra [5 20]

Now if enum is already implemented as you described, producing a list, and if [5 enum 3 enum] is already on top of the stack, then all you need is just [] swap infra, which will leave [[0 1 2 3 4] [0 1 2]] on the stack. Of course if you want the two sublists concatenated, then you do an extra [concat] infra first, or in this case simply flatten.

**SA**: *Programming in Joy over the last few
years, I've come to think of the operators as falling into two
classes: those which produce new information from data, such as +
and cons, and those which move items on the stack into position
where the informational operators can do their work, such as swap
and dup. Languages with variables appear to have a productivity
advantage here, since one can simply invoke data by name. Billy
Tanksley has proposed closing the gap with something he calls
"shuffle notation", where "before" and "after"
stack patterns are specified. For example, if the stack is *

.. 10 20 30

*and you want: *

.. 20 20 30 10 10

*then you could say: *

"abc-bbcaa" shuffle

*which would have the same effect as: *

rolldown [dupd] dip dup

*What's your take on this and similar efforts to deal with
the problem of "stack noise"? *

**MvT**: Indeed, conventional languages with
named formal parameters in definitions of functions can simply
use the name to pick up the value of the corresponding actual
parameter. For functions with many parameters this is indeed an
advantage. But these languages need to carry around an
environment of name-value pairs. Joy tries to avoid the latter
completely, but pays a price when there are many parameters. One
possibility would be to do what Forth does: when appropriate
allow named formal parameters. An interesting alternative is
Billy Tanksley's solution or some syntactic variant of that. It
does not introduce formal parameters to be used in the body of a
function, and also it need not be used just at the beginning of a
definition. So it deserves very careful attention.

The proposal in effect defines an infinity of possible stack shuffling operators. Currently Joy has 12 inbuilt stack shuffling operators, a few others defined in the standard library and other, deeper shufflings can be expressed using the dip combinator. Two possiblities arise: 1) replace all the current Joy mechanisms by the new shuffle operator, or 2) allow both, and leave it up to the programmer to select what fits best. To make any decision one would want to see some programs expressed in the old and in the new notations. I have not done any experiments along these lines.

One other consideration concerns efficiency. The shuffle operator (or any syntactic variant) requires a before-after after string to be examined - and that means interpreting its pattern. A simple minded implementation would be slow. An only slightly more sophisticated one would look for common patterns, and replace for example "a-aa" shuffle by dup. It would use the interpreted form only in the less common but tricky cases, like your example. On the other hand, this decision might well be left to the programmer. So, my answer is not very definite at all, sorry.

**SA**: *It might be
interesting to explore algorithms which translate shuffle
notation into a fixed vocabulary of stack-shuffling words. Are
there small bases for which efficient algorithms exist? What if
we allow the stack diagrams to express structural
transformations, e.g.: *

[ab]c -- [ac][bc]

*I imagine that Brent Kerby would
have opinions on this matter (and quite possibly an algorithm or
two!) *

**MvT**: Yes, Brent Kirby has done very
interesting work on a combinatory algebra for concatenative
languages, see his paper from the main page of Joy. For Joy
itself the following seems to be an adequate base for shuffling
the stack: the three simple operators swap dup and pop, together
with the combinator dip. (This combinator expects a quotation on
top of the stack and below that another value. It pops the two,
saving the value somewhere, executes the quotation, and then
restores the saved value on top. So, for example, [swap] dip will
interchange the second and third element on the stack.) But I do
not have a proof that these four primitives are indeed complete
in the sense required.

However, if the transformations also take apart lists on the left of the transformation pattern, or construct lists on the right of the transformation pattern, then the list operations will be needed. In fact it helps to think of your transformation [ab]c -- [ac][bc] to be composed of two transformations, firstly [ab]c -- abc and secondly abc -- [ac][bc]. Then the first has to take the list apart, and in general the list operators first and rest should be adequate for that, but uncons and unswons would also help. The second transformation has to construct two lists, and in general the list constructors [] (empty list) and cons should do, but swons might also be useful. Both kinds of component transformations would also need some of the four general primitives mentioned earlier. An actual implementation would presumably collapse the two component transformations back into your original.

I have not done any work on implementing this sort of thing for Joy. If I were to do so, I would look at the by now well understood implementations of pattern matching that are used in languages such as Prolog, Miranda and now in Haskell. An older book by Peyton-Jones describes this in detail. For a while I used to think that these transformations would re-introduce something like lambda variables into Joy, and that there is a problem of what the scope of the introduced variables would be. But I was wrong, the scope is just the right side of the transformation, so the variables cannot actually occur elsewhere in the Joy program. So if somebody wants to volunteer ...

**SA**: *I'm curious
about Joy's internals. For example, in APL and its descendants,
at least some of the bulk datatypes are implemented in
consecutive memory locations without any links, and for that
reason APL doesn't require garbage collection. I know that K uses
a reference-counting mechanism. Since the typical K application
uses a small number of large objects, this is quite efficient. So
what choices have you made for Joy, and for what reasons?*

**MvT**: Yes, the implementation of bulk
datatypes in consecutive memory locations is in many ways ideal
simply for efficiency reasons. I have often wondered what kind of
a cousin of Joy might be implemented like that without
sacrificing efficiency elsewhere. One worry of course concerns
adding a new first element to a very large array: it requires
copying the entire array. So, if this happens often, much
effiency is lost. But must it happen often? Presumably the APL
experience shows that it need not be a concern at all for a vast
number of applications.

Then there is the possibility of a mixed implementation: mostly consecutive, but some links. This may or may not require memory management in some way, either as mark-sweep, or copying, or reference counting. The last cannot handle circular structures, but that may or may not matter. At any rate there seem to be many possibilities for languages similar to APL, J and K but with a concatenative syntax to explore. You yourself, Stevan, have done some interesting work in this field with cK, a concatenative version of K. I look forward to seeing more of it, not the least because it would offer an implementation that is more efficient than the current Joy because of the way bulk datatypes are handled. At the same time, I see no reason why a more sophisticated implementation of Joy could not use the same method wherever possible.

This is perhaps the best place to mention Billy Tanksley's work, which is inspired by the consecutive memory implementation of the stack of the Forth language. Since in Joy the stack is a sequence just like lists and other quotations, there is a possibility of using consecutive memory locations wherever possible and using links only where necessary. So I think Billy's work is a step in this direction, and I welcome it.

But you also asked about the choices made in the implemention of Joy, and for their reasons. My background had been very much in logic and other symbolic processing, and the structures one needs there tend to be trees of some sort. So I have had rather little use for arrays in consecutive locations, except for implementing trees. I had used Prolog a fair bit, but Lisp I only studied in detail from a purely theoretical perspective. It was clear to me from the outset that a uniform linked implementation for Joy would be simultaneously simple and sufficiently general. Since Joy then had no assignments or other destructive updates, and hence no possibility of circular lists, memory management by reference counting was a possibility that I did consider quite carefully. In the end I decided against it, largely on the grounds that a problem would arise if I ever wanted to allow circular lists. I knew that the problem would not be insurmountable, but for a first implementation it looked like a lot of work. It so happens that Joy is still purely functional and hence has no destructive updates of any kind, so a simple reference counting memory management would still be possible. That would have the advantage of not requiring occasional hiccups for memory management and hence better suitablitity for real time work. On the other hand, it is known that reference counting is more CPU intensive, spreading the total work more continuously. Whether Joy will remain purely functional for ever (look what they did to the original Lisp), I do not know at this stage.

**SA**: *What is the current status of Joy?
What are the plans for the future development?*

**MvT** After several proof-of-concept
implementations, the current implementation, written in unadorned
C, evolved smoothly from one that was started 1995. In early 2001
John Cowan added files, floating point numbers and access to
standard C functions, and he did a lot of cleaning up. He also
added the option of not using my original garbage collector but
using the professional BDW collector, an improvement which
allowed space used for strings to be reused. This extended the
functionality of Joy tremendously, and I a very grateful for all
the work he did. Since then Nick Forde has also added a number of
features, and I thank him for that.

The current implementation is essentially an interpreter. It translates (compiles) the external ASCII form of programs into internal tree code which is then interpreted in a rather conventional manner. No attempt is made to do any optimisation, and my earlier "clever tricks" I always came to regret fairly soon because they interfered with my garbage collector. There is a growing list of general and special purpose libraries, ranging over many kinds of possible programming applications, and most of my work in the last few years has been on those. For the immediate future I plan to examine and test the so far rather underutilised module system, by writing libraries for some specialised types such as big sets, trees and dictionaries of arbitrary basetypes.

Several other people have published other more or less complete Joy interpreters, written in ML and in Scheme, in the "concatenative" mailing group. At this point in time I have no plans to write a full compiler. A first version of such a compiler would presumably use C as an intermediate language and leave the generation of machine code to the C compiler. I would very much welcome if somebody were to take up the task.