David MacQueen on the History of Compilers

11/4/2025

In this interview, David MacQueen and I discuss the history of compilers, especially as it relates to the evolution of Standard ML, as well as changes he would like to make to the language, the influence of Peter Landin and the λ-calculus, and modern compiler technologies.

I really enjoyed this conversation and I’m thankful to David for taking the time to chat. See this post for the motivation for this interview.

This transcript was originally AI-generated, but I'm editing it by hand as I review pieces of it. Apologies for any errors.

All right, well, I'm recording on my side too, and you've got yours, so I think we can just hop back in. I'm thinking that maybe we'll do a little five-to-ten-minute intro on some of your background and how you got into working on ML, and then maybe a thirty-five to forty-five-minute conversation total, and most of it we'll just be chatting about the history of compilers. Maybe we'll talk a bit about the Logic for Computable Functions and how that led to ML.

I can give you a bit of ML background.
Sure, yeah.

So my name is Dave MacQueen. I was trained as a mathematical logician, in computability theory, at MIT. I was in the Air Force for a while, then I went to Edinburgh as a postdoc in 1975 and spent almost five years there, and that's where I turned into a computer scientist. I worked with people like Rod Burstall, who was my boss, and Robin Milner, basically on the theory of programming languages, where my background in mathematical logic was relevant and useful.

Edinburgh was a kind of special place in the 70s, and since then, in terms of European theoretical computer science, which is quite different from American theoretical computer science: instead of focusing on algorithms and complexity theory as in the U.S., European theoretical computer science tended to look at connections between logic and computer science, and in particular the theoretical foundations of programming languages. So the lambda calculus was a major inspiration and source of ideas, going right back to Church around 1930, and later ideas about types that were introduced by Alonzo Church (well, introduced originally by Russell and Whitehead around 1911 as a defense against paradoxes like Russell's paradox, and then turned into a kind of augmented version of the lambda calculus, the simply typed lambda calculus, by Church in a paper in 1940), and then kind of rediscovered and popularized by some computer scientists back in the 60s, mostly Christopher Strachey and Peter Landin, both of whom had kind of non-standard, informal educations, because there wasn't a computer science discipline in the 1950s. There were things going on at Cambridge and Manchester around Turing and so on, and earlier at Bletchley Park on the code-breaking technology. So that was the kind of British context, and there were people like Tony Hoare who were involved in this programming language semantics work, and really work on verification and applying logic to programming.
Yeah, and if I can interject: around this time, I feel like today it's a little bit hard to overstate the influence that the development of Algol had on all of these languages, and there were call by name and call by value and these things, and then a lot of Landin's and Strachey's papers had a more strict evaluation semantics, is that right? Maybe you can speak to the development of Algol.

There was the Algol group. I don't recall whether Strachey or Landin had a direct connection. One moment, a call from my wife.

[Brief interruption for a phone call.]

She's having lunch with her friends. Good, glad we got that settled.
Anyway, I'm kind of vague about it, but obviously there was John Backus, Peter Naur from Denmark, the Algol 68 guy, so there was a group involved in the Algol development, from the late 50s through the early 1960s, and probably the first book about compilers was the description of an Algol 60 compiler. Strachey's work around the mid-1960s partly involved the definition of a new kind of systems programming language he called CPL.

Right, right, CPL.

That was an Algol 60-derived language, and I don't think Peter Landin was involved directly in CPL.

Yeah, I think he was not on the papers.

But around the same time Strachey had a company, a kind of software consulting company in London, and he hired Peter Landin, and Landin got very enthusiastic about the lambda calculus and wrote a burst of very important papers from about '64 to '66. One was "The Next 700 Programming Languages", where he described a kind of syntactically sugared lambda calculus that he called ISWIM, and then he wrote about an abstract machine for evaluating that language, or the lambda calculus, and he also wrote a couple of papers about how to model the semantics of Algol 60 by translating Algol 60 into the lambda calculus with various devices and tricks.

And that machine was Landin's SECD machine?

Yeah, the SECD machine, that's right. It had four components: the stack, the environment, the control, and the dump, which was another stack.
Right. Anyway, those papers had a huge impact. Later on, Robin Milner was interested in writing a proof assistant, an interactive system to help a person perform formal proofs in a particular logic, which was called LCF, the Logic for Computable Functions.

And this was at Stanford, right?

He did that work initially at Stanford and then brought it back to Edinburgh, around 1973-74. The Logic for Computable Functions was a logic for reasoning about functions and recursion and so on, developed by Dana Scott in the mid to late 60s. He developed it because he had been trying to find a mathematical model for the lambda calculus, and he got frustrated: he couldn't find a model because of the fundamental problem of self-application, since a model of the lambda calculus had to support the idea of applying a function to itself. For a long time Scott couldn't figure out how to build such a model, so he decided to fall back on a logic for reasoning about these kinds of functions. But then he did discover a model, around 1969, which is the Scott domain model, the D-infinity model, and so he never published the earlier paper that described the LCF logic ("ISWIM, CUCH" and all that were in the title). It has since been published, in Logic and Symbolic Computation, as a sort of retrospective publication.

Gotcha. And so now ML enters the picture, because of this LCF system?
Yeah. Robin Milner was another sort of self-taught British computer scientist. He had a good education, a good mathematical background, and he was teaching at a public school, you know, a private secondary school, in Britain. He got interested in doing research, and I think he had a position at Swansea, and he managed to get a research visit for a year or two at Stanford, around 1971-72. At Stanford he met a guy named Richard Weyhrauch, and together they thought they could write a theorem prover that would use this logic. The first generation of that was called the LCF theorem prover; ML didn't exist at that time. That was at Stanford, and they wrote the system in Lisp, and Lisp was the metalanguage, if you like: the language for writing proof tactics.

Milner then got a position at Edinburgh and came there sometime in mid to late 1973, I think. He brought in a couple of postdocs over the next years, got some funding, and decided to do a proper LCF theorem prover. Instead of using Lisp, which was insecure in the sense that it was not statically typed, he chose to use ISWIM as the metalanguage, but added a type system and a static type checker. Beyond that, he kind of rediscovered the idea of inferring types. This was a fairly old idea in the logic world, but it wasn't known in computer science, so they essentially rediscovered it, in the ML metalanguage for the LCF theorem prover. That was in the mid to late 70s; they kind of finished the theorem prover around 1978, and they published a paper on the theorem prover, and then Milner published a paper on the type system.
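To make concrete what that inference buys you, here is a small Standard ML sketch, my own illustration rather than anything from LCF: no type annotations are written, yet each definition receives a most general polymorphic type.

```sml
(* Illustration only: the types in the comments are what a compiler would infer. *)
fun id x = x                            (* 'a -> 'a *)
fun compose f g = fn x => f (g x)       (* roughly ('a -> 'b) -> ('c -> 'a) -> 'c -> 'b *)
fun length nil       = 0
  | length (_ :: xs) = 1 + length xs    (* 'a list -> int *)
```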
And so, they had Lisp, and ISWIM in Landin's original papers was dynamically typed, right?

Right, but ISWIM was a kind of thought-experiment language. Although, after Landin left working for Strachey, he went to New York and worked at Univac for a while, and he also had a visiting position for a while at MIT, so he taught people at MIT about ISWIM, and there was actually an implementation effort led by Evans and Wozencraft at MIT in the late 60s. They called their implementation PAL; I forget what PAL stands for, but it was basically ISWIM. I don't think the PAL work had any visibility or influence in Edinburgh, though.

The reason that Lisp, or even a dynamic ISWIM, didn't work very well was that if you made some mistake in your Lisp code, you could get a proof that was wrong. The point of static type checking was that there would be an abstract type of theorem, and the only way you could produce a value of type theorem was to use inference rules that were strongly typed. The strong typing guaranteed, in principle, the correctness of the proof: you would not be able to generate any unproved theorems other than by application of the inference rules. So it was logical security that the type system was providing.
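As a rough sketch of that idea (my own toy example, with a made-up Kernel structure and invented rules, not Milner's actual kernel), an abstract type in Standard ML can only be populated through the operations its signature exposes, so a well-typed client can never forge a theorem:

```sml
(* Toy LCF-style kernel: thm is abstract, so every thm value must come
   from the rule functions below. The formula type and the rules are
   stand-ins for illustration only. *)
signature KERNEL =
sig
  type formula = string
  type thm                                   (* abstract: cannot be forged *)
  val assume  : formula -> thm               (* A |- A *)
  val implies : formula -> thm -> thm        (* hypothetical introduction rule *)
  val dest    : thm -> formula
end

structure Kernel :> KERNEL =
struct
  type formula = string
  datatype thm = Thm of formula
  fun assume f = Thm f
  fun implies h (Thm c) = Thm ("(" ^ h ^ ") -> (" ^ c ^ ")")
  fun dest (Thm f) = f
end
```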
But in order to have a language that was closer to Lisp in flexibility, Milner had the idea of introducing this automatic type inference system. As I said, the earliest work on something like that was in 1934, by Curry, and then later on, in the 50s, when Curry was writing his book about combinatory logic, he had a chapter in that book about what he called functionalities for combinators, which was essentially his term for types. In the 1934 paper he had described something that you would call type inference, and he came back to that later on in the 60s. Then Roger Hindley, a logician at Swansea, took that up and wrote a paper about it as well. This was before Milner knew about it; later on, after Milner had published his paper, somebody told him that there was this earlier work by Curry and Hindley.

But in fact, the first actually published algorithm for type inference was produced by Max Newman in 1943. Max Newman was Turing's mentor at Cambridge, a professor of mathematics, interested in things like Gödel's proof and foundational issues; he was the one who sent Turing off to Princeton to work with Church and get a PhD, and later on he became the head of the Manchester computing laboratory and hired Turing there, toward the end of Turing's life. Anyway, Max Newman, in a different context, had thought about type inference and wrote a paper about it. His main target was type inference for Quine's New Foundations, which was a kind of alternative foundational system, but he also had a section that applied the ideas to the lambda calculus. That early work by Newman was interesting, but nobody knew about it, and Hindley rediscovered it many years later. So this is a case where there was parallel work in logic and in computer science, but the two sides weren't aware of each other.
Well, and at this point the lambda calculus and combinatory logic are sort of cousins, right? Solving a lot of the same problems?

Right, they're equivalent. Combinatory logic is essentially the lambda calculus without variables, and you have a bunch of combinator reduction rules instead of the beta reduction you would calculate with.

Anyway, that was the kind of background cultural situation in Edinburgh in the late 70s, and I arrived in 1975. There was a computer science department, and Robin Milner and his group of postdocs and grad students were in the computer science department; they were at the King's Buildings campus, a kind of science campus three miles south of the rest of the university, which was in the center of town. And there was a School of Artificial Intelligence, the early blossoming of artificial intelligence around Edinburgh, and I worked for Rod Burstall, who was in the School of Artificial Intelligence, along with some of his other postdocs and students. We developed a kind of parallel language called Hope, which evolved from a rather simple first-order functional language that kind of resembled Gödel's general recursive equations of the 1930s, the ones that were part of the work leading to the incompleteness theorem. Anyway, we developed this language called Hope, and we borrowed some ideas from ML, in particular the general type system and the type inference and type checking approach, and it had its own special thing, which was later called algebraic data types. So algebraic data types came from Hope, based in part on the evolution from this first-order language.
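For readers who haven't seen them, here is what an algebraic data type looks like in Standard ML; this is my example, not one taken from Hope:

```sml
(* A sum-of-products type plus pattern matching over its constructors. *)
datatype 'a tree
  = Leaf
  | Node of 'a tree * 'a * 'a tree

fun size Leaf             = 0
  | size (Node (l, _, r)) = size l + 1 + size r
```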
Then in the 1980s there was another effort, to design a kind of first-class functional programming language, which Robin initially called Standard ML because it was trying to standardize and unify several threads of work: the ML that was part of the LCF theorem prover, the Hope language that Rod Burstall and I had developed, and the dialects at Cambridge and INRIA, where people were off evolving LCF and the ML dialect there. So Robin proposed this sort of unified Standard ML that had aspects of Hope and of LCF ML, and through various meetings of mostly Edinburgh people, or people in the Edinburgh culture like myself (I was at Bell Labs by then), we designed the Standard ML language, from about 1983 to 1986.

Robin viewed it just as a language design effort, and a major goal was to produce a formal definition of the language, using tools that had been developed at Edinburgh for operational semantics. There was a kind of Edinburgh style of operational semantics, and that was used to give a formal definition of the Standard ML language, which was published as a book by MIT Press in 1990. Robin's goal was essentially to produce that language design and the corresponding formal definition, published as a book. As an actual programming language, he had no intention of using it, and he wasn't terribly interested in implementing it. He had a kind of naive idea that once it was out there, some commercial software house would say, "We should write a compiler for this thing." Of course, that eventually kind of happened: there was a software house, Harlequin Limited, later on in the 80s, that had a project to implement an ML compiler, because their boss at that time was interested in producing a Lisp compiler, an ML compiler, and a Dylan compiler, Dylan being a language developed at Apple that nobody remembers, kind of a Lisp dialect.

Interesting.
Anyway, I was at Bell Labs, and I got together with a new young faculty member at Princeton, Andrew Appel, who was interested in writing compilers, and starting in the spring of 1986 we were working on a new compiler for Standard ML. There were other, derivative compilers: there was something called Edinburgh Standard ML, which was derived from an earlier Edinburgh ML compiler, which was in turn derived from a compiler that Luca Cardelli wrote when he was a graduate student. Andrew was a real computer scientist, with a real computer science degree from Carnegie Mellon, and he knew about things like code generation, runtime systems, and garbage collection. Before that, when we developed Hope, for instance, and when LCF ML was developed, we were just piggybacking, in LCF's case on a Lisp implementation, so the nitty-gritty stuff, code generation, garbage collection, and so on, was handled by the underlying Lisp. The Hope implementation that I worked on was hosted by POP-2, which was an Edinburgh-developed cousin of Lisp, a symbolic programming language rather influenced by Landin.

Maybe you can speak to this: I find it so interesting that you were at Bell Labs and Appel was at Princeton, and in a roughly similar time period you also had Aho at Bell Labs and Ullman at Princeton, who took very different approaches to compilers.
Yeah. After Edinburgh, I tried to get back to California, so I had a kind of aborted one-year stay at ISI in Southern California, and then I went to Bell Labs. There, a guy named Ravi Sethi, one of the co-authors of the later edition of the dragon book on compilers, had become interested in the semantics of programming languages, and so they hired me at Bell Labs. Aho was there; he was department head of the kind of theory department in the center where Unix was developed. So there was the hardcore Unix group, Ritchie and Thompson and Kernighan and Pike and so on, and Doug McIlroy, who was my boss, and that was the orthodox culture at Bell Labs. Of course, by the time I got there at the beginning of '81, Unix was already a big success and well established, and the Unix people basically all worked in the communal terminal room up on the fifth floor, next to the PDP-11/70 time-sharing system. I was never part of that group.

And two doors down the hall there was this new guy who had joined Bell Labs a bit before I did, probably 1980 or '79, named Bjarne Stroustrup, and he was working on trying to make C into an object-oriented language, because he had done his graduate work at Cambridge using Simula 67.

Right, and Simula was originally also an extension of Algol 60, but for parallel processes.

Simula was explicitly a derivative of Algol 60; the idea of the class was almost like a cut-and-paste combination of blocks of Algol code, with weird control structures. Anyway, so there was that object-oriented stuff going on down the hall, but we were on different wavelengths, so there was not much communication there, and also not very much communication between me and the orthodox Unix culture. I used the Unix system, and one of the first things I did when I first arrived at Bell Labs, to make my computing world a little more comfortable in what was an alien culture, was to port a version of Emacs to the local Unix system, and I also wrote a version of yacc in Lisp, so I had to figure out Steve Johnson's weird code for yacc. But pretty soon after that I got involved first in further development of Hope, and then, starting in '83, in working on Standard ML.
Why do you think there was so little mutual interest between what you were working on and the Unix folks?

Well, it's a matter of background and culture. The Unix approach was a very seat-of-the-pants, hacking kind of approach, and they didn't think very formally about what they were doing. They basically expressed things at two levels: one was vague pictures on a blackboard or whiteboard, where you had boxes and arrows between boxes and things like that, and the other was code. So they would translate between these vague diagrams and assembly, or later C code, and C is essentially a very high-level PDP-11 assembly language as far as they were concerned. I had a more mathematical way of trying to understand what was going on, so it was not easy to communicate. Dennis Ritchie originally had a mathematical background, and at one point or another I tried to discuss concepts like the stream concept in Unix with him. One main reason I was at Bell Labs is that Doug McIlroy, who hired me, had done some work relating to stream-like coroutining, and I had worked on that at Edinburgh with Gilles Kahn; that was my first work in computer science. It involved functional programming and lazy evaluation, a particular way of organizing that, and it connected conceptually with the notion of the stream in Unix, pipes and pipelines, essentially stream processing where the input and output were pipe streams.

Anyway, Stroustrup also didn't connect with this orthodox culture very strongly, because, as I said, he wanted to turn C into an object-oriented language, and he did that initially by writing a bunch of preprocessor macros to define a class-like mechanism, and that evolved into C++ and, much later, C++ compilers. It was initially kind of preprocessor-based: he started with the Unix cpp preprocessor and evolved that into a more powerful preprocessing system called Cfront. At some stage the language was called C with Classes, and later on C++.

It's interesting, because nowadays I think it's Herb Sutter who's working on something called cppfront, which is like a Cfront for C++.

Very interesting. I'm not very keen on macro-processing front ends, especially if you're interested in types; they're very problematical. If you want to define a language that has a well-defined, provably sound type system, macros are a huge mess.
Right. Well, maybe we can return to ML. So we've got some sort of Standard ML, but at INRIA there are the beginnings of what would become Caml and then OCaml, right?

Right. At the beginning, the first meeting was a kind of accidental gathering of people at Edinburgh in April of '83, and Robin wanted to present the proposed Standard ML that he had worked on a bit earlier. It turned out that one of the groups at INRIA that had a close connection with Robin and the research work at Edinburgh was Gérard Huet's group, and a member of that group, a guy named Guy Cousineau, happened to be in Edinburgh and was involved in these meetings as a kind of representative of the INRIA people. He made notes and took the ideas back to INRIA, and they contributed a critique, feedback about the first iteration of the design. So in fact the INRIA people were connected to the beginnings of Standard ML through that connection, through Guy Cousineau and Gérard Huet.

The following year we had a second major design meeting, and I believe Cousineau came to that one as well. There were several people from Edinburgh, and people who just happened to be around: I was around, Luca Cardelli was around, of course Rod Burstall and Robin Milner, and some postdocs and grad students and so on. And we made a decision. ISWIM had two alternative syntaxes for local declarations, essentially a block structure. One was "let declaration in expression", where the "in expression" defined the scope of the declarations, and there was an alternate one used in ISWIM, "expression where declaration", with the declaration coming after the expression. When programs scale up, for fairly obvious software engineering reasons it's better to have the declarations before the scope rather than after it, because the scope might be a ten-page expression followed by the declarations on page eleven, and that's kind of hard to understand. And you have a kind of tug of war if you allow "let declaration in expression where declaration": which declaration applies first? Anyway, it seemed unnecessary to deal with these difficulties with the where form, so we decided to drop the where form and only have the let form in ML.
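For anyone unfamiliar with the surviving form, here is the let syntax in Standard ML, declarations first and then the expression they scope over; the example is mine:

```sml
val area =
  let
    val pi = 3.14159
    val r  = 2.0
  in
    pi * r * r
  end
(* ISWIM's alternative would have read roughly:
   pi * r * r where pi = 3.14159 and r = 2.0 *)
```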
This upset the INRIA people. They thought that where was wonderful, and they wanted to keep it, so they decided they weren't going to let the Anglo-Saxons have control of this thing, and they did their French branch, or fork, of the design. They kept where for a while and eventually dropped it anyway. So that's the original connection. Unfortunately, they also satisfied their own tastes in certain concrete syntax, where you said the same thing but the concrete syntax was slightly different in their dialect, just making it unnecessarily difficult to translate code back and forth.

And around this time, shortly after some of the first standardization efforts, there was significant work done on the representation of modules, is that right?

Yeah, that was my contribution.
I had been thinking about a module system for Hope, and I wrote a preliminary paper about that. It was partly inspired by work on modularizing formal specifications of software; there was a kind of school of algebraic specification, and Rod Burstall was connected with that through joint work with Joseph Goguen, who was at IBM and later UCLA. I was also inspired by work on type systems as logics, the Curry-Howard isomorphism, things like that, and the work of Per Martin-Löf, the Swedish logician. So there were these sort of type-theoretic logics, and part of the inspiration for the module ideas came from that, and part of it came from the parallel work on modularizing algebraic specifications. The first description of Standard ML was published as a paper at the 1984 Lisp conference, and I had another paper in parallel describing the module system. Over the next couple of years we further refined that, and it became a major feature of Standard ML. I'm still thinking about it.

I would love to get to that once we cover some of the other history in between, and some of the changes you'd like to make, because I think there's a lot there.
Anyway, the implementation of Standard ML. Obviously the LCF implementation was a compiler, but it compiled the ML code in the LCF system into Lisp and then invoked the Lisp evaluator to do execution, so code generation was basically a translation to Lisp. Similarly, that's the way I did the Hope implementation, by translating to POP-2 and then evaluating in POP-2.

Which I believe is still, to some extent, how the OCaml compiler works today; I believe there's an option, flambda, that shows you these big Lisp-like expressions.

Well, it's easy to write an interpreter that way. What happened with OCaml, and I won't try to represent them because I wasn't involved in it, but originally the reason for the name Caml, C-A-M-L, was that one of the researchers at INRIA, whose name is floating in my head but I can't pull it up, had been studying category-theoretic models of the lambda calculus. There's a certain adjoint situation that can be used to define what a function is and what function application is, and out of that adjoint situation in category theory you can derive some equations, and then you can interpret those equations as a kind of abstract machine. So he called that the Categorical Abstract Machine, and Caml was introduced as the Categorical Abstract Machine Language.

So there's no connection to ML as in metalanguage? How funny.

There is a connection, I believe, but another way of explaining the name is Categorical Abstract Machine Language. I think they liked it because it had both ML and this other mathematical interpretation. And they actually implemented this abstract machine derived from the categorical equations, but from an engineering point of view it didn't work very well; it wasn't performing as well as they would have liked. So they fell back: at INRIA they had their own Lisp that had been developed by INRIA researchers, called Le Lisp, which even spun off into a company, and the Le Lisp people had developed infrastructure such as a runtime system with garbage collection and a kind of low-level abstract machine into which they compiled Lisp. So after the Categorical Abstract Machine turned out to be not so good from an engineering standpoint, they adopted the Lisp infrastructure, translated the Caml surface language into the Lisp abstract machine, and ran it on the Le Lisp runtime system. That was much better, and it was considered a serious thing because Le Lisp was being commercialized; this was the era of the various Lisps, Interlisp, Franz Lisp, Le Lisp.
wizard programmer and he uh implemented a uh his own implementation of camel 
which he called camel special light and because it was much more lightweight 
and uh accessible if you like and it beat the pants off of the lulisp 
implementation in terms of performance and so he became the kind of lead 
developer as camel evolved and he worked at did some research at the language 
level and also uh heroic work on the on the implementation and that's the 
original origin and then later on a a phd student of dda ramey at inria uh as a 
phd project developed an object system on top of camel and they thought that 
was uh a neat opportunity for marketing because of the great popularity of 
object-oriented programming and so it became whole camel but it was a good 
experiment in the sense that the old camel programming community had an option 
of using the intrinsic uh functional facilities of old camel as a essentially 
an ml dialect or they could use the object-oriented extension and it turned out 
nobody used the object-oriented extension and whether that's because it wasn't 
a really good object-oriented extension or because the functional model beats 
the object-oriented model which is my preferred explanation and so what did you 
you were working on an sml implementation at the time what did that compiler 
look like in right so we started uh and rappel arrived at princeton in early 
Right. Appel arrived at Princeton in early 1986, and we got together fairly quickly after that and decided to build a compiler. I knew the type system, having implemented similar stuff for the Hope language, so I knew how to write the type checker and type inference and things of that sort, and I had my ideas about the module system and how to implement that. So I took the front end, where type checking, type inference, and the module stuff were taken care of, and Andrew took the back end, and the interface between them was essentially just the lambda calculus: there was an intermediate language produced by the front end which was an untyped lambda calculus. Andrew wrote an optimization phase and the code generation, and built a runtime system with a fairly straightforward copying garbage collector; the runtime system was written in C.

One of the objectives from the beginning was that we had a point to prove, which was that ML was a good language for writing a compiler, because a compiler is basically a symbolic processing program, and a language that's good at symbolic processing, like ML, should be a good language for writing a compiler. So everything was written in ML. Quite quickly, students at Princeton produced an ML version of lex and an ML version of yacc. I wrote the first lexers and parsers by hand, but fairly soon we replaced those with generated lexers and parsers, and those tools probably came along in about 1987. Andrew had a number of students, and we had some joint postdoc-level people who were employed as summer interns or visitors at Bell Labs and worked on fairly hardcore compiler issues. And of course there are interesting, different problems for compilation in a lambda-calculus-based language with true first-class functions, in comparison with Fortran, Pascal, C, and so on.
That's really great stuff.

There were also technical problems introduced by object-oriented programming, say if you wanted to write a C++ compiler or something of that sort, which are not unconnected with functional programming, because object-oriented programming is just one way of bolting higher-order features onto an imperative language, and objects and classes and so on are basically higher-order things; they're elaborations of function closures. Anyway, Andrew and his students wrote a whole series of important papers about techniques for optimizing functional programs.

Of course there were other communities. There was the Haskell community that came along in the late 80s, and they were more doctrinaire. ML is a functional language in that functions are first class, which as far as I'm concerned is the main idea, the big idea. Then, if you take the lambda calculus or combinatory logic seriously, there's the issue of order of evaluation, call by name versus call by value going back to Algol 60 if you like. The phenomenon there is that you're compiling into a language that supports, say, lazy evaluation, which is essentially call by name plus memoization.
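As a rough sketch of what "call by name plus memoization" means, here is laziness encoded by hand in an eager language; this is my illustration, not how any of the compilers discussed here represent suspensions internally:

```sml
(* A suspension is a thunk; forcing it caches the result in a ref cell,
   so the delayed computation runs at most once. *)
datatype 'a susp_state = Delayed of unit -> 'a | Forced of 'a

type 'a susp = 'a susp_state ref

fun delay (f : unit -> 'a) : 'a susp = ref (Delayed f)

fun force (s : 'a susp) : 'a =
  case !s of
      Forced v  => v
    | Delayed f => let val v = f () in s := Forced v; v end
```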
Right, and that was a hot topic in the mid-1970s. There was a lot going on: Friedman and Wise, Ashcroft and Wadge, Henderson and Morris. There were a bunch of papers appearing around 1975.

And it's remarkable the degree to which strict, call-by-value semantics have essentially won out in nearly every major language.

Well, that's because lazy evaluation is a nice piece of magical machinery that you can do some really neat stuff with, but the range of applicability where it's a big win is rather narrow. If you want to do serious, larger systems programming and you have lazy evaluation, you have to turn it off, or work around it, to get efficiency. So one of my principles in language design is that magical things turn out to be things to avoid; you don't want too much magic in your language. Now, ML has the magic of automatic type inference, and I think overall that's a clear win, but look at the experience of the Haskell community, where they introduced compiler pragmas to turn off lazy evaluation and that sort of thing. The idea is that there's a trade-off. They said, we'll have lazy evaluation, which we love, and you can do all these interesting, tricky things with it, which I did earlier back in Edinburgh working on the stream processing stuff; some of the big examples of cute things you can do with lazy evaluation Gilles Kahn and I developed in our work at Edinburgh. But in return you need to have purity as one of the trade-offs.

ML has impurity: it has side effects, it has variables, or reference values, that can be updated and mutated, and arrays, in Standard ML. We don't have a very theoretically sophisticated way of explaining that; the semantics of the language just allows you to do these side effects, and in real-life ML programming you tend to avoid them, programming in a, you know, 99 percent functional style. But in Haskell that purity is necessary if you're going to have lazy evaluation, because lazy evaluation makes it hard to tell when things are going to happen, and for things that change the state, it's important to know when they're going to happen.
Anyway, so they have lazy evaluation and they have purity, except that in the real world you need some kind of state, so they introduced monads, various monads. A Haskell program is, in some sense, describing how to construct a monad. It's sort of like, do you know about continuation-passing style? There the programming model is that what a program does is construct a continuation, in fact the continuation representing the whole program. In Haskell you construct a monad representing the action of the whole program, and then at the end you fire off that monad and compute the new state, and that firing off of the monad is a piece of extra magic: unsafe-something-or-other, a magic operation for firing off the monad that you built. Anyway, as you can tell, I am happy with the impure, slightly theoretically embarrassing situation in ML.

And so the impurity is represented really just by those two type operators; there's a reference type operator, and if you avoid the ref type and the array type, you're in a pure functional language.

Exactly, exactly.
them right so uh as i said we started in 86 and by summer of 87 we actually had 
a system that bootstrapped itself just barely made it to the bootstrapping 
milestone because we were using a kind of flaky prototype implementation called 
edinburgh standard ml which was this earlier system that had been modified to 
incorporate a certain number of ml features in early implementation in the 
module system etc and it was not robust but it got it fortunately it got us to 
the point where we could bootstrap our own compiler and that happened in in the 
spring of 87 and from then on we just had uh continuing work on these uh 
optimization techniques and uh i went through about three or four successive 
implementations of the module system and i'm contemplating a fifth right now uh 
but uh and then uh so andrew and his students published a bunch of papers we 
published a couple of papers describing the system as a whole and andrew and 
his students published a string of papers having to do with things like closure 
conversion how to represent closures register allocation things of that's right 
um specialized to the context of ml and uh uh in the mid to late late 90s and 
into the 2000s uh two main things happened one was um uh the flint group at 
with zong xiao at yale do you know do you know this group so they zong xiao was 
a phd student of andrews and he went and got a job at yale and he built built 
up a research group and they were collaborators with on standard ml of new 
jersey and their big project was to so as i mentioned earlier the interface 
Their big project was this: as I mentioned earlier, the interface between the front end and the back end was originally just a really straightforward untyped lambda calculus. There's Church and Curry, and you can look up their different attitudes toward the role types play in the language: whether, in a typed lambda calculus, you do the type checking and then throw away the types and just execute the lambda calculus, which is essentially Curry's approach, as opposed to the Church approach, where the types are really there. The initial approach was to do the type checking, throw away the types, and compile the lambda calculus. The FLINT group wanted to show that there was a benefit from keeping the types and trying to exploit them for various optimizations, mainly giving you more information about the machine representation of things, and whether arguments are going to be passed in registers or on the stack or somehow as part of the continuation closure.

So they pursued, in this FLINT project at Yale, a typed intermediate representation. And because we had modules, which involve higher-order types, they used a fancy higher-order version of the typed lambda calculus called F-omega. I don't know if you've heard of F-omega, but it's the canonical higher-order typed lambda calculus, where you have not only types but type functions, type operators, as parameters. You needed this higher-order stuff in order to model the module system, which was a kind of module-level lambda calculus, a functional language.
So on the static side you had to deal with higher-order type constructs. They developed their F-omega-based intermediate representation, replacing the plain untyped lambda calculus that was used before, confusingly using the same name for it, PLambda, and they developed various optimizations, a full kind of middle end with a series of optimizations, some of which were fairly conventional and some of which were special to trying to exploit the types, made somewhat more complicated by the way they implemented this higher-order, F-omega-like type system. This started in the early 90s, so you had the kind of cycles and memory available with the technology of that era, and they found that the compiler optimization computations they needed to perform were expensive, so they made things more complicated in order to optimize for the technology. That's another theme you find if you have a system that lasts long enough, and of course this was in the middle of the Moore's-law exponential race: decisions that were made in the 1990s to save space or time may not be the best decisions thirty years later.
So that's one of the things I'm thinking about working on.

To lead in, I would love to shift to that right now. When I think about something like the module system in SML, I'm also thinking about other forms of higher-order computing. In C++, for example, the template system is sort of like this pure functional language sitting on top of the base language, but different and separate from it.

I have no idea, because I didn't talk to him even when we were two doors down the hall, but it seems to me that there may have been some inspiration from the module system that led to the template system in C++.

And so what do you think about having this higher-order part of the language be such a fundamentally separate part of the language?

Well, I view it as a fairly integral part myself, and there is a choice; in fact, different implementations of Standard ML take different approaches. If you view the module system as simply first order, that is, not a full functional language of modules but just a first-order language of modules, you can get away with treating it kind of like C++ templates, essentially an elaborate macro system: you can eliminate that stuff and then compile what you get after eliminating it. The MLton compiler, which is an alternative implementation of Standard ML, takes that approach: they get rid of polymorphism, they get rid of the module system, and then they have a simpler type system to work with. It also requires them to do whole-program compilation. So if you're willing to take the hit of whole-program compilation, and you restrict your view of the module system to being first order, you can get away with treating the module system as a kind of macro-level part of the language, and you can generate a language which is just simply typed and functional. And with whole-program analysis there are other optimizations that become available, so MLton kind of went all out toward gaining optimal performance, and they win some of the time, not all of the time.

But I was interested in pursuing the model of the module system as a functional language, including higher-order functions, so I spent a lot of time thinking about and working on first-class functors. You have structures, which are the basic modules, just encapsulated pieces of environment if you like, both static and dynamic, and then you have functors, which are functions over those things, and then the higher-order things would be functors over functors, or functors producing functors, and so on. Now, my model handles that fine, but it removes, or at least makes much more difficult, the possibility of preprocessing away the module system.
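For readers who haven't met ML modules, here is a minimal sketch of the structure and functor vocabulary; it's my own toy example, not code from any of the compilers discussed:

```sml
(* A signature describes an interface; a functor maps structures to structures. *)
signature ORD =
sig
  type t
  val compare : t * t -> order
end

functor SetFn (O : ORD) =
struct
  type elem = O.t
  type set = elem list                 (* naive list-based representation *)
  val empty : set = []
  fun member (x, s) = List.exists (fn y => O.compare (x, y) = EQUAL) s
  fun insert (x, s) = if member (x, s) then s else x :: s
end

structure IntSet = SetFn (struct type t = int val compare = Int.compare end)
```

Higher-order functors, ones that take or return other functors, go beyond the 1997 Definition, but SML/NJ supports them as an extension, which is the model David describes.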
So what would you like to change? We've got the gap between the module system as it stands today in SML and where you would like to take things.

Nothing really drastic. There are a couple of features that I think were mistakes. For instance, we have a special declaration: if you have a module, or structure, S, you can declare open S, and essentially what that does is dump all the bindings inside the module S into the current environment. And you can say open A B C D E F G, a whole bunch of things, and we used to do that a lot in the early days, but it turns out to be bad software engineering in a concrete way, in that you get a kind of polluted namespace: given the name foo, you don't know which module it came from if, at the beginning of your module, you open a whole bunch of stuff. So I'd like to just get rid of this open declaration, and over the years we've come to essentially eliminate it from our own code, generally by introducing a one-character abbreviation for a module that we would otherwise have opened, so you can say S.foo instead of simply foo.
say have i don't know my list as a part of your code base that overrides the 
default list where you could open the list module and then you know override 
some things in there but yeah the problem is that if you open a module you may 
be accidentally overriding things that you didn't mean to override because the 
module you're opening might have 30 components and now you've possibly 
overridden anything whose name clashes with those 30 components so it improved 
it's a it's a hygiene thing better hygiene namespace hygiene if you like and 
there's other minor things like because of an edinburgh local practice in pop 
two it was possible to declare new infix operators in with a given binding 
power priority pressure precedence and this is a bad idea just in general in 
language of science so i i want to get rid of that um and um uh you know a few 
cleanups like that of features that turned out over the over the year and there 
that we had a a kind of uh partial and reluctant revision of the language in 
1997 so the original book was published at mit press by 1990 and then in 1997 
uh four of the principles rob milner and his student uh and bob harper from cmu 
and myself happen to be in cambridge for a a research program at the newton 
institute and so we started meeting and and discussing how how to fix various 
problems that we've discovered in standard ml since 1990 uh except that there 
was sort of kind of uh the radicals and the conservatives and conservatives 
kind of won out and um um so for instance um equality is a uh a perennial issue 
So, for instance, equality is a perennial issue: how do you treat equality? Obviously equality of integers is pretty straightforward, so it's kind of a per-type thing, but equality of lists? The lists may be a billion elements long, and if you're comparing two lists that are a billion elements long, it's going to take a long time, so it's not a bounded, fixed expense, and it involves recursion if recursive types are involved. The mistake we made originally was extending polymorphism to handle equality: we introduced a special kind of polymorphic type variable called an equality type variable, written ''a, with a double tick at the beginning, and these variables could only be instantiated by types that supported equality. The notion of a type that supports equality is a kind of generic notion, at least conservatively speaking: certain primitive types like int have an equality, string has an equality that compares character by character, and datatypes like list have an equality, even though you have to generate a recursive function for a particular type. So there's this generic notion of equality, and the equality polymorphic variables would range only over types that supported it, and therefore you could apply equality to values of those types even though the types were unknown.
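In code, the double-tick variables look like this (a small example of mine):

```sml
(* ''a can only be instantiated by types that admit equality, which is
   what licenses the use of = on values of an otherwise unknown type. *)
fun member (x : ''a) (ys : ''a list) : bool =
  List.exists (fn y => x = y) ys

val ok = member 3 [1, 2, 3]       (* fine: int admits equality *)
(* member Math.sin [Math.cos] would be rejected: functions have no equality *)
```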
To implement such an idea, there are two approaches. One approach is that implicitly, any time you parameterize over an equality type, you silently pass an equality operation in as well, and that leads to the notion of classes in Haskell, what they call type classes. The terminology, I believe, came about because they were trying to borrow on the hype of object-oriented programming with classes.

Right, on the hype.

Anyway, I don't like type classes in Haskell, and this is an example of a type class if you implement it that way. It turns out there's a tricky implementation that doesn't involve passing implicit equality operators: if a type is one of these types that supports generic equality, then it just so happens that we have enough information in our runtime representations, kept for the purposes of garbage collection, that you can interpret the runtime representations to do equality. That's how we implement it. One of Andrew's graduate students did the type-class-style implementation as well, just to show that it could be done. But ours is a kind of hack: it works without having an actual type-class-like mechanism with implicit operators being passed, but it works because of an accident of the runtime representation.

[Another brief phone interruption.]

We can close up here if you want; I see we're getting to about our time.
Yeah, let me try to summarize, because there are all kinds of details I could go into about the ML story. The big thing is that in the functional programming community, and some of this migrated a little bit to the object-oriented programming community because there are some commonalities in dealing with higher-order stuff and dealing with types, things look different from Pascal, C, and so on, or Python if you want to write a Python compiler or whatever. (Lisp has its own peculiarities because of dynamic binding and dynamic typing, and there's JavaScript and there's TypeScript.) Because of the computational model based on the lambda calculus, the presence of higher-order functions, and also the opportunities presented by static typing, there are some interesting compilation problems that come up in that context that are different from the mainstream, from high-performance computing in Fortran or something, or even C compilers or Go compilers or what have you. So there's been an interesting interaction, and of course there's a different kind of involvement of theory in these functional programming languages, particularly related to their type systems, but also, historically, to the order-of-evaluation issue and that sort of thing.

So there's a kind of different line of things. I have never gone to a PLDI conference. I've been working on a compiler for many years, but I submitted one paper to PLDI and it wasn't accepted, because the program committee had no idea what I was talking about, probably. It was about this: when you have algebraic data types and you do pattern matching over them to do decomposition and dispatch at the same time, that process can be automated, so there's a notion of a match compiler, a pattern-match compiler, which produces the code that does the pattern matching. Many functions in a language like ML or Haskell involve some kind of non-trivial pattern match, so writing a match compiler is part of implementing a functional language with algebraic data types. Back around 1980-81 I was thinking about that, and I later wrote a paper with a summer intern and submitted it to PLDI, but they had no idea what a match compiler would be useful for; it didn't relate to conventional compilation technology.
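For a sense of what a match compiler has to handle, here is a small function (my example) whose clauses must be compiled into a nest of constructor tests and bindings, with exhaustiveness and redundancy checking along the way:

```sml
datatype expr
  = Num of int
  | Add of expr * expr
  | Mul of expr * expr

(* Each clause is a pattern; the match compiler turns the whole set into
   a decision tree over the constructors. *)
fun simplify (Add (Num 0, e)) = e
  | simplify (Add (e, Num 0)) = e
  | simplify (Mul (Num 1, e)) = e
  | simplify (Mul (Num 0, _)) = Num 0
  | simplify e                = e
```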
Yeah, I think it reminds me of some of what they have in LLVM for peephole optimizations when doing instruction selection, some of these small transforms that you can describe in a higher-level description language, which is interesting.

Yeah. We have a current issue in Standard ML related to this. I have one continuing collaborator, John Reppy, at the University of Chicago, formerly Bell Labs. We did a lot of architectures back in the 90s: we had a kind of architecture-independent code generation framework called MLRISC, and we had SPARC, MIPS, Alpha, IBM POWER, and so on, and eventually x86 as well. So we had this technology for writing code generators for RISC architectures, which was a big thing in the 90s when there were still a lot of people selling RISC architectures, but then they kind of died out and everything became x86, and now there's ARM, a big thing especially if you're using Apple computers, so we need an ARM code generator. John was the one who would have to write a new code generator; I've never written a code generator, I could, but it would be a big push for me. And John decided he would just take an off-the-shelf solution and adopt LLVM.

That turned out not to be straightforward. For instance, LLVM off the shelf doesn't support the calling convention that we use in Standard ML. The big thing about our calling convention is that we don't use a stack; instead you have continuation closures that play the role of the stack frame, and these continuation closures are heap allocated and garbage collected. That makes certain things very easy, like concurrency: multiple threads are very easy when they don't have to share a stack and move stuff in and out of a stack, with the closures allocated in the heap.
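A source-level caricature of that idea (my sketch, not what SML/NJ literally emits) is continuation-passing style: "returning" means calling a continuation closure, and those closures can live in the garbage-collected heap rather than on a stack.

```sml
(* Factorial in continuation-passing style: each recursive call builds a
   new continuation closure instead of pushing a stack frame. *)
fun fact_cps (n : int) (k : int -> 'a) : 'a =
  if n = 0 then k 1
  else fact_cps (n - 1) (fn r => k (n * r))

val answer = fact_cps 5 (fn r => r)   (* 120 *)
```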
LLVM doesn't know about that sort of thing, so details of the calling convention have to be hacked into the generic LLVM, and then there are strange bugs, which never occur in mainstream uses of LLVM, that turn up when John tries to apply it. So it turned out to be a multi-year, non-trivial effort to get LLVM working. And of course the LLVM community, whoever is in charge of it, is not interested; they probably don't know anything about the existence of ML, and they're not interested in upstreaming changes to support us. And it's a hundred-and-some-thousand lines of C++ code, so now the compiler is mostly C++, and I don't program in C++. Anyway, John, with great effort, has come up with an x86 and an ARM pair of code generators based on LLVM. The LLVM code gets integrated into the runtime system: the compiler produces some kind of serialized piece of data, which gets passed over to the runtime system and used as input to the LLVM C++ code.

So the runtime system is passed some sort of thunk, maybe some LLVM bitcode, and then that gets executed?

It's just a bunch of bytes; the runtime system doesn't know what a thunk is, it just knows bytes. It's a pickled piece of LLVM-compatible input.

Right, so it's some partially compiled LLVM bitcode, the serialized format of the LLVM IR, and then it gets executed, probably after over a dozen passes of optimization before you even get to the stuff that's actually passed to LLVM, and then presumably LLVM does its own stuff.
Exactly. And this kind of violates one of our original principles when Andrew and I started working on the compiler, which was to show that you didn't need some other language; we didn't want to take off-the-shelf technology and stick it in, we wanted a pure SML solution. I would really like to get back to that.

Interesting. Well, I think we're at about an hour and a half, so I think we can probably wrap it up here.

Right. So if you have any further questions as you try to digest this, just let me know.

I certainly will. Thank you so much for taking the time. I'm going to stop the recording here, but I appreciate you very much for taking the time to talk with me about this.

You're welcome. Yeah, it was fun.