I use several programming languages day-to-day, and a few more
languages occasionally. I have a passing familiarity with several
more languages that I use rarely, as do most students with recent
computer science degrees. They are our basic tools. They are so
basic that I sometimes assume everyone knows about them, until I get
brought up short by a question. "What's a lambda function, David?"
Uh...
Well. Let me pick a few of my favorite and least favorite languages
and, at least for once, write about them rather than the things I
write with them.
- C
UNIX is written in C. Windows is written in C. I'm having a hard
time thinking of an operating system in current use which was
not written in C. Okay, BeOS may have been written in C++,
and NeXTStep was written in Objective-C; but both of those languages are children
of C, and neither operating system is exactly a major concern any
more.
The C language and the UNIX operating system co-evolved. The first
C compiler ran on a DEC PDP-11, and the language has been
(perhaps uncharitably) characterized as high-level PDP-11 assembly
language.
The language did not really depend on the machine,
though; this seems like a no-brainer now, but at the time the UNIX
operating system was remarkable because it was written at a
sufficiently high level that it could be ported from machine to
machine. The prevailing wisdom before that was that operating
system kernels had to be written in machine-specific assembly
language. The C language, though, worked well -- and works well --
as a systems programming language. C lets us program close to
the metal -- close enough that it's pretty easy to work out the
instruction sequences the compiler will generate for a given piece
of C code (it's not at all easy to predict what an optimizing
compiler will generate, particularly on some modern architectures,
but that inconvenient fact isn't really stressed in lower-division
CS coursework). At the same time, there is enough stuff built into
the language to support a more abstract programming style, one in
which the programmer doesn't think that much about what the hardware
is doing underneath. Also, C is a small language. It often takes
students a long time to understand pointers, or to feel comfortable
with some common C idioms, but it's possible to become acquainted
with most of the syntax in the course of a couple days. The
definitive reference -- The C Programming Language
(Kernighan and Ritchie) -- takes scarcely more shelf space than
the Elements of Style.
By way of comparison, The C++ Programming Language
(Stroustrup) is over 800 pages of smallish print. C is a small
language; most of C's successors are not.
C is alive and well. Standard C runs on almost anything, and it is
still the language of choice for programming operating systems,
embedded systems, and other low-level
things. Some of us
like it for higher-level libraries as well.
I used C for the core of SUGAR 2.0 and 3.0, for CLAPACK, for ECFS
(an Extended Cryptographic File System that I co-authored as a class
project), and to write a kernel for my undergraduate OS class. I
also use it for a variety of small side projects.
- C++
C++ is sometimes lumped together with C in a single breath: C/C++.
However convenient it is to write, and however much some people try
to think of C++ as "a better C," the impression that the phrase
"C/C++" conveys is inaccurate: C and C++ are very
different programming languages. It is true that C++ grew out of C,
and that most C compilers now are also C++ compilers. But it's
possible to identify a C programmer writing in C++ for the first
time, or vice-versa. I really learned C++ before I learned C well,
and I was immensely frustrated when I first switched. I know other
people who have had the opposite experience, and I know some people
who manage to really write C in C++. Reading their code is like
reading English prose written by a native speaker of some other
language -- Russian, perhaps -- who has a good but imperfect command of
English. Even if the syntax is perfect, the idioms get mixed up.
C++ is sometimes called an object-oriented language. The idea of
object-oriented design is that a program is partitioned into
"objects," which contain both data and routines to act on the
data. For example, in one program I might use several different
dictionaries: a dictionary object would consist of a data structure
which contained both keys and data -- words and associated
definitions, perhaps, or employee numbers and employee records --
and associated operations like "add an entry," "delete an entry,"
or "look up a key." The details of how the data is
stored are hidden, since what's really important is what you can do
with the data; this is the principle known as
"encapsulation." For some types of dictionary data
structures, all that really matters is that I need some way to
compare keys: the exact details of what goes into the keys are
secondary. I can use the same code to delete an entry whether I
have keys that are strings (words) or numbers: this is a type of
"polymorphism." Or I can take an existing dictionary type
and -- without modifying the original code -- write a dictionary
with new functions, like the ability to guess which word is meant
when the user is a bad speller. I can still use my
dictionary-with-spell-check in any place I use a dictionary; it's
just that I've given it a new ability. This is an example of
"inheritance."
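
To make this concrete, here is a minimal sketch in C++. The class and
method names are invented for illustration (they don't come from any
real code of mine): the base class hides its storage behind add and
lookup (encapsulation); the derived class adds a crude guess-the-word
lookup without modifying the original (inheritance); and because
lookup is virtual, code written against a plain Dictionary happily
uses the spell-guessing one (polymorphism).

#include <iostream>
#include <map>
#include <string>

// A dictionary whose storage is hidden behind a small interface (encapsulation).
class Dictionary {
public:
    virtual ~Dictionary() {}
    void add(const std::string& key, const std::string& value) {
        entries_[key] = value;
    }
    virtual std::string lookup(const std::string& key) const {
        std::map<std::string, std::string>::const_iterator it = entries_.find(key);
        return it != entries_.end() ? it->second : "";
    }
private:
    std::map<std::string, std::string> entries_;  // callers never touch this directly
};

// Same interface, one extra trick: guess the word when a lookup misses (inheritance).
class ForgivingDictionary : public Dictionary {
public:
    virtual std::string lookup(const std::string& key) const {
        std::string result = Dictionary::lookup(key);
        if (result.empty() && key.size() > 1)
            result = Dictionary::lookup(key.substr(0, key.size() - 1));  // drop a letter and retry
        return result;
    }
};

// Code written against Dictionary works with any derived dictionary (polymorphism).
void describe(const Dictionary& d, const std::string& word) {
    std::cout << word << ": " << d.lookup(word) << "\n";
}

int main() {
    ForgivingDictionary d;
    d.add("penguin", "a flightless bird");
    describe(d, "penguin");   // exact match
    describe(d, "penguins");  // the guess still finds the entry
    return 0;
}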
C++ really doesn't force any sort of object-oriented thinking: it's
just that it provides a syntax that makes it easier to express
(some) object-oriented designs. It's possible to write
object-oriented code in C, too; it's just that there is more
supporting syntax in C++. Even more important than the supporting
syntax is the fact that a C++ compiler will enforce the
semantics of an object system: if you tell the compiler
that you're going to pass a dictionary to some function, it will let
you pass dictionaries and only dictionaries. If you try to pass
around penguins in place of dictionaries, the compiler will yell at
you; in the same situation, a C compiler will sometimes let you
substitute a penguin for a dictionary. Then the penguin will bite
you when you actually run the program. Birds do not take well to
being force-fed employee records.
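
Here is roughly what I mean, as a made-up sketch (the type and
function names are mine, invented for this example). In C, a void*
parameter cheerfully accepts a penguin where a dictionary was
intended, and the mistake only shows up when the program runs:

#include <stdio.h>

struct dictionary { int nentries; };
struct penguin    { int nfish;    };

/* The parameter is void*, so the compiler cannot tell dictionaries from penguins. */
void add_entry(void *dict)
{
    struct dictionary *d = dict;   /* we can only hope this really is a dictionary */
    d->nentries++;
}

int main(void)
{
    struct penguin tux = { 0 };
    add_entry(&tux);               /* compiles without a peep; the bird is bitten at run time */
    printf("nfish = %d\n", tux.nfish);
    return 0;
}

A C++ compiler would refuse the assignment from void* without an
explicit cast, and a version of add_entry declared to take a
dictionary pointer would reject the penguin at the call site.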
C++ sometimes gets a bad reputation for being slow, clumsy, and
complicated. It is complicated, but the charge that it's
slow and clumsy is misdirected. C++ can actually be faster
than C, if it's used properly; because of strong typing (the
property that the compiler won't allow you to confuse a dictionary
with a penguin), the optimizer has some opportunities that it
wouldn't otherwise have. The trouble is that in order to write
really fast code -- in any language -- it's necessary to
have a good mental model of how the programming language structures
are actually implemented. C++ supports much more complicated
constructions than C does, and it takes more time and training to
really understand not only the semantics of those added features,
but also to understand the implementation well enough to know how
much the features will cost. Similarly, object-oriented methods are
a supplement to -- not a substitute for -- good design taste. It's
possible to make a bad generalization which causes confusion and
extra work with little benefit: for example, partial differential
equations often behave very differently in 2D than in 3D,
and a framework in which 2D and 3D problems are treated too
similarly will lead to headaches. But some things about 2D and 3D
problems really are the same, and should be treated with the same
code. How do you tell whether to treat 2D or 3D the same or
differently? It's a matter of good taste, and that's a hard thing
to teach.
I used C++ at my job at Tri-Analytics (and later at Digital
Innovation) during my high school and college years. I have used it
for various other projects since. The core of the HiQLab simulator
is written in C++.
- FORTRAN
FORTRAN 77 was standardized in 1977, the same year that I was born.
I sincerely hope that FORTRAN 77 dies out before I do. The language
does support complex arithmetic -- something that C lacked
until very recently -- and often it's easier for an optimizer to
generate fast code from FORTRAN 77 than from C. But the language
syntax is outmoded and irritating. Why should I have to format my
program for a punch card? And why should I have to restrict the
length of my function names to six characters? FORTRAN also has
very weak type checking; think penguins and dictionaries again,
except now I can get away with calling a penguin an integer, which
even C will disallow. Also, FORTRAN 77 provides no facilities for
dynamic memory allocation: if I make a dictionary, I have to fix its
maximum size at compile time. It's possible to get around
this limitation (and doing so was one of my major contributions to
the FORTRAN-language finite element code FEAP), but it takes a
degree of cleverness and some mental gymnastics. FORTRAN 77 does
not support recursion, does not have pass-by-value arguments, does
not have pointers (or any good substitutes for pointers), and
generally does not have the modern amenities.
I like having the modern amenities. I enjoy running water and flush
toilets; and I enjoy dynamic memory allocation and long function
names. I do not generally think either preference is a weakness.
That being said, there is a lot of well-designed numerical
code in the world which is, alas, written in FORTRAN 77. And I am
willing to use and modify such codes, which means that I am willing
to continue to use FORTRAN 77.
There are more recent dialects of FORTRAN than F77: we have F90,
F95, and now FORTRAN 2003 (known as FORTRAN 2000 while it was being
drafted). I actually rather like
these languages, though they are more complicated than the older
FORTRAN dialects, and they show some of the symptoms of committee
design. I'm not sure why these languages haven't caught on in the
US, but so far they seem to be more popular in Europe. I see only
one reason to continue using FORTRAN 77 rather than a more recent
dialect: the free GNU compiler, which a lot of people use and like,
currently only supports FORTRAN 77. But that will change
with the next major release of the GNU compiler suite, which pleases
me immensely.
The FORTRAN language is dead. Long live the FORTRAN language!
- MATLAB
MATLAB was originally short for MATrix LABoratory, and the language
is ubiquitous among those of us who do any sort of matrix
computations. Not only is MATLAB often used to develop numerical
algorithms, but MATLAB's syntax is often used in a purely
mathematical context to express indexing operations which are common
in numerical linear algebra. In addition to providing a concise
syntax for linear algebra operations, MATLAB provides basic
libraries for a variety of numerical tasks, including numerical
quadrature (computing integrals), numerical integration (computing
solutions to ordinary differential equations), and graphics and
plotting in many flavors.
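
A small, made-up snippet gives the flavor of the syntax I mean:

A = rand(5);               % a random 5-by-5 matrix
b = ones(5,1);             % a column vector of ones
x = A \ b;                 % solve the linear system A*x = b
B = A(2:4, [1 3 5]);       % rows 2-4, columns 1, 3, and 5 of A
A(1,:) = A(1,:) / A(1,1);  % rescale the first row, as in Gaussian elimination
t = linspace(0, 1, 101);
q = trapz(t, exp(-t.^2));  % crude numerical quadrature of exp(-t^2) on [0,1]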
Of course, there aren't that many numerical linear algebra
specialists in the world, and most of us don't make that much money;
the MathWorks makes most of its money from electronic designers, and
particularly from DSP designers. Besides the basic capabilities to
which every MATLAB user has access, the MathWorks sells toolboxes of
routines for DSP design, filter design, control theory, system
modeling, statistics, symbolic math, and PDEs. There are also a
variety of third-party toolboxes (both free and commercial).
I love MATLAB for its matrix syntax. I like the fact that it
includes a wide range of numerical routines. But I'm not so
attached to the rest of the language. MATLAB only allows one
globally-visible function to be provided per file; that quickly
becomes annoying for large programs. MATLAB does have explicit
language support for OO design, but it's awkward. And if MATLAB
strings seem well-designed compared to strings in C and FORTRAN,
that's only because string manipulation in C and FORTRAN is so
miserable.
Almost every numerical code I've written, large or small, has
started life as a MATLAB program. My MATLAB interface to FEAP
(FEAPMEX) and the SUGAR and HiQLab simulators all combine a
computational core
in some compiled language with a MATLAB
user interface. I've also used a mixture of MATLAB and compiled
languages in my work on structured eigensolvers and in my work on
network tomography.
I sometimes use Octave,
which is a free numerical system with a MATLAB-like syntax; but
Octave only recently added decent sparse matrix support (which I use
a lot), and its graphics aren't as nice. So I still mostly use MATLAB.
- LISP
LISP is a LISt Processing language; or is it Lots of Irritating
Strings of Parentheses? In any event, LISP is one of the oldest
high-level languages. It is also one of the simplest. LISP
programs work on lists, or lists of lists nested in various ways.
LISP programs also are lists themselves: the main data
structure is used to represent not only the data, but also the
instructions that operate on the data. Consequently, for a LISP
programmer it is very natural to write a function which takes a
function as input and produces another function as output. Deep,
no? The elegance and simplicity of such a language is rarely
appreciated by students who learn Scheme (a LISP dialect) in a first
programming course.
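
Here is a small illustration of that deep-sounding idea (my own
example, not from any course): a function that takes a function f as
input and returns a new function that applies f twice.

(defun twice (f)
  "Return a new function that applies F two times."
  (lambda (x) (funcall f (funcall f x))))

;; (twice #'1+) is itself a function; applying it to 5 gives 7.
(funcall (twice #'1+) 5)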
At a superficial level, LISP confuses the uninitiated because it is
a prefix language. For instance, in LISP I would write
1+2*3 as (+ 1 (* 2 3)): the operator comes
before the operands, not between them. If Shakespeare
wrote in LISP, Hamlet might have said (or to-be (not
to-be)) (and in a post-fix language like Forth, it would be
tobe tobe not or). The difference between prefix and infix
notation is not deep, though, and most people figure it out pretty
quickly, particularly if they have ever used an HP calculator.
At a deeper level, LISP is a functional language. A
well-written LISP program typically will not involve assignments and
loops; instead, operations are expressed recursively. If I wanted
to sum the numbers from 1 to n in LISP (and didn't remember
the formula n(n+1)/2), I might write something like
(defun sumit (n)
  (if (= n 1)
      1
      (+ n (sumit (- n 1)))))
That is, if n is 1, then the sum is 1; otherwise, the sum is n plus
the sum through n-1. In contrast, if I were writing in an
imperative style in a language like MATLAB, I might write
function sum = sumit(n)
  sum = 0;
  for k = 1:n
    sum = sum + k;
  end
It's possible to write in a functional style in a language like C,
and it's possible to write in an imperative style in LISP. As with
the distinction between C and C++, the difference is not in what the
language can do; it is in the idioms to which the language
lends itself. The same can probably be said of the technical jargon
of different disciplines, but that's a topic for a different time.
I used LISP in my Artificial Intelligence course (like every other
person to take such a course). I also use Emacs LISP from time to
time, and on a few occasions I wrote short functions in a LISP
dialect which was used as one of the configuration languages for
Collector (the flagship software product produced by the company I
worked with in high school and college).
- Pascal and Delphi
I took AP Computer Science in high school -- sort of. You see, I'd
already learned to program at the time, and it quickly became clear
that it would be pointless for me to walk through the usual
exercises. So I got to pursue my own projects in a semi-independent
study; and I helped the other students in the class, which was also
entertaining. The class was still taught in Pascal -- and the AP
exam was still given in Pascal -- so I taught myself Pascal. I have
few things to say about it, except that the support for character
strings was pretty lame (again), and that I always found the C style
of using semicolons to end statements to be more natural than the
Pascal style of using semicolons to separate statements.
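
The difference bites in one concrete place: a semicolon before an
else is a Pascal syntax error, because the semicolon would separate
the if statement from an else that then has nothing to attach to. A
trivial invented fragment:

if x > 0 then
  writeln('positive')    { no semicolon here -- it would cut the if off from its else }
else
  writeln('nonpositive');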
Pascal was an adequate language when I used it, and I suppose it's
fine for instruction. But Delphi was a completely different
matter. We used TurboPascal for class, and whatever I thought of
the basic Pascal language, the Turbo compiler was a delight -- and
the folks who wrote the Turbo compiler extended the language in a
good way (with better string support, for one thing). But Delphi!
Delphi was -- is -- Borland's answer to Visual BASIC. The
programming language is an extended version of Pascal which bears
roughly the same relation to standard Pascal as C++ bears to
standard C. Like Visual BASIC, a lot of Delphi programming was
centered around GUI-driven events: click on this button and that
piece of code is executed, for instance. But Pascal is a nicer
programming language than BASIC, and the whole business was so much
cleaner in Delphi that there was hardly a comparison. The Delphi
compiler, like the Turbo compiler, was amazingly fast; the code it
generated was pretty fast, too. As far as I could tell, it was in
almost all ways technically superior to VB.
I used Delphi to write user interface code, both for Digital
Innovation and for the Department of Communication Services (DCS) at
the University of Maryland.
- Lua
One of my favorite languages for the past couple years has been a
little scripting language called Lua. The name comes from the
Portuguese word for "moon"; the language and the name evolved
from SOL, a Simple Object Language ("sol" is "sun" in
Portuguese). SOL, in turn, was a descendant of DEL, a Data Entry
Language. DEL, SOL, and Lua were all inspired by the needs of
engineers to set up some sort of petroleum engineering simulations,
so one of the criteria in the minds of the designers was "can
someone who only knows FORTRAN 77 pick this up?" At the same
time, the language was designed by computer scientists, and it has a
number of lovely and useful features: functions as first-class
objects, syntactic sugar to support object-oriented programming, and
a very natural table-based data description syntax. Yet it is not a
complex language: the complete syntax description
fits on one page, and the language manual -- including the
description of the extension API and standard libraries -- is about
sixty pages. The interpreter is small, fast, and portable.
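
The table-based data description deserves a tiny example. This is a
made-up fragment (the field names are invented, not HiQLab's actual
input format); a Lua table reads like a configuration file, and
functions are just another kind of value that can sit in it:

-- An invented material description
material = {
  name    = "polysilicon",
  E       = 165e9,          -- Young's modulus (Pa)
  nu      = 0.22,           -- Poisson ratio
  density = 2330,           -- kg/m^3
  damping = function(freq)  -- functions are ordinary table values
    return 1e-6 * freq
  end
}
print(material.name, material.damping(1e6))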
One of the nice things about Lua is how well it serves as an
extension language. Python, Perl, Tcl/Tk, and various
other high-level scripting languages are easy to extend, but they
are not as easy to use as extensions. Size is one issue: Python,
Perl, and Tcl are not little languages. Syntax is another
issue: when I start programming in Perl after a hiatus, I always
forget half the dollar signs; and in Tcl, I always forget which sort
of braces I should use. There are a few Scheme interpreters that
are billed as extension languages, and from personal experience I
know that I like Scheme for extensions. But -- the
examples of AutoCAD and CADENCE notwithstanding -- I'd rather not
spend my time teaching mechanical engineers how to use LISP.
Besides, I like infix notation. Lua wins because it supports a
variety of programming styles; because it is easy to call from C and
can easily be extended from C; because it has an intuitive syntax;
and because it's small and fast. I think the language is
underappreciated, though some of the video game industry has
realized how nice Lua is: Lua is used to script the most recent
Monkey Island games, among others.
I use Lua in the HiQLab simulator both to describe my meshes and to
script out analysis tasks. I used Lua for the same purpose when I
designed SUGAR 3, though at that time I felt obliged to extend the
Lua syntax in order to support some features I'd built into the
input language I wrote for SUGAR 2. Let me give an example of why
Lua works so nicely as a mesh description language. When you solve
a differential equation -- partial or ordinary -- one of the things
you always have to worry about is the behavior at the boundary.
In HiQLab, I have a very flexible way to define boundary conditions:
I write a Lua function to do the job. For instance, the code fragment
mesh:set_bc(function(x,y)
  if x == 0 then return 'uu', 0, 0; end
end)
tells my code that all of the points along the line x = 0
in my two-dimensional mesh should not move. The 'uu' means that I'm
specifying displacements in both the x and y
directions; the two zeros are the values of the displacements.
I can write my code this way because Lua treats functions as
first-class values: I can define a function and pass it as an argument to
another function, just as I can in LISP. Also, I've defined an
object-oriented interface for my geometry definitions: I have a mesh
object (called mesh, strangely enough), and that object has
a method called set_bc.
I generate the interfaces
between Lua and C/C++ code using an interface compiler that
generates the glue
code I need from a cleaned-up header file.
I liked the features of the tolua interface compiler well enough
that I added some of them to the matwrap interface compiler that I
use to generate interfaces between MATLAB and C/C++ code.
One summer at Digital Innovation, a co-worker and I worked on an
interpreter for a language called EEL (Expression Evaluation
Language) which we hoped would replace the LISP dialect used to
script the Collector database system. I think Lua is what we might
have hoped to design, but didn't quite achieve -- though when I
mentioned this to Guy, he thought that EEL would still kick ass. I
suppose it's a moot point, now.
- AWK and Perl
AWK was the first language I learned with really good text
string processing. I suppose it's reasonable that I would be
impressed by AWK's text processing: after all, the whole point of
the language is pattern scanning and text processing. I use AWK for
a lot of little things, like massaging text files full of data as I
pass them from one program to another. I have a pair of shell
scripts that I use a lot, awk-process and
awk-unprocess, which let me quickly apply complicated edits
to several programs and then undo those edits if I make a mistake.
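
The typical job is a one-liner. For instance, something like the
following (a made-up example, not one of my scripts) swaps the first
two columns of a whitespace-delimited data file and rescales the
third:

awk '{ tmp = $1; $1 = $2; $2 = tmp; $3 = 1e-6 * $3; print }' data.txt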
Perl ("Practical Extraction and Report Language" or
"Pathologically Eclectic Rubbish Lister," depending on who you
ask) includes the capabilities of AWK and much more. However, I
rarely write an AWK script that's more than a page, and so I rarely
find I want the extra capabilities I could get from Perl. I know
enough Perl to be able to read and edit other people's code, but I
write almost as little new Perl as I write new FORTRAN 77. If I
want to write new scripts and I need something more than AWK, I'll
use Python. Why Python? Because, unlike Perl, it doesn't look like
someone accidentally transcribed modem line noise onto the screen.
Perl's very powerful, but the syntax is something only a camel could
love.
I suppose Tcl/Tk is also an option for the same sort of high-level
scripts that I might write in Perl or Python. Tcl/Tk is an older
scripting language; its best point, in my opinion (and in the
opinion of some others) is its integrated GUI toolkit (which is what
Tk is). Tcl/Tk makes it possible to very quickly write graphical
applications for Windows or UNIX, and I used it to make a graphical
user interface to the program I wrote for my undergraduate research
(TkBTG, a code for statistical prediction of spatial data). But
while Tk is nice, I always had trouble debugging Tcl. You see, in
Tcl there are specific meanings associated with {}, (), and []; and
at critical times, I kept forgetting which was which. That said, I
know that OpenSEES -- a locally-developed open-source simulator for
determining the response of structures in earthquakes -- uses Tcl as
a scripting language.
- PostScript
It seems only fair to end with PostScript. PostScript is primarily
a document description language, but it's a full-featured language
which is capable of more than drawing documents (though I think
there's a certain perversity involved in using PostScript to write a
SETI@Home client, for instance). PostScript is the lingua franca
for UNIX printer drivers, and many high-end printers will accept
PostScript directly.
PostScript is a postfix language. That is, instead of putting the
functions and operators before the data on which they operate (as
happens in LISP), one puts the operators after the data. The
PostScript interpreter maintains a stack, and whenever it sees an
operation -- like add -- it will take an appropriate number of
input operands off the top of the stack and push a result onto the
top of the stack. The fact that the interpreter hardly needs
anything more than a stack and a table of variable definitions means
that the interpreter can be pretty simple. Also, because the
interpreter can usually do something immediately when it sees a word
-- like push something onto the stack, or run an operation -- a
PostScript interpreter needs to include only a minimal parser.
To give an idea how this all works, let me give a simple example.
The following program prints the numbers 1-20 in a column running
down the side of the page:
/Times-Roman findfont 15 scalefont setfont
1 1 20 {
  dup                % stack: n, n
  -18 mul 720 add    % stack: n, -18*n+720
  72 exch            % stack: n, 72, -18*n+720
  moveto             % stack: n
  3 string cvs show  % stack: (empty)
} for
showpage
The first line sets the font (15 point Times Roman), and the last
line shows the page. The lines in between are the interesting
part. In PostScript braces denote a procedure: these are the only
things that aren't immediately processed in full when the
interpreter sees them. Here, the interpreter pushes the loop bounds
(1 through 20 in steps of 1), then the loop body procedure. Then it
calls the for command, which pops the procedure and the three loop
bounds from the stack; for each value of the
index (from 1 to 20, here), it pushes the index value on the stack
and then executes the procedure.
The procedure inside computes a
position on the page (the point (72, -18*n+720)), moves there, and
prints a three-character string containing the value of n.
If you think that's intuitive, you've either programmed in
PostScript before or your mind has been warped by too much time with
an HP calculator. Or, possibly, your mind works better with postfix
notation than mine does. Postfix languages are attractive because
they are simple to implement and because it's simple to
automatically generate code for them -- not necessarily because it's
simple for a human being to write code for them.
Because it is so easy to generate PostScript automatically, most of
the time I only work with it through automated tools (sometimes
produced by others, sometimes devised by me). But the idea of a stack-based
machine isn't good just for generating printer commands. The Java
Virtual Machine is stack-based; so was the virtual machine inside
older versions of Lua (Lua 5 switched to a register-based virtual
machine). The x87 floating point coprocessors and the
integrated floating-point units in some later Intel chips are
stack-based; this last is a mixed blessing, but that's a story for
some other day.