Friday, January 16, 2004

I spent Wednesday of this week writing documentation for my Matlab-FEAP interface and pondering my software toolkit. The documentation is generated automatically in part; there is a script to build the table of contents, another script to generate the list of function references from my Matlab comments, and yet another script that generates all the graphics. The scripts are a few lines each, written in AWK and Matlab and shell.
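
To give a flavor of what such a script does, here is a minimal sketch in Python of the function-reference step. It is not the actual script (those are in AWK, Matlab, and shell); the names `help_text` and `reference_entry` are hypothetical, and it assumes the usual Matlab convention that a function's help text is the first contiguous run of `%` comment lines in the file.

```python
def help_text(matlab_source):
    """Return the first contiguous block of % comment lines, stripped.

    Assumption: the help text is the first run of lines that start
    with '%', as in the usual Matlab convention.
    """
    lines = []
    started = False
    for raw in matlab_source.splitlines():
        line = raw.strip()
        if line.startswith("%"):
            started = True
            lines.append(line.lstrip("%").strip())
        elif started:
            # The first non-comment line after the block ends the help text.
            break
    return lines


def reference_entry(name, matlab_source):
    """Format a one-line reference entry from the first help line."""
    block = help_text(matlab_source)
    summary = block[0] if block else "(undocumented)"
    return f"{name}: {summary}"
```

Running `reference_entry` over every `.m` file in a directory and sorting the results would give a rough function list; anything fancier (cross-references, argument tables) is beyond this sketch.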

When programming, I write a few of these little scripts each day. They do things like produce header files for me, or rewrite function calls to change the order of arguments, or change a naming convention I've decided I don't like. Most of them fit in 5-10 lines, usually in AWK. Perhaps I should learn to be more comfortable with Perl, but it is hard to beat AWK for clean one-liners. I write Matlab scripts, AWK scripts, and shell scripts, as well as Makefiles and the odd macro in M4 or the C preprocessor. All this, of course, is in addition to the code I write in compiled languages: predominantly C, C++, and Fortran.
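
As an illustration of the argument-reordering kind of script, here is a minimal Python sketch (not one of the actual AWK scripts). The function name `swap_first_two_args` is hypothetical, and the sketch assumes simple calls whose first two arguments contain no commas or parentheses of their own; nested calls would need a real parser.

```python
import re


def swap_first_two_args(source, func_name):
    """Swap the first two arguments in every simple call to func_name.

    Assumption: the first two arguments contain no commas or
    parentheses, so a regular expression is enough to find them.
    """
    pattern = re.compile(
        r"\b" + re.escape(func_name)
        + r"\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*([,)])"
    )

    def reorder(match):
        first, second, closer = match.groups()
        # Emit the call with the two arguments exchanged; 'closer' is
        # either ',' (more arguments follow) or ')' (end of the call).
        return f"{func_name}({second}, {first}{closer}"

    return pattern.sub(reorder, source)
```

For example, `swap_first_two_args("y = axpy(alpha, x, n);", "axpy")` rewrites the call as `axpy(x, alpha, n)` and leaves the rest of the line alone.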

I've thought a lot recently about my choices of languages and tools. My Linux environment is very programmer friendly. That's one of the reasons I switched to Linux -- eight years ago, now. With Linux, I had access to free compilers, build tools, version control systems, and so forth, in a flexible environment very similar to the one we used for coursework. And the situation has improved a lot since then. The Fedora installation has caused me no troubles, and I've had the opportunity to trade time spent mucking with details I don't care about -- such as the syntax of yet another configuration file in the /etc directory -- for time spent on tasks that I care about and enjoy.

Still, there is a long way to go. I still spend too much time trying to build numerical software, often in the face of utter frustration. The make tool, which is the de facto standard for building software today, was invented early in the life cycle of UNIX. Autoconf, a powerful and cryptic tool for configuration detection and setup, is written in M4, a macro processor which first saw use as a preprocessor for Fortran. Every compiler takes different flags, or has different quirks in library support, and every tool is documented in a different format: TeX or troff or HTML or DocBook, or any or all of the above. Building software under Windows is sometimes like starting from scratch; while Windows supports a make program of its own, Windows make is a somewhat different beast from its UNIX sibling. I have powerful debugging tools like Valgrind under Linux -- but they don't work when I need to develop Matlab extensions, or when I need to debug my code under Windows. And, with a timing that seems to depend on the phase of the moon and the alignment of the planets, sometimes everything breaks. Would-be authors of portable code are reduced to the lowest common denominator they can bear -- and sometimes that denominator is technology from the 1970s with a thick coat of GUI painted in peeling layers on top.

All this has motivated my continued use and exploration of scripting languages. I use Matlab for much of my day-to-day numerical work; as a language, it leaves some things to be desired, but it is adequate for many applications -- and it's hard to beat when it comes to numerical linear algebra. I use Lua to script SUGAR, the MEMS simulator I work on. Lua is fast and small, but it is also very flexible, and I've spent a lot of time gloating over things I can do with it. Recently, I've been looking more and more at Python, which is increasingly popular among scientists at the labs (who use it for steering their codes), system administrators, and web developers. It has a simple high-level syntax, it's byte-compiled, and it has a big library. If I want to run my codes remotely and interact with them via a web interface, it will be a lot easier for me to write the interface in Python, with hooks into compiled code that I provide as a dynamically loaded library.

I want to spend time on the problems I want to spend time on. I don't want to spend (much) time writing hash tables in C, fighting with broken build systems, or figuring out how to port my code to platforms that don't obey a relevant standard. I don't want to type ten lines of code to beat a behaviour out of Java that I could get out of ten characters of LISP. I don't want to reinvent wheels. I don't want to spend more time than I must in debugging. I do want to spend time thinking about software design and algorithmic strategies, and about how to efficiently pose and solve computationally difficult scientific and engineering problems. I'm not alone in feeling this way.

The state of the art advances, and I get to work every day with some truly amazing hardware and software systems. And at the same time, we've got a long way to go.