Sunday, August 14, 2005

Starts with P

  • Public transportation

    I visit Winnie in San Jose most Saturdays. The full trip proceeds in three legs: a BART ride to Fremont, a bus ride to a transfer station at the Great Mall transit center, and a light rail ride to a stop near Winnie's apartment. Depending on how long I spend waiting between each leg (and whether I meet Winnie at her apartment or at Great Mall), the trip takes between two and three hours. Usually it takes about two and a half hours, which means that on Saturdays I spend a total of five hours or so riding the local public transportation.

    I do different things in those five hours, depending on my mood and on the contents of my backpack. Some days, I read; some days, I listen to the radio; some days, I sleep; and some days, I stare out the window and ponder. When I walked out the door this morning, it felt like a reading day, so I tucked David Copperfield into my bag.

    I didn't touch David Copperfield after that. Here's the story of what I did instead.

  • Projections

    Pretend for the moment that you're interested in the solution to a large system of equations. The system is so large that it's inconvenient to do anything direct -- there are just too many equations and too many unknowns. What do you do to get an approximate solution?

    There really is only one thing to do: get rid of some unknowns. Perhaps you know a little about your solution in advance, so that you're able to say that some components are going to be almost zero; so you pretend that they are zero. Now, if you started with equal numbers of equations and unknowns, you create a problem by assuming some of the unknowns are zero: there are too many equations now, and they can't all be satisfied. So you throw away some equations, too, until you have a system that you can solve again. And this gives you an approximate solution, which you can then try to evaluate.
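
    To make that concrete, here's a minimal sketch for a finite linear system (the block partitioning is just for illustration). Start from

        \[ \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}
           \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} =
           \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}, \]

    pretend that the second block of unknowns is zero, and keep only the first block of equations:

        \[ A_{11} \hat{x}_1 = b_1, \qquad \hat{x}_2 = 0. \]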

    Now, you can usually rearrange the unknowns in your problem before doing anything else, so that you really have some confidence that the unknowns you discard will be almost zero. And you can usually rearrange the equations, too, so that the equations you discard are the ones that don't tell you much. For example, your original system of equations might be the stationarity conditions that have to be satisfied at a minimum of some energy (i.e. the conditions that say the derivative equals zero at an extremum); in this case, your reduced set of equations might be the stationarity conditions for minimizing the energy subject to the constraint that the components you threw away are zero. But the idea behind this method of approximation, of assuming something about the solution so that you can throw away a bunch of the equations and unknowns, is very general. The high-level idea goes under many names, but I usually call any method of this form a projection method (or sometimes a Galerkin projection method). And projection methods, together with perturbation methods, make up a toolkit that I use almost constantly.
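
    In the notation I'd use for the finite-dimensional case (just a sketch): to approximate the solution of Ax = b, collect a basis for the trial subspace into the columns of V and a basis for the test subspace into the columns of W, look for an approximation x ≈ Vy, and ask that the residual be orthogonal to the columns of W:

        \[ W^T (b - A V y) = 0 \quad\Longrightarrow\quad (W^T A V)\, y = W^T b. \]

    Zeroing out unknowns and discarding equations, as above, corresponds to taking V and W to be columns of the identity; the energy-minimizing case is W = V with A symmetric and positive definite.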

    Projection methods are used for solving partial differential equations (which are infinite-dimensional problems), as well as for solving large finite-dimensional linear systems of equations. If you've heard of the finite element method, or of the conjugate gradient method, they're both examples. If you've ever used a truncated Fourier series to approximate an ODE or PDE solution, you used a projection method there, too. The cool thing about classifying these methods together is that you can analyze lots of different methods in a unified framework -- which, if you believe in the power of carefully directed laziness, is a very good thing.
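
    A tiny example of the Fourier case (my own, not taken from any particular source): take the periodic problem -u'' + u = f and use the trigonometric polynomials of degree at most n as both trial and test space. The Galerkin conditions decouple mode by mode,

        \[ (1 + k^2)\, \hat{u}_k = \hat{f}_k, \qquad |k| \le n, \]

    so the approximate solution is exactly the truncated Fourier series of the true solution.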

    If you don't believe in the power of carefully directed laziness, then neither mathematics nor computer science is probably the discipline for you.

    Anyhow. The task of analyzing any of these projection methods really can be partitioned into two sub-tasks. First, one figures out how nearly the approximate solution matches the best approximation consistent with the simplifying assumptions: how well do you approximate the unknowns that you didn't throw away? Then one figures out whether the approximating subspace contains a good estimate for the true solution: are the unknown values that you discarded really small? The second step is problem- and method-dependent, but the first step can be done in a general setting.
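
    Said with a formula (a sketch of how I'd organize it): if x is the true solution, x-hat the computed one, and V the approximating subspace, the goal is a bound of the form

        \[ \| x - \hat{x} \| \;\le\; C \, \inf_{v \in \mathcal{V}} \| x - v \|, \]

    where controlling the quasi-optimality constant C is the first sub-task, and showing that the best-approximation error on the right is small is the second.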

    At least, the first step looks like it can be done in a general setting. In practice, you can't get very far without using some additional hypotheses. The strongest results hold when the system of equations is symmetric and positive definite (in a PDE setting, where positive definite systems can nevertheless be arbitrarily close to singular, you need coercivity). If the equations are nonsymmetric but you still have coercivity, then you can say a little less, but the result is mostly the same. But once you've lost coercivity, suddenly there's much less you can say. If you make an additional assumption (called an inf-sup or Babuska-Brezzi condition in the PDE setting; a lower bound on a minimum singular value in the finite-dimensional setting), then you can at least prove that your approximate problem has a unique solution which is within some factor of being optimal -- but that factor may be much larger than it was in the positive-definite case. Besides, proving an inf-sup condition often involves problem-specific work, which is work that we'd like to avoid as much as possible in the interest of constructive laziness.
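
    Concretely, in the notation I'd use (continuity constant M, coercivity constant α, discrete inf-sup constant β), Céa's lemma for the coercive case gives roughly

        \[ \| u - u_h \| \;\le\; \frac{M}{\alpha} \, \inf_{v_h \in \mathcal{V}_h} \| u - v_h \|, \]

    while the inf-sup setting gives something more like

        \[ \| u - u_h \| \;\le\; \left( 1 + \frac{M}{\beta} \right) \inf_{v_h \in \mathcal{V}_h} \| u - v_h \|, \]

    and β can be far smaller than α was, even when the problem is only mildly indefinite.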

    There's something bothersome about changing to a much weaker bound the instant that coercivity is lost. There are cases when a minor change makes the difference between an easy problem and an impossible one; but this is not such a case. For the problems I think about, a mild loss of coercivity should not be a mortal sin. How can I develop error bounds which gradually transition between the theory that holds in the coercive case and the theory that holds for general indefinite problems?

    This is what I was thinking about for a big chunk of the train rides today, and I think I have a reasonable answer. It generalizes and unifies a hodge-podge of results that I know about in the finite- and infinite-dimensional case, and it involves some mathematical ideas that I find very beautiful (involving perturbation theory and the behavior of the field of values of an operator). I doubt any of it is new in a fundamental way, but I'm not sure these results have been cleanly tied together before. At any rate, I'm excited about it.

  • Polya

    There was already a book in my backpack when I threw in David Copperfield. George Polya's How to Solve It is just as classic as Copperfield (though in a different setting). It's also much shorter, which is why, after pondering projections for a while, I chose to read it over the Dickens book. How to Solve It is about how one goes about solving mathematical problems -- or other problems, for that matter. The techniques that go into such problem-solving are collectively known as heuristic (which is distinguished from the usual definition of heuristic reasoning, since part of the problem-solving process is the rigorous argument that comes after the initial attack). Polya wrote particularly for teachers of mathematics, and I admit that my decision to buy my own copy was partly influenced by a quote from a review:

    Every prospective teacher should read it. In particular, graduate students will find it invaluable. The traditional mathematics professor who reads a paper before one of the Mathematical Societies might also learn something from the book: 'He writes a, he says b, he means c; but it should be d.'

    -- E.T. Bell, Mathematical Monthly, December 1945.

  • Practical Common Lisp

    I have a few Lisp books on my shelf, of which I only really use one: ANSI Common Lisp by Paul Graham. The other books aren't bad, but they're more dedicated to how to program in general, as opposed to how to program in Lisp. But there's something limiting about having just one book on a language, just as there's something limiting in only having a single dictionary, a single history book, or a single book on almost any other topic. The local book stores stock two Lisp books: Graham's book, and Practical Common Lisp by Peter Seibel. After discovering the latter book on the store shelf a couple months ago, I found that it was available online as well, and so I started reading it.

    It is an excellent book, one that I've already found very useful. For example, the chapter on Lisp packages and symbol resolution (Chapter 21) includes a section on Package Gotchas, which, in the course of about a page, revealed the source of the woes I'd been having with my attempts to use a Lisp lexer generator that I grabbed from the net. Excellent! Seibel also has good descriptions of the Common Lisp format and loop constructs, better than the corresponding descriptions in Graham's book (probably because neither construct is very Lispy).
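
    As a tiny, hand-rolled illustration of the sort of thing those chapters describe (my sketch, not code from the book):

        ;; Count how often each word appears in a list and report the tallies.
        ;; LOOP does the iteration and accumulation; FORMAT does the report.
        (defun report-counts (words)
          (let ((counts (make-hash-table :test #'equal)))
            (loop for w in words
                  do (incf (gethash w counts 0)))
            (loop for w being the hash-keys of counts using (hash-value n)
                  do (format t "~a: ~d~%" w n))))

    Calling (report-counts '("and" "or" "and")) prints each word with its count. It isn't deep, but it does suggest why loop and format read more like little sub-languages than like ordinary Lisp.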

    I am not good at resisting the call of good books. I got a paper version when we went to the book store.

    I spent some time on the return bus ride reading through the first several chapters of the book. At the end of the ninth chapter, there's a short section entitled Wrapping Up, which begins with a brief summary of the code in the chapter, and then goes on to review the development process:

    It's worth reviewing how you got here because it's illustrative of how programming in Lisp often goes.

    Illustrative of programming in Lisp? Illustrative of programming in general! Or, for that matter, illustrative of how one approaches a broad variety of problem-solving tasks. Remove all the specific references to Common Lisp and rewrite the text in Polya's distinctive style, and the last page of the chapter could come directly from How to Solve It.