Tuesday, October 04, 2005

Understanding and reviewing

In How to Solve It, Polya lists four steps for mathematical problem-solving: understand the problem, devise a plan, carry out the plan, and then look back. It seems like I spend most of my time -- or at least most of my interesting time -- in the 'understand the problem' or 'look back' phases.

In some sense, I spend far more time on initial understanding and later reflection than on anything else. I spend years on them! The thing is, 'devise a plan' and 'carry out the plan' are crucial components of solving individual problems. But finding a problem that's simultaneously interesting and tractable -- ah, that is not so much a matter of planning as a matter of observation.

I've already mentioned my experience with geometry: I took a graduate Riemannian geometry course, felt completely bewildered, came out... and got it over the next several years in a series of ah! moments (where 'moments' should be interpreted very loosely, as time periods ranging from a second to twelve hours). I'm less sure whether I've mentioned other such experiences. The time between initial exposure and any personal feeling of competence is often a couple of years, at least: for linear algebra, for numerical analysis, for functional analysis, for ODEs, for PDEs... the list goes on.

But what really constitutes a satisfying feeling of basic competence? I don't think it's the ability to ace an exam, or even to do difficult problems; those are side effects at best. I've received good grades in all my mathematical courses; I saw stuff, I did the problems, I put together the pieces in my head, and at the end of those courses I knew the material as well as anyone could expect. But there's a difference between knowing something at the end of a semester and knowing something so well that you can use it intuitively in a complicated argument -- so well that it's not just something you remember, but something you couldn't possibly forget.

Of course, effortless recall of mathematical concepts is no more sufficient for doing interesting mathematics than effortless use of English vocabulary is sufficient for crafting good prose. It's a starting point. It is, to me, the real meat of the 'understand the problem' phase. And to do it right takes either someone much smarter than me, or someone willing to devote years of thought -- though not continuous thought -- to really grok the concepts. Sometimes I run into people who claim expertise before they've spent the time to reach what I would consider a starting point (for whatever reason this happens more in CS than in math), and who are undone by a few simple questions. I've been in that position, too! But I feel a little bad when I see it happen to someone else, and then see that person walk away still feeling like an expert. It's not so much that fake experts hurt others, or that they hurt the reputation of their field, though unfortunately such things do happen; it's that when I've met such folks, they seem to have a certain smugness which insulates them from the feeling of accomplishment that only comes (for me) after years of feeling like the brain knows what's going on, but the guts haven't quite caught up.

For the record: I consider myself expert in certain aspects of numerical software development. I would grudgingly grant that I might become an expert in certain aspects of numerical linear algebra. There's no way that I'd let you call me an expert in Riemannian geometry; it's just that I've spent enough years with certain ideas floating around in my mind -- things like manifolds and atlases, differential forms and vector bundles and tensor bundles -- that I feel a certain comfort with the material. And the same is true of other things.

Looking back is just as important, and just as time-consuming, as the initial effort of understanding mathematical ideas. The initial understanding and the looking back blur together. Let me give an example, one that partly inspired this whole line of thought. One of the first topics covered in a compilers class is lexical analysis: how does one break up a string like "1 + a12(b*c)" into meaningful tokens, things like the number 1 or the identifier a12? The usual approach is through finite automata: one starts with regular expressions, which are translated into nondeterministic finite automata (NFAs), which are then converted into deterministic finite automata (DFAs), which are then compacted. Then you generate some executable representation of the automaton -- a state table and an interpreter, perhaps -- and away you go, tokenizing files with gleeful abandon. Or maybe you just say "to heck with this" and do it in an ad hoc way, in which case you'll probably re-invent either state machines or recursive-descent parsing. For the lexer the students were required to write, we gave them a code base written in a recursive-descent style; but it turns out that a DFA can be converted directly into an LL(1) grammar, so... six of one, half a dozen of the other.
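To make the recursive-descent flavor concrete, here is a minimal sketch of such a hand-written lexer, in Python (my own toy for this post, not the code base we handed out). Each branch corresponds to a DFA state, each inner loop to a state's self-transition, and the dispatch on the current character plays the role of the one-symbol lookahead in an LL(1) grammar:

    # Toy recursive-descent-style lexer for strings like "1 + a12(b*c)".
    # Each branch of the dispatch corresponds to a DFA state.

    def tokenize(text):
        tokens = []
        i = 0
        while i < len(text):
            c = text[i]
            if c.isspace():                 # skip whitespace
                i += 1
            elif c.isdigit():               # number state: [0-9]+
                j = i
                while j < len(text) and text[j].isdigit():
                    j += 1
                tokens.append(('NUMBER', text[i:j]))
                i = j
            elif c.isalpha():               # identifier state: letter, then alphanumerics
                j = i
                while j < len(text) and text[j].isalnum():
                    j += 1
                tokens.append(('IDENT', text[i:j]))
                i = j
            elif c in '+*()':               # single-character tokens
                tokens.append((c, c))
                i += 1
            else:
                raise ValueError('unexpected character %r at %d' % (c, i))
        return tokens

    print(tokenize('1 + a12(b*c)'))
    # [('NUMBER', '1'), ('+', '+'), ('IDENT', 'a12'), ('(', '('),
    #  ('IDENT', 'b'), ('*', '*'), ('IDENT', 'c'), (')', ')')]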

So all these ideas feed into the seemingly simple task of chopping up a file into individual tokens. And it's all very beautiful, but it can be sort of confusing, particularly since the students were really writing an LL(1) parser (a recursive-descent parser) before they were ever formally introduced to the ideas of LL(1) parsing. So one day, as we were walking out of section, one of my students asked why we should learn the NFA -> DFA algorithm if we're not coding it directly, and if there are well-established tools that will do such conversions for us. It was a fair question, but while my brain was sorting among the reasons I could give, my gut interjected. If you can understand NFA -> DFA conversion, I said, then you'll be able to understand the construction of the state machines for LR(k) parsing. And you want to understand the construction of those state machines, even if you're using an automated tool; otherwise, you'll have no idea how to resolve conflicts when they occur. My brain agreed that this was a pretty good reason, and started to expand on it; but by that time the student had thanked me and we'd parted ways.

Guess what? I'd never really thought about it before he asked. But it really is true. The technical point: the NFA -> DFA algorithm and the LR(1) parser construction algorithm are similar in flavor, in that both involve manipulating states that represent sets of things (sets of NFA states in one case, and sets of items -- productions with a position marker -- in the other). Further, you can mimic the effects of the NFA -> DFA algorithm by converting an NFA into a certain type of grammar and then generating an LR(k) parse table.
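For concreteness, here is a small sketch of the subset construction (the dictionary encoding of the NFA and the toy example are mine, chosen for brevity). Notice that each DFA state is literally a frozen set of NFA states:

    # NFA -> DFA subset construction. The NFA is a dict mapping
    # (state, symbol) -> set of states; symbol None marks epsilon moves.

    from collections import deque

    def eps_closure(nfa, states):
        """All states reachable from `states` by epsilon moves alone."""
        stack, closure = list(states), set(states)
        while stack:
            s = stack.pop()
            for t in nfa.get((s, None), ()):
                if t not in closure:
                    closure.add(t)
                    stack.append(t)
        return frozenset(closure)

    def subset_construction(nfa, start, alphabet):
        """Each DFA state is a frozenset of NFA states."""
        d_start = eps_closure(nfa, {start})
        d_trans, seen, work = {}, {d_start}, deque([d_start])
        while work:
            S = work.popleft()
            for a in alphabet:
                moved = set()
                for s in S:
                    moved |= nfa.get((s, a), set())
                T = eps_closure(nfa, moved)
                if not T:
                    continue                  # dead state; omit it
                d_trans[(S, a)] = T
                if T not in seen:
                    seen.add(T)
                    work.append(T)
        return d_start, d_trans

    # Toy NFA over {a, b} for strings ending in "ab" (accept state 2).
    nfa = {(0, 'a'): {0, 1}, (0, 'b'): {0}, (1, 'b'): {2}}
    start, trans = subset_construction(nfa, 0, 'ab')
    for (S, a), T in sorted(trans.items(), key=str):
        print(sorted(S), '--%s-->' % a, sorted(T))

The worklist loop -- pull out a set-state, take its closure, follow each symbol, add any set-state you haven't seen before -- has the same shape as the closure/goto loop that builds the LR(1) item sets.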

And stuff like this keeps happening! But sometimes it goes off in odd directions. I write an LL(1) parser generator, I learn something interesting about Lisp. I try to come up with a non-trivial regular expression for a class exercise, I end up reminding myself of some interesting number theory... and then wishing that I had time to go learn a bit more about analytic number theory, since I've been thinking about analytic function theory in a different context. I think about code optimization shortly after thinking about a problem in floating point error analysis, and I find myself thinking: gee, I could probably write an optimizer that knew my favorite floating point error analysis tricks, and was able to generate code for maximal accuracy rather than for maximal performance. Oh, so many wonderful connections there are! The best thing is that I've learned that I can mention some of these things in class, and at this point most of the students seem to understand what I've said -- and a few of them get curious enough that they start playing with the ideas and making connections and asking questions. O, brave new world, that has such people in it!
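To make the regular-expressions-meet-number-theory flavor concrete, here's a classic example of the genre (an illustration, not necessarily the exercise I was writing): the binary numerals divisible by 3 form a regular language, because reading a bit b sends the remainder r to (2*r + b) mod 3, so a three-state DFA -- one state per remainder -- recognizes the language.

    # DFA with states {0, 1, 2} = remainders mod 3; reading bit b moves
    # state r to (2*r + b) % 3. Accept iff we end in state 0.

    def divisible_by_3(bits):
        r = 0
        for b in bits:
            r = (2 * r + int(b)) % 3
        return r == 0

    # Sanity check against ordinary arithmetic.
    for n in range(64):
        assert divisible_by_3(format(n, 'b')) == (n % 3 == 0)
    print(divisible_by_3('110'))   # 6 in binary -> True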

But what is this activity, really? It's not exactly problem-solving, though it sometimes results in good problems. It's something of a hybrid: exploration to discover new problems, or to discover how to understand problems, followed by reflection to steer further exploration. Were I a poet, I might mention the phoenix here, with new attacks and approaches rising from the ashes of old problems. But I am not a poet, so I shan't mention the phoenix. Except that I've mentioned the phoenix just in order to call attention to what I'm not mentioning. So am I waxing poetic?

Ah, the joy of it all!