Wednesday, July 06, 2005

Code Tomography

This is part three in a planned five-part sequence on some interesting applications of linear algebra to tomography for computer networks, programs, and a yet-to-be-decided physical example. Today: how to get more utility from conventional code coverage tools by analyzing a matrix that traces out the different execution paths exercised by a test suite.

If you compile programs with GCC and don't know what code coverage is, look at the manual page for gcov. The introductory section is written in a much clearer style than I usually associate with the Unix man pages. Then come back and read on.

If you don't have access to the gcov man pages, or don't want to bother reading them, here's the short version. gcov is a coverage testing tool. You compile your program with certain options, so that when you run it, it writes out a bunch of trace information to a file. With this trace information, you can use gcov to produce an annotated listing of your program which says how many times each line has been executed. Software testers sometimes say that a test suite has complete coverage if every line of code in the program has been executed at least once.
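
For the curious, here is a minimal sketch, in Python, of how one might turn gcov's annotated listing into per-line execution counts for the analysis below. It assumes the usual count:line:source layout of a .gcov file, with '-' marking non-executable lines and '#####' marking lines that were never run; check the output of your gcov version before trusting the parsing details.

  def trace_vector(gcov_file):
      """Map line number -> execution count from a gcov annotated listing."""
      counts = {}
      for line in open(gcov_file):
          fields = line.split(':', 2)
          if len(fields) < 3:
              continue                     # not an annotated source line
          count, lineno = fields[0].strip(), fields[1].strip()
          if count == '-' or not lineno.isdigit() or int(lineno) == 0:
              continue                     # header line or non-executable line
          try:
              counts[int(lineno)] = int(count.rstrip('*'))
          except ValueError:               # '#####' and friends: never executed
              counts[int(lineno)] = 0
      return counts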

Obtaining complete test coverage is difficult, particularly if the programmer conscientiously checks every status code returned by every call she makes to a library or to the system. How do you test that the program gives an appropriate error message when the system runs out of memory, or when the user decides to pull the disk out of the drive in the middle of a save operation? Furthermore, it is easy to construct obviously flawed programs which pass test suites with complete coverage; the suite exercises every line of code, but not every possible path through the code.

Paths through the code. Does this start to sound familiar?

Suppose that I write down traces of program executions as vectors of numbers: x_i is the number of times line i was executed in a particular run. If I have traces from many runs, then I can assemble them into a matrix G, where G_{ij} is the number of times line i was executed in run j. In linear algebraic terms, total coverage is just the statement that Ge has no zero entries, where e is the vector of all ones.
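
To make the bookkeeping concrete, here is a short sketch using numpy (my tool of choice here, not anything the coverage tools give you) that assembles G from a list of per-run count dictionaries, such as the ones produced by the hypothetical trace_vector helper above, and checks total coverage. Since Ge sums each row over all the runs, coverage is complete exactly when every entry of Ge is positive.

  import numpy as np

  def coverage_matrix(traces):
      """traces: one dict per run, mapping line number -> execution count."""
      lines = sorted(set().union(*traces))              # all executable lines
      G = np.array([[t.get(i, 0) for t in traces] for i in lines])
      return G, lines

  def fully_covered(G):
      e = np.ones(G.shape[1])               # one entry per run
      return bool(np.all(G @ e > 0))        # every line executed at least once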

Does that sound like a contrived way of expressing the idea of total coverage? It should! Total coverage is useful, but in some ways it's a very shallow metric of how good a job a test suite is doing. There are other, more natural things that we can do with the matrix G which give us more subtle, and potentially more useful, information.

First, consider a few statements that one might make about a program flow:

  • A is executed if and only if B is executed
  • Exactly one of C or D is executed, exactly once, in any given run; they never both occur
  • Each time E occurs, exactly one of F or H follows

These statements reflect certain invariant properties of the program flow. Unsurprisingly, the trace vectors reflect those invariants as well: they appear as relations among the rows of G. For all j, we have

  • G_{Aj} = G_{Bj}
  • G_{Cj} + G_{Dj} = 1
  • G_{Ej} = G_{Fj} + G_{Hj}

By writing down the traces in the form of a matrix, and expressing invariant properties as linear dependencies among the rows, we open the door to automatic analysis. In particular, the first and third invariants above correspond to left null vectors of G: if e_i is the vector which is zero everywhere except for a one in component i, then I know that (e_A - e_B)^T G = 0 and (e_E - e_F - e_H)^T G = 0. The second invariant is a little trickier, but I can detect it if I artificially add a row of ones to G; then the invariant G_{Cj} + G_{Dj} = 1 corresponds to an ordinary left null vector of the augmented matrix (a useful trick, since mutually exclusive execution is pretty common).

Now suppose G represents the traces of program runs in a test suite, where everything went fine, and x represents a program run which consistently crashes. If I have some vector v such that v^T G = 0 but v^T x ≠ 0, that suggests I should check whether there is some unusual interaction involving the statements corresponding to the nonzero entries of v.
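
Here is a rough sketch of how that hunt might look in numpy: the left null vectors come out of a singular value decomposition of G with the row of ones appended, and the tolerances are ad hoc assumptions rather than anything principled.

  import numpy as np

  def left_null_vectors(G, tol=1e-8):
      """Vectors v with v^T [G; ones] = 0: candidate invariants of the suite."""
      Ga = np.vstack([G, np.ones(G.shape[1])])   # append the row of ones
      U, s, Vt = np.linalg.svd(Ga)
      rank = int(np.sum(s > tol * s[0]))
      return U[:, rank:].T                       # rows span the left null space

  def suspicious_lines(G, x, lines, tol=1e-6):
      """Lines involved in invariants that hold on G but are broken by trace x."""
      xa = np.append(x, 1.0)                     # match the appended row of ones
      suspects = set()
      for v in left_null_vectors(G):
          if abs(v @ xa) > tol:                  # this run violates the invariant
              suspects.update(l for l, vi in zip(lines, v[:-1]) if abs(vi) > tol)
      return suspects

With only a handful of runs, almost anything looks like an invariant, so in practice one would want many more runs than suspected relations before taking the output of such a sketch too seriously.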

Dependencies among the rows of G tell me something about invariants among the test case runs (which may or may not be invariants among all possible program runs). What about dependencies among the columns of G? These are just as interesting, because they say something about a deeper form of coverage. Suppose G represents a test suite with complete coverage, and x is a trace vector which is linearly independent of the columns of G. Then even though the new run doesn't exercise any individual lines of code that weren't exercised by the test suite, it very likely follows a path through the code which wasn't previously exercised.
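
That check is easy to phrase in the same numpy setting: the new run adds something genuinely new exactly when appending x as an extra column increases the rank of the matrix.

  import numpy as np

  def exercises_new_path(G, x, tol=1e-8):
      """True if the trace x is linearly independent of the test-suite columns."""
      rank_G  = np.linalg.matrix_rank(G, tol)
      rank_Gx = np.linalg.matrix_rank(np.column_stack([G, x]), tol)
      return rank_Gx > rank_G

Given the coverage_matrix sketch above, one might build x for a candidate run as np.array([t.get(i, 0) for i in lines]) and keep the runs for which this test comes back True.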

It's possible to vary the definition of G in order to shine light on different parts of the program structure. In the next installment, I will discuss some additional applications of execution path matrices to problems in software engineering.