Saturday, June 11, 2005

Programming and Duality

Vector spaces are like potato chips. You cannot have only one.
W. Kahan

One of the key ideas in linear algebra is that of a dual space. If V is a finite-dimensional real vector space, then the dual space V* consists of all linear functions from V into the real numbers. A function from V into the real numbers is sometimes called a functional on V, so we also sometimes say that the dual space is the space of linear functionals on V. If V is a finite-dimensional inner product space, then there is a natural map between ordinary vectors and dual vectors: every function f in the dual space V* may be written as

f(x) = dot(y,x)
for some unique choice of y in V. This is a special case of the Riesz representation theorem.

Dual vectors can be identified with vectors in V, but they are not the same. Usually, we remind ourselves of the difference by writing vectors and dual vectors as columns and rows of coordinates, respectively (if you're a physicist, you write kets and bras; however, I was not exposed to Dirac notation until after my formative years, and tend not to use it except when making puns). The difference between vectors and dual vectors matters as soon as one tries to change coordinate systems, for instance. If you have ever seen the multi-dimensional chain rule, you have seen this distinction: if F is a nonlinear function from V into the reals, then (dF/dx)(x) is actually a dual vector, and so the chain rule looks like

d(F(Ax))/dx = (dF/dx)(Ax) A.
The gradient vector grad F(x) is the vector in V which identifies (dF/dx)(x); this vector transforms like
grad_x F(Ax) = A^T grad F(Ax).
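
A quick way to see why the transpose appears: take F(y) = dot(c,y) for a fixed vector c, so that grad F is just c. Then F(Ax) = dot(c,Ax) = dot(A^T c, x), and the gradient of this expression with respect to x is A^T c, exactly as the transformation rule says.
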
Fortunately for me, I learned about this distinction early in my career; this knowledge has doubtless saved me from hours of wondering whether I needed to transpose a matrix or not, hours which I could instead spend wondering whether I'd made a sign error somewhere.

What have dual vectors to do with programming? I think that the notion of a dual vector sometimes confuses people because they are unused to the idea of manipulating functions as interesting mathematical objects in their own right. Even those who find the idea of a dual vector natural are often confused by the idea of contravariant and covariant components in a tensor -- though it's really the same idea. Certainly I found it confusing initially! The same source of confusion applies, I think, when people first learn about functional programming. A function can be a valid object of mathematical and programmatic manipulation: a simple enough statement, but oh! the confusion it causes.

In an attempt to reduce the confusion for at least one or two people, I have been thinking about how to describe dual vectors in a functional programming language (or at least in a programming language that allows a functional style). For this exercise I will use Scheme (a LISP dialect) and Lua (one of my favorite scripting languages). I start with functions to compute the dot product of two vectors. In Lua, I would represent a vector as a table: if x is a vector, then x[i] is the ith component of the vector. Using this representation, I can compute the dot product of two vectors with this function:

  -- Dot product of two vectors (Lua version)
  function dot(x,y)
    local sum = 0
    for i,xi in ipairs(x) do
      sum = sum + xi*y[i]
    end
    return sum
  end

In Scheme, I represent vectors as lists (we'll ignore Scheme's native vector type for the moment). To compute the dot product of two vectors, x and y, I use the following function:

  ;; Dot product of two vectors (Scheme version)
  (define (dot x y)
    (apply + (map * x y)))

It's worth dwelling a moment on the Scheme implementation. The Scheme code computes dot products in two phases. In the first phase, the map procedure is used to apply the multiplication operation to every pair of entries from x and y. The resulting list of elementwise products is then summed up in order to get the dot product. Note that there was no need to explicitly specify a loop or a recursion: all that work was done by the map and apply operations.
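
For instance, the dot product of two short lists at a REPL might look like this (the intermediate list from map is shown only for illustration):

  (map * '(1 2 3) '(4 5 6))   ; => (4 10 18)
  (dot '(1 2 3) '(4 5 6))     ; => 32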

What if we want the linear functional identified by a particular vector? Because both Scheme and Lua allow functions to be treated as first-class objects, this mapping can be done very naturally. In Lua, we have

  -- Produce a dual vector (a linear functional)
  function dualv(x)
    return function(y) 
      return dot(x,y) 
    end
  end

while in Scheme,

  ;; Produce a dual vector (a linear functional)
  (define (dual-v x)
    (lambda (y) (dot x y)))

The idea is the same in either case: the function dualv (dual-v in Scheme) returns another function. The returned function is never given a name; in Scheme, such an anonymous function is called a lambda (after Church's lambda calculus, the logical system upon which LISP and its successors were directly based). In Lua, a function is usually just called a function, without the fancy-sounding name.
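
To make this concrete, here is the sort of check one could run at a Scheme prompt (the particular numbers are just a toy example):

  ;; The dual vector of (1 2) applied to (3 4) gives 1*3 + 2*4 = 11
  (define f (dual-v '(1 2)))
  (f '(3 4))   ; => 11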

Now, what if we want to go one more step, and define double-dual vectors? Just as a dual vector is a functional on the original space V, a double-dual vector is a functional on the dual space V*. In a finite-dimensional setting (and in many infinite-dimensional cases), all the elements of the double-dual are identified with elements of the original space V. A double-dual vector g identified with a vector x acts on dual vectors by evaluation at x:

g(f) = f(x)
Just as we did with dual vectors, we can define Lua and Scheme functions which implement this canonical map between elements of the original space and elements of the double-dual space:

  -- Produce a function x(f) for "evaluate f at x"
  function evalat(x)
    return function(f) 
      return f(x) 
    end
  end

  ;; Produce a function x(f) for "evaluate f at x"
  (define (eval-at x)
    (lambda (f) (f x)))
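
Continuing the toy example from before, the double-dual vector built by eval-at just turns the evaluation around: it takes a functional and returns that functional's value at x. Something like the following should hold, assuming the definitions above:

  ;; g is the double-dual vector identified with (3 4)
  (define g (eval-at '(3 4)))
  (g (dual-v '(1 2)))   ; => 11, the same dot product as before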

Now we've written routines to go from vectors to dual vectors or double dual vectors. Let's see a concrete use for these routines. Suppose we would like to produce a linear operator from its matrix representation. That is, we have a list of rows, and each row is a list; these rows represent linear functionals that map an input vector to some component of an output vector. We can produce a function that multiplies an input argument by the matrix using the following very simple Scheme functions:

  ;; Produce a function to evaluate a list of functions
  (define (f-list fs)
    (lambda (x) (map (eval-at x) fs)))

  ;; Produce an operator
  (define (make-linear m)
    (f-list (map dual-v m)))

The make-linear function first converts each row in the matrix to a corresponding dual vector, then (using f-list) produces a function that applies each dual vector in turn to an input argument, returning the result in another list. Note how f-list takes advantage of the eval-at function. The map routine applies one function to many arguments; by thinking of x not as a vector, but as a function that acts on functions (a double-dual vector when all is linear), we can adapt map to apply many functions to a single argument, which is what we want.
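
To see all the pieces working together, here is the kind of small check one might try (a 2-by-2 example, assuming the definitions above):

  ;; Build an operator from a 2x2 matrix given as a list of rows
  (define A (make-linear '((1 2) (3 4))))
  (A '(1 1))   ; => (3 7), the usual matrix-vector product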

What about a Lua implementation? Idiomatic code in different languages will naturally follow different patterns; in this case, the lack of a map function in Lua (and the naturalness of writing a for loop) suggested the following routine to generate a linear function:

  -- Produce an operator
  function makelinear(m)
    return function(x)
      local mx = {}
      for i,mi in ipairs(m) do
        mx[i] = dot(mi,x) 
      end
      return mx
    end
  end

• Currently drinking: Rooibos