This is part two in a planned five-part sequence on some interesting applications of linear algebra to tomography for computer networks, programs, and a yet-to-be-decided physical example. Today: understanding link properties from end-to-end measurements.

I'm going to start with the same setup that we had yesterday. We have n computers attached to the Internet, and they need to communicate in order to cooperate on something. We can measure the log loss rates on any of the n(n-1) end-to-end network paths; and with a mild assumption about independence of failures, we found that the log loss rate of a path is just the sum of the log loss rates for all the links on the path. We summarized the relation between the log loss rates for all the paths and for all the links with the compact equation Gx = b, where:

- G_{ij} is one if path i uses link j, zero otherwise;
- b_i is the log loss rate for path i; and
- x_j is the log loss rate for link j.

The matrix G is typically very *rank-deficient*: both rows
and columns in the matrix are *linearly dependent* upon each
other. In practice the rank k of G seems to grow sort of like
O(n \log n), though we still only have a fuzzy understanding of
the behavior of k as a function of n and of the overall network
topology. Yesterday, I described how we could use the low rank of
G to infer loss rates on all n(n-1) end-to-end network paths by
measurements of only k end-to-end paths. Today, I'll describe a
different problem: what can we tell about the loss rates of the
individual links? This information can be useful if, for example,
we would like to diagnose where there is a network problem, so that
we can route around it, fix it, or complain loudly about it to
someone who is contractually obligated to fix it for us.
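To make the rank deficiency concrete, here's a minimal sketch with a made-up toy topology (not real measurement data): n hosts each hang off a central switch by their own link, so the path from host s to host t uses exactly links s and t. Even this tiny example has far more paths than independent rows:

```python
import numpy as np
from itertools import permutations

# Hypothetical star topology: n hosts, one link per host.
# The path from host s to host t uses link s and link t.
n = 5
paths = list(permutations(range(n), 2))  # all n*(n-1) ordered host pairs

G = np.zeros((len(paths), n))
for row, (s, t) in enumerate(paths):
    G[row, s] = 1  # link out of the source host
    G[row, t] = 1  # link into the destination host

print(G.shape)                   # (20, 5): 20 paths, but only 5 links
print(np.linalg.matrix_rank(G))  # 5: the 20 rows are highly dependent
```

Here k = 5 measurements would already determine all 20 path loss rates, which is the kind of savings yesterday's monitoring application exploited.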

The answer, as it turns out, is that we can't tell a whole lot about
the individual links just using the equations I've written so far.
The problem can be illustrated by a simple example: if I know c + d
= 5, can I find c? No, c could be anything! If you give me
d, I can find c, and vice-versa; but without more information, I
cannot say anything more about c or d. Similarly, I know a lot
about *sums* of link properties, but the linear system that I
wrote down generally provides too little information for me to find
the *individual* link properties. The rank deficiency that
was so useful for yesterday's monitoring application is now getting
in our way.

How can we get the extra information we need to make progress?
Well, there is a natural set of *inequality* constraints
which I have so far ignored. Remember from yesterday that I
actually described the link loss rates using negative log
probabilities:

x_j = -\log(1 - p_j),

where p_j is the probability that a packet traversing link j is lost. Since 0 \leq 1 - p_j \leq 1, these quantities are *non-negative*. More generally, I have x \geq 0 and b \geq 0, where the inequalities should be interpreted component-by-component. This tells us a lot! In the example where c + d = 5, if we know that c and d are non-negative, then we can immediately say that 0 \leq c, d \leq 5. We still can't get exact values for c and d without more information, but we can get upper and lower bounds, which is a major improvement.

So while the equations Gx = b don't generally provide sufficient information to determine the link losses x_j, I can get possibly-useful bounds on x_j because I know that x \geq 0. For example, I know that the loss rate on any network path can be no smaller than the maximum loss rate over all links used by that path; or, put the other way around, I know that the loss rate of any link is bounded above by the loss rates of all paths that use the link. So if path i uses link j (G_{ij} = 1), then I know x_j \leq b_i. But here's a remarkable thing about the Internet: most of the time, the packets you send make it through intact! Most links and paths in the network lose few packets; only a few links, and the paths that go through them, suffer from excessive packet loss. So even with this simple bound, we can show that many links in the network have very low loss rates.
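This simple bound is cheap to compute: for each link j, just take the minimum of b_i over all paths i with G_{ij} = 1. A sketch, with G and b values made up for illustration:

```python
import numpy as np

# Made-up toy data: 3 paths over 3 links, log loss rate b_i per path.
G = np.array([[1, 1, 0],   # path 0 uses links 0 and 1
              [0, 1, 1],   # path 1 uses links 1 and 2
              [1, 0, 1]])  # path 2 uses links 0 and 2
b = np.array([0.02, 0.30, 0.29])

# Upper bound per link: the smallest loss rate among paths using it.
upper = np.array([b[G[:, j] == 1].min() for j in range(G.shape[1])])
print(upper)  # [0.02 0.02 0.29]
```

Links 0 and 1 each lie on a nearly lossless path, so they must be nearly lossless themselves; only link 2 could be carrying the bulk of the observed loss.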

The tightest possible upper bound for x_j based on what I've discussed so far is x_j \leq x_j^{\mathrm{max}}, where

x_j^{\mathrm{max}} = \max \{ x_j : Gx = b, \; x \geq 0 \}.

Computing x_j^{\mathrm{max}} is a *linear programming problem*. The theory of linear programming is pretty well-developed, so we might hope that there's a cheap algorithm to compute these bounds for every component of x, which is about all we can hope for so far. Unfortunately, these linear programs can get unwieldy unless we use heuristics or exploit special structure in the system.
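As a sketch of the linear programming formulation (again with made-up numbers, and using scipy.optimize.linprog as one off-the-shelf solver): for each link j, maximize x_j subject to Gx = b and x \geq 0.

```python
import numpy as np
from scipy.optimize import linprog

# Made-up rank-deficient toy system: 2 paths over 3 links.
G = np.array([[1.0, 1.0, 0.0],   # path 0 uses links 0 and 1
              [0.0, 1.0, 1.0]])  # path 1 uses links 1 and 2
b = np.array([0.3, 0.3])

n_links = G.shape[1]
upper = np.empty(n_links)
for j in range(n_links):
    c = np.zeros(n_links)
    c[j] = -1.0  # linprog minimizes, so maximize x_j by minimizing -x_j
    res = linprog(c, A_eq=G, b_eq=b, bounds=(0, None))
    upper[j] = res.x[j]
print(upper)  # each link could individually be as lossy as 0.3
```

Note the cost: this solves one LP per link, which is exactly the sort of expense that makes heuristics and special structure attractive for large networks.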

I like linear problems with special structure. They give me a sense of purpose.

This is where the joint work with Yan currently stands. The paper
which we're finishing now is essentially about how to compute good,
practical bounds on link loss rates in order to find where there are
problems in the network. I think this approach has a lot of
promise, though a lot of fine-tuning remains, both in the
mathematical theory and in the implementation. There are
alternatives, but those alternatives usually involve either
measurements which are *not* end-to-end (and are hard to make
reliably); or statistical models in which errors in the model
assumptions can affect prediction accuracy.

**Next time:** What has tomography to do with software testing? **Currently drinking:** Black tea with vanilla