The last four BSS posts considered streams of numerical values. While there are still many more bite-sized results to cover for such streams, I thought we’d mix things up a bit and switch to graph streams for the next four posts. We’ll consider a stream of unweighted, undirected edges \langle e_1, \ldots, e_m\rangle and try to estimate properties of the graph G=(V,E) defined by this stream subject to the usual constraints of stream computation (sequential access to the stream and limited memory). For this post, we focus on estimating distances.

The results we’ll discuss revolve around constructing a t-spanner, a sub-graph H=(V,E') of G such that for any u,v\in V

d_G(u,v)\leq d_H(u,v) \leq t d_G(u,v)

where d_X(u,v) is the length of the shortest path between u, v\in V in graph X. Obviously, given such a graph we can approximate any distance up to a factor of t.

So how can we construct a t-spanner in the data-stream model? Here’s a very simple algorithm: Intialize H to \emptyset and on receiving edge e=(u,v), add the edge to H if and only if d_H(u,v)>t.

It’s easy to show that H is indeed a t-spanner because, for each edge e=(u,v) on a shortest path in G, H either includes e or a path between u and v of length at most t. The space taken by the algorithm is just the space required to store H. How large can this be? Well, note that H has girth at least t+2, i.e., it contains no cycles of length t+1 or shorter. It can be shown that such a graph has at most O(n^{1+2/(t+1)}) edges if t is odd (see e.g., [Bollobás] or [Indyk's lecture slides]) and hence we have a stream algorithm using \tilde{O}(n^{1+2/(t+1)}) bits of memory that finds a t approximation for any shortest path distance.

Unfortunately, the time to process each edge in the above algorithm is rather large and this spurred some nice follow-up work including [Feigenbaum et al.], [Baswana], and [Elkin]. It’s now known how to construct a t-spanner in \tilde{O}(n^{1+2/(t+1)}) space and constant worst-case time per edge.

It’s not hard to extend these algorithms to weighted graphs (consider geometrically grouping the edge weights and constructing a spanner for each distinct edge weight) but there’s bad news for directed graphs: even determining if there exists a directed path between two nodes requires \Omega(n^2) bits of memory.

BONUS! Some British English… Back in the UK, we call this

Spanner.

a spanner. Apparently that’s not the usual term in US English and I found this out after giving a talk at SODA back in 2005. The talk went reasonably well but, during the coffee break, numerous people asked why I’d kept on displaying pictures of wrenches throughout my talk. Whoops.

In technical writing, I normally endeavor [sic] to use US English (although I did once engage in some “colouring of paths” in a paper with a Canadian co-author). But I still think “math and stat” sounds funny :)