Here’s another bite-sized stream algorithm for your delectation. This time we want to simulate a random walk from a given node u in a graph whose edges arrive as an arbitrarily-ordered stream. I’ll allow you multiple passes and semi-streaming space, i.e., \tilde{O}(n) space where n is the number of nodes in the graph. You need to return the final vertex of a length t=o(n) random walk.

This is trivial if you take t passes: in each pass pick a random neighbor of the node picked in the last pass. Can you do it in fewer passes?
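For concreteness, here is a minimal sketch of that trivial algorithm. It assumes the stream is an undirected edge list that can be re-read once per pass and that no node is isolated; the names `sample_neighbor` and `trivial_walk` are made up for illustration. Each pass uses reservoir sampling to pick a uniform neighbor of the current node without storing its adjacency list.

```python
import random

def sample_neighbor(edges, v, rng):
    """One pass over the stream: return a uniform random neighbor of v."""
    choice, seen = None, 0
    for a, b in edges:
        if a == v or b == v:
            other = b if a == v else a
            seen += 1
            # Reservoir sampling: keep the i-th incident edge with probability 1/i.
            if rng.randrange(seen) == 0:
                choice = other
    return choice  # None if v has no incident edge (assumed not to happen)

def trivial_walk(edges, u, t, rng=None):
    """Simulate a length-t walk from u using exactly t passes."""
    rng = rng or random.Random()
    v = u
    for _ in range(t):  # one pass per hop
        v = sample_neighbor(edges, v, rng)
    return v
```

For example, `trivial_walk([(0, 1), (1, 2), (2, 0)], 0, 5)` walks five hops on a triangle and returns the final node.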

Well, here’s an algorithm from [Das Sarma, Gollapudi, Panigrahy] that simulates a random walk of length t in \tilde{O}(n) space while only taking O(\sqrt{t}) passes. As in the trivial algorithm, we build up the random walk sequentially. But rather than making a single hop of progress in each pass, we’ll construct the random walk by stitching together shorter random walks.

  1. We first compute short random walks from each node. Using the trivial algorithm, do a length \sqrt{t} walk from each node v and let T[v] be the end point. All n walks advance in parallel, so this takes \sqrt{t} passes.
  2. We can’t reuse short random walks (otherwise the steps in the random walk won’t be independent) so let S be the set of nodes from which we’ve already taken a short random walk. To start, let S\leftarrow \{u\} and v\leftarrow T[u], \ell\leftarrow\sqrt{t} where v is the vertex that is reached by the random walk constructed so far and \ell is the length of this random walk.
  3. While \ell < t-\sqrt{t}
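    1. If v\not \in S then first set S\leftarrow S\cup \{v\}, and then set v\leftarrow T[v], \ell \leftarrow \ell+\sqrt{t}. (The node added to S is the one whose short walk we just used, so it must be added before v is overwritten.)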
    2. Otherwise, sample \sqrt{t} edges (with replacement) incident on each node in S. Find the maximal path from v such that on the i-th visit to node x, we take the i-th edge that was sampled for node x. The path terminates either when a node in S is visited more than \sqrt{t} times or we reach a node that isn’t in S. Reset v to be the final node of this path and increase \ell by the length of the path. (If we complete the length t random walk during this process we may terminate at this point and return the current node.)
  4. Perform the remaining O(\sqrt{t}) steps of the walk using the trivial algorithm.
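The steps above can be sketched in code as follows. This is an illustrative sketch rather than the authors’ implementation: it assumes a simple undirected graph with no isolated nodes, a stream that can be re-read as a list of pairs, t \geq 1, and \sqrt{t} rounded down to an integer. The names `sample_steps` and `stitched_walk` are hypothetical.

```python
import math
import random

def sample_steps(edges, where, rng):
    """One pass over the stream. `where` maps a walker id to its current node;
    returns walker id -> an independent uniform random neighbor of that node."""
    at = {}
    for w, x in where.items():
        at.setdefault(x, []).append(w)
    seen, nxt = {}, {}
    for a, b in edges:
        for x, y in ((a, b), (b, a)):
            if x in at:
                seen[x] = seen.get(x, 0) + 1
                for w in at[x]:  # each walker runs its own reservoir sample
                    if rng.randrange(seen[x]) == 0:
                        nxt[w] = y
    return nxt

def stitched_walk(edges, nodes, u, t, rng=None):
    """Return (final node, passes used) for a length-t walk from u."""
    rng = rng or random.Random()
    r = max(1, math.isqrt(t))  # short-walk length, roughly sqrt(t)
    passes = 0
    # Step 1: r passes; T[x] ends up as the end point of a length-r walk from x.
    T = {x: x for x in nodes}
    for _ in range(r):
        T = sample_steps(edges, T, rng)
        passes += 1
    # Step 2: use the short walk from u.
    S, v, length = {u}, T[u], r
    # Step 3: stitch short walks, falling back to sampled edges inside S.
    while length < t - r:
        if v not in S:
            S.add(v)  # record v before overwriting it
            v, length = T[v], length + r
            continue
        # One pass: r fresh samples (with replacement) for every node in S.
        samples = sample_steps(edges, {(x, i): x for x in S for i in range(r)}, rng)
        passes += 1
        used = {x: 0 for x in S}
        while v in S and used[v] < r and length < t:
            i = used[v]
            used[v] += 1
            v = samples[(v, i)]  # i-th visit to a node takes its i-th sample
            length += 1
    # Step 4: finish the remaining (at most r) hops one pass at a time.
    for _ in range(t - length):
        v = sample_steps(edges, {0: v}, rng)[0]
        passes += 1
    return v, passes
```

Calling `stitched_walk(edges, nodes, 0, 100)` returns the end node together with the number of passes used, which makes it easy to compare against the t passes of the trivial algorithm. Sampling per walker rather than per node in `sample_steps` is a design choice that keeps the short walks independent even when several of them sit on the same node.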

So why does it work? First note that the maximum size of S is O(\sqrt{t}) because |S| is only incremented when \ell increases by at least \sqrt{t} and we know that \ell \leq t. The total space required to store the table T is \tilde{O}(n). When we sample \sqrt{t} edges incident on each node in S, this requires \tilde{O}(|S|\sqrt{t})=\tilde{O}(t) space. Hence, since t=o(n), the total space is \tilde{O}(n). For the number of passes, note that each time we take a pass to sample edges incident on S, we make \Omega(\sqrt{t}) hops of progress: either we reach a node with an unused short walk, which then adds \sqrt{t} hops without a further pass, or the walk uses \Omega(\sqrt{t}) sampled edges. So there are at most O(t/\sqrt{t})=O(\sqrt{t}) such passes. Hence, including the O(\sqrt{t}) passes used at the start and end of the algorithm, the total number of passes is O(\sqrt{t}).
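To put numbers on this accounting, here is a back-of-the-envelope instance with t=10^4. The value is purely illustrative and not taken from the paper, and the bounds above hide constants, so treat the figures as indicative only.

```latex
\begin{align*}
\sqrt{t} &= 100, \qquad |S| \le t/\sqrt{t} = 100,\\
\text{samples stored per sampling pass} &\le |S|\cdot\sqrt{t} = 10^4 = t,\\
\text{passes} &\le \underbrace{100}_{\text{step 1}}
  + \underbrace{t/\sqrt{t} = 100}_{\text{sampling passes in step 3}}
  + \underbrace{100}_{\text{step 4}} = 300 \ll 10^4 = t.
\end{align*}
```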

Das Sarma et al. also present a trade-off result that reduces the space to \tilde{O}(n\alpha+\sqrt{t/\alpha}) for any \alpha\in (0,1] at the expense of increasing the number of passes to \tilde{O}(\sqrt{t/\alpha}). They then use this for estimating the PageRank vector of the graph.