(Jelani Nelson continues his report from Sublinear Algorithms 2011.)

After lunch, David Woodruff gave a survey talk on norm estimation. A vector x receives coordinate-wise updates, and we want to estimate \|x\|_p for some p>0. David started off with some applications (one story: some hedge fund in the 90s needed to be bailed out because they underestimated \|x\|_4). There’s been a lot of progress in this area, and David gave a very high-level overview of everything: p\le 2 needs just logarithmic space, whereas p>2 needs polynomial space. There are also mixed norms, Schatten norms, sum-norms, and matrix norms. One interesting thing is the prevalence of the [Indyk, Woodruff] subsampling/heavy-hitters technique. In fact, David is curious to know just how universal the approach is. Some work in [Braverman, Ostrovsky] begins to shed light on this question.

OPEN: Other models (sliding window, time decay, out-of-order streams, read-write streams, annotations, distributed functional monitoring, compressed sensing). Also, what about streaming for EMD? There’s some work by [Andoni, Do Ba, Indyk, Woodruff], but there’s still more to be done.

Next was Amit Chakrabarti on the Gap Hamming problem. The problem: Alice has x in \{0,1\}^n, and Bob has y in \{0,1\}^n. They must decide whether the Hamming distance between x and y is bigger than n/2 + \sqrt{n}, or less than n/2 - \sqrt{n} (they can output anything otherwise). Amit and Oded Regev managed to close this problem by showing that even arbitrary round communication protocols need \Omega(n) communication. This was a big open question, and it implies that many lower bounds we had for one-pass streaming algorithms now hold for an arbitrary number of passes. The technique involved a slight twist on the corrpution method and moving to Gaussian space. Some simplified proofs have recently been found [Vidick], [Sherstov].

OPEN: Is Gap Hamming hard even when x,y are uniform? Also, can we get \Omega(\epsilon^{-2}\log(1/\delta)) lower bounds for all these streaming problems in an arbitrary number of passes? (Amit’s talk was on constant error probability).

The last talk of the day was by Graham Cormode on mergeable summaries, joint work with Pankaj Agarwal, Zengfeng Huang, Jeff Philips, Zheiwei Wei, and Ke Yi. We would like to sketch data so that sketches are mergeable in a, perhaps even arbitrary, merge tree. Though looking at restrictions can sometimes still be interesting: a tree which is a path (streaming), one where only equal sized sketches can be merged, and depth-1 trees. Graham started off by showing that the classical Misra-Gries frequent items summary has the merge property, then moved to quantiles. One neat thing in quantiles is the sketch size was O(\epsilon^{-1}\cdot \sqrt{\log(1/\delta)}), exactly the square root of what you would expect from the Chernoff bound, when we only have to merge equal-sized sketches. There was some neat trickery to get arbitrary-sized merges with space independent of n.

OPEN: Weight-based sampling, more complex functions (cascaded aggregates?), lower bounds for mergeable summaries, implementation studies (e.g. Hadoop).

The day ended with a fairly delicious dinner, including potatoes and roast pork.

Advertisements