The sun is setting on La Rocca di Bertinoro. The pasta has run dry and the wifi signal is fading. Theorists are scurrying towards taxis and preparing themselves for their overnight flights. Sublinear Algorithms 2011 is all over. But fear not! There are still three more talks to report… First Robi Krauthgamer discussed “Polylogarithmic Approximation for Edit Distance” [Andoni, Krauthgamer, Onak]. Given two length $n$ sequences, the edit distance is the minimum number of character operations (inserts, deletions, substitutions) that are sufficient to transform the first string into the latter. Masek and Paterson showed that the edit distance could be computed exactly in $O(n^2/\log^2 n)$ time. Recent work has focused on approximation in near-linear time. Robi presented an algorithm that achieves a $(\log n)^{O(\epsilon^{-1})}$ approximation in $n^{1+\epsilon}$ time. This was a significant improvement over the previous best of $2^{\tilde{O}(\sqrt{\log n})}$ approximation. [Andoni, Krauthgamer, Onak] is also the paper that sowed the seeds for the precision-sampling technique that Alex presented on Wednesday (one of my favorite talks from the workshop). Next, Artur Czumaj gave a talk entitled “Planar Graphs: Random Walks and Bipartiteness Testing” [Czumaj, Monemizadeh, Onak, Sohler]. This is one of the few results for property testing in general graphs, i.e., not dense graphs or graphs with constant degree. In the dense graph model you can query entries of the adjacency matrix and need to accept input graphs with the property of interest and reject graphs if $\epsilon n^2$ entries of the adjacency matrix would need to be changed in order to have a graph that has the property. After about ten years of research we have a very good understanding of what is possible in this model. Attention has since focussed on the bounded-degree adjacency list model where every node has degree at most $d$ and we need to reject graphs if $\epsilon dn$ changes would be necessary to satisfy the property. In both models, testing bipartiteness has been an important problem: in the dense graph model the complexity is constant (we’re treating $\epsilon$ as constant) and in the bounded-degree model the complexity is $\Theta(\textrm{polylog}(n) \sqrt{n}).$ In the general graph model, there is no bound on the max-degree and we may query the graph by a) asking for the $i$-th neighbor of a specific node or b) asking for the degree of a specific node. In practice it seems like asking for a random neighbor of a specific node is sufficiently powerful. Artur showed that it was possible to test bipartiteness in planar graphs with constant queries. The very last talk was given Ronitt Rubinfeld on “Testing Collections of Properties” [Levi, Ron, Rubinfeld]. Previous work on property testing of distributions considered single distributions or pairs of distributions: we’d take iid samples from the distributions and try to determine if, e.g., the distributions where identical or far from identical (i.e., the $\ell_1$ distance between the distributions is at least $\epsilon$.). Ronitt considered a more general model in which there are $m$ distributions $D_1, \ldots, D_m$ over $[n]$ and a (known or unknown) distribution $\nu$ over $[m]$. On querying the distributions we get $(i,x)$ where $i\sim \nu$ and $x \sim D_i$. The goals included determining whether all $D_i$ are identical or whether they can be clustered into $k$ clusters such that within a cluster, all distributions are close. Results included a tight bound if $n>m$ of $\tilde{\Theta}(n^{2/3}m^{1/3})$ for the question of testing if all distributions where identical. Using the observation that $(i,x)$ are independent iff all $D_i$‘s are equal, this also gives a tight lower bound for independence testing, resolving an open question from previous work.

BONUS! More Photos… Here are a few more photos from the workshop. By day we’d listen to theory talks in the Fresco room… By night we’d talk about theory over dinner… On our afternoon off, we admired cathedrals and talked more theory. 