(Krzysztof Onak takes over the daily blogging for Sublinear Algorithms 2011. The great photos are also by Krzysztof. Be sure to book him for your next wedding, birthday party, or Bar Mitzvah.)

The day started with Paul Valiant‘s talk on distribution testing. The talk covered his recent papers on the topic, co-authored by his brother Greg Valiant (paper 1, paper 2). At the beginning, Paul selected a few volunteers from the audience, and made them draw numbers from a pool he prepared. We then discussed what distribution is most likely to produce the frequencies the volunteers selected. In one case the selected numbers were 7, 18, and 7. The most likely distribution to produce these numbers is 7 with probability 2/3, and 18 with probability 1/3. I found it interesting that if the labels are removed, then the most likely distribution to produce the same frequencies is, say, 1 with probability 1/2 and 2 with probability 1/2. Paul focused mostly on estimating the support size and the entropy of the distribution, for which they give optimal algorithms with complexity roughly $\Theta(n / \log n)$. Paul touched upon topics such as approximating distributions by Poisson and Gaussian distributions and finding the most likely distribution to produce the given set of samples when the labels are removed. Paul also asked the question whether the additional computational power helps. Many estimators statisticians use are computationally inexpensive. The running time is often linear. Can we obtain better estimators given the relatively cheap computational power available nowadays?

In the next talk, Rocco Servedio presented his work with Costas Daskalakis and Ilias Diakonikolas on learning and testing $k$-modal distributions. He focused on the learning problem, in which the goal is to obtain an algorithm that outputs a distribution with total variation distance at most $\epsilon$ from the sampled distribution. The algorithm has to use few samples and be computationally efficient. The first approach to the problem is to partition the support into intervals of mass approximately $\epsilon/k$ each, and then run an algorithm for learning monotone distributions on each of them. This gives a suboptimal algorithm that can be improved, using a testing algorithm to find the intervals containing maxima and minima of the distribution efficiently. The testing algorithm verifies that the distribution is monotone, provided it is $k$-modal. The general monotonicity question requires $\Omega(\sqrt{n})$ samples, where $n$ is the support size. The algorithms that Rocco presented use only $\mathrm{poly}(k,\log n,1/\epsilon)$ samples and are not far from the best known lower bound.

During the following coffee break, we admired a part of the castle’s exposition, just outside of the lecture room:

Madhu Sudan discussed his several papers with various coauthors on local testability of affine-invariant properties. Many kinds of property testing can be seen as the study of a specific set of invariants. For instance, the invariants for graphs are permutations of vertices. For Boolean functions, they are often permutations of variables.
In this talk, Madhu looked at functions mapping $K$ to $F$, where both $F$ and $K$ are fields, and the size of $K$ is $|F|^n$. The affine invariance means that if a function $f$ has the tested property, so does the function $g_{a,b}(x) = f(ax + b)$. One of the main inspirations for this work is linearity testing ([Blum, Luby, Rubinfeld]), which has been very useful and resulted in surprising testers that use only a constant number of queries. One of the goals is to discover other such interesting properties that may be just as useful. Among other things, Madhu discussed the important subclass of affine-invariant properties that have “single orbit characterizations”. One can show that these properties are locally testable with one sided error. Madhu showed a sequence of classes, in which this one was innermost. Further research goals include understanding the testability of superclasses of properties in this sequence.

Arnab Bhattacharyya touched upon a related topic. He focused on Boolean functions on the hypercube. For dense graphs, it has been shown that the properties testable in constant-time are exactly hereditary properties (or “almost hereditary properties”). Hereditary properties are the properties closed under vertex removal. A similar question arises in the study of Boolean functions. A subspace hereditary property is property that still holds if the function is restricted to a subspace. The main conjecture that this line of research tries to resolve is: “All subspace-hereditary linear-invariant properties are testable.” Arnab presented a proof that a large subclass of these properties is testable.

Tali Kaufman talked about her recent research with Irit Dinur. They investigate the relation between locally testable codes and expander graphs. [Sipser, Spielman] showed that expanders can be used to obtain very good correcting codes. A natural question that arises in this context is what graphs lead to good codes. It turns out that using expanders with sufficiently good expansion one can obtain locally testable codes, provided the expansion is not too high, in which case the code loses the property called robustness (see [Ben-Sasson, Harsha, Raskhodnikova]). So what can we say about graphs that lead to locally testable codes? Tali showed that the constraint graph of such codes is a small set expander, that is, all sets of sublinear size expand well.