(Jelani Nelson reports from Sublinear Algorithms 2011.)

It’s the first day of talks at the 2011 Bertinoro workshop on sublinear algorithms (titles and abstracts), about one hour’s drive southeast of Bologna, Italy. The day started off with some introductions. Everyone stated their name, affiliation, and often some of their research interests, while Amit Chakrabarti and Graham Cormode spiced up the routine by introducing each other.

Dana Ron gave the first talk: a survey talk on sublinear time algorithms for approximating graph parameters. She talked about estimating average degree, min-weight MST, vertex cover/max matching, and distance to various graph properties (like k-connectivity), and she considered various query models. The [Feige] algorithm for average degree estimation vaguely reminded me of the [Indyk, Woodruff] algorithm for estimating frequency moments: break up vertices (or coordinates) into groups based on how much they contribute to the statistic at hand, then estimate sizes of groups that “matter” (i.e., are big enough). They both even use sampling and argue that groups that matter survive the sampling and can be noticed. Feige gets a (2+\epsilon)-approximation using degree queries. [Goldreich, Ron] gets a (1+\epsilon)-approximation using neighbor queries. Oddly enough, I’ve seen this Feige paper before, but for a totally different reason. There’s a disturbingly simple-to-state but still open problem left in his paper, which I once failed to solve with a friend one evening:

OPEN: If X_1,...,X_n are independent nonnegative random variables each with expectation 1, then what’s the largest constant c so that Pr[\sum_i X_i < n+1] \ge c? Feige proves c = 1/13 works, but he conjectures the right answer is 1/e.
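Back to the average-degree estimation: here’s a minimal Python sketch of the sample-in-groups-and-take-the-minimum idea behind Feige’s algorithm, as I understand it. This is my own toy rendering: the group count and group size are illustrative placeholders rather than the constants from the paper, and `degree` stands in for the degree-query oracle.

```python
import random

def estimate_avg_degree(degree, n, eps=0.1):
    """Toy sketch of a Feige-style (2+eps)-approximation to the average
    degree using only degree queries.  Group count and group size are
    illustrative placeholders, not the parameters from the paper."""
    groups = max(1, int(1 / eps))
    group_size = max(1, int(n ** 0.5 / eps))
    group_averages = []
    for _ in range(groups):
        sample = [random.randrange(n) for _ in range(group_size)]
        group_averages.append(sum(degree(v) for v in sample) / len(sample))
    # Taking the minimum guards against a few very high-degree vertices
    # inflating one group's average; the heart of the analysis is showing
    # that no group underestimates by more than (roughly) a factor of 2.
    return min(group_averages)
```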

The next talk was by Krzysztof Onak, who focused more on vertex cover (VC) and max-matching. The goal is a (2, \epsilon n)-approximation (i.e., output an answer between x-\epsilon n and 2x + \epsilon n, where x is the true answer). We assume queries ask for the ith neighbor of a vertex. [Parnas, Ron] got d^{O(d)\cdot \mathrm{poly}(1/\epsilon)} queries. [Nguyen, Onak] improved this to 2^{O(d)}/\epsilon^2, followed by d^4/\epsilon^2 [Yoshida, Yamamoto, Ito]. [Onak, Ron, Rosen, Rubinfeld] most recently got O(d/\epsilon^3), which is the optimal dependence on d. Krzysztof tells me the techniques can also give O(d^2/\epsilon^2). The idea, for say VC, is to first reduce to an oracle which can tell you whether any given vertex v is inside a fixed VC that is at most twice the size of the optimal one. With such an oracle, you can just sample and use the Hoeffding bound. The trick is then to implement the oracle, and various authors, starting from Parnas-Ron, have been designing and implementing slicker and slicker oracles.
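Here’s a toy Python rendering of that sample-plus-oracle framework, with a Nguyen-Onak-style local simulation of a greedy maximal matching playing the role of the oracle (the endpoints of any maximal matching form a VC of size at most twice optimal). The `neighbors` function stands in for neighbor queries, the sample size is an illustrative placeholder, and this is emphatically not the optimized oracle of [Onak, Ron, Rosen, Rubinfeld].

```python
import random
from functools import lru_cache

def make_vc_oracle(neighbors, seed=0):
    """Local oracle answering 'is v in a fixed 2-approximate vertex cover?'
    by simulating the greedy maximal matching under a random edge order,
    exploring only edges whose (lazily sampled) rank could matter."""
    rng = random.Random(seed)
    rank = {}

    def edge_rank(u, v):
        e = (min(u, v), max(u, v))
        if e not in rank:
            rank[e] = rng.random()
        return rank[e]

    @lru_cache(maxsize=None)
    def in_matching(u, v):
        # (u, v) is in the greedy matching iff no adjacent edge of strictly
        # smaller rank is in the matching.
        r = edge_rank(u, v)
        for x in (u, v):
            for w in neighbors(x):
                if {x, w} == {u, v}:
                    continue
                if edge_rank(x, w) < r and in_matching(min(x, w), max(x, w)):
                    return False
        return True

    # v is in the cover iff some incident edge is matched.
    return lambda v: any(in_matching(min(v, w), max(v, w)) for w in neighbors(v))

def estimate_vc_size(neighbors, n, eps=0.1, seed=0):
    in_vc = make_vc_oracle(neighbors, seed)
    s = max(1, int(1 / eps ** 2))          # illustrative sample size
    sample = [random.randrange(n) for _ in range(s)]
    return n * sum(in_vc(v) for v in sample) / s
```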

OPEN: Is there a good (1,\epsilon n)-approximation for maximum matching? Right now the best bound is d^{O(1/\epsilon^2)} queries, but what about \mathrm{poly}(d/\epsilon)? How about \mathrm{poly}(1/\epsilon) for planar graphs?

Justin Romberg followed up with a survey on compressed sensing. The idea is to take few linear measurements, e.g., O(s\log(n/s)), of an n-dimensional vector x such that the best s-term approximation of x can be nearly recovered from the measurements. One of the defining papers of the field, by Candes, Romberg, and Tao, shows that such a thing is possible via “\ell_1 minimization” as long as the measurement matrix satisfies a certain restricted isometry property (“RIP”), i.e., it nearly preserves the Euclidean norms of all sparse vectors. How does one get such matrices? [Baraniuk, Davenport, DeVore, Wakin] show that sampling from a “Johnson-Lindenstrauss” distribution gives RIP with high probability. And in fact, there’s a really neat paper by [Krahmer, Ward], following work by [Ailon, Liberty], showing that RIP matrices also give JL distributions, so there’s a near-equivalence; I’d even go as far as to say that [Krahmer, Ward] is my favorite paper of 2010. So, random Gaussian entries work, or even the Fast JL transform of [Ailon, Chazelle]. It also works to randomly subsample rows of the Fourier matrix, or to take a random partial Toeplitz matrix, but then you need s\cdot\mathrm{polylog}(n) measurements. Actually, the Toeplitz construction is really neat in that it’s “universal”: you get small error as long as x is (near-)sparse in some basis, which you don’t need to know in advance.
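To make the \ell_1-minimization step concrete, here’s a tiny NumPy/SciPy sketch of basis pursuit (minimize \|x\|_1 subject to Ax = y) written as a linear program, with a random Gaussian measurement matrix. The dimensions are just for illustration, and any LP or convex solver would do in place of `linprog`.

```python
import numpy as np
from scipy.optimize import linprog

def l1_recover(A, y):
    """Basis pursuit: min ||x||_1 subject to Ax = y, written as an LP over
    the split x = xp - xn with xp, xn >= 0."""
    m, n = A.shape
    c = np.ones(2 * n)                       # objective: sum(xp) + sum(xn)
    A_eq = np.hstack([A, -A])                # A @ (xp - xn) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * n))
    return res.x[:n] - res.x[n:]

# Toy demo: Gaussian measurements (RIP with high probability at this scale)
# recovering an s-sparse vector.  Dimensions are illustrative.
rng = np.random.default_rng(0)
n, s, m = 200, 5, 60
A = rng.normal(size=(m, n)) / np.sqrt(m)
x = np.zeros(n)
x[rng.choice(n, size=s, replace=False)] = rng.normal(size=s)
x_hat = l1_recover(A, A @ x)
print("max recovery error:", np.abs(x - x_hat).max())
```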

OPEN: How do you check that a matrix satisfies RIP? Can we do away with RIP and get a computable guarantee for sparse recovery?

Next was Piotr Indyk, who gave a talk on adaptive compressed sensing (joint work with Eric Price and David Woodruff). The question is: if we can look at the outputs of previous measurements before deciding what to measure next, can we do better? It turns out that you can! Whereas [Do Ba, Indyk, Price, Woodruff] and [Foucart, Pajor, Rauhut, Ullrich] showed that \Omega(s\log(n/s)) is a lower bound in the non-adaptive setting, in the adaptive case it turns out O(s\log\log(n/s)) measurements suffice. In fact, only O(\log^{\#} s \cdot \log\log(n/s)) rounds of adaptivity are required, where the # is yet to be determined (\log^* is at least doable for now). The algorithm was nifty in that, as in other linear sketches, you do some hashing and random signs, but at some point you actually take what’s in some counters and round to the nearest integer. Kind of funky.
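As a heavily simplified illustration of that rounding-counters flavor (and emphatically not their algorithm): for a vector with a single dominant coordinate, two linear measurements already pin down its index, and rounding is exactly what makes the index pop out of the counter.

```python
import numpy as np

def locate_single_spike(x):
    """Toy example: if x has one dominant coordinate, the ratio of two
    linear measurements, rounded to the nearest integer, reveals its index.
    (An illustration of the flavor only, not the adaptive scheme.)"""
    n = len(x)
    total = x.sum()                        # <x, all-ones vector>
    weighted = (np.arange(n) * x).sum()    # <x, (0, 1, ..., n-1)>
    return int(round(weighted / total))

x = np.zeros(1000)
x[137] = 3.2
x += np.random.default_rng(1).normal(scale=1e-5, size=1000)   # mild noise
print(locate_single_spike(x))   # prints 137
```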

OPEN: Lower bounds? What if there’s noise after the measurements? And, deterministic schemes?

Off to lunch…