
(Piotr reports on the last two talks of Day 4.)

After lunch, Christian Sohler served new results on property testing in hyperfinite graphs. A family of graphs is $(k,\epsilon)$-hyperfinite if from any graph in the family we can remove $\epsilon n$ edges so that the remaining connected components have size at most $k$. For example, by the planar separator theorem, planar graphs qualify for this distinction. In the talk, Christian focused on graphs of bounded degree and constant $k$.

Christian presented a number of results, including the following surprising one: every property is testable on hyperfinite graphs. The basic idea is that for any graph $G$ in the family, if we remove the right $\epsilon n$ edges from $G$, then the resulting graph $G'$ can be fully described by a distribution over graphs of size at most $k$, and one can approximate this distribution by random sampling. One interesting implication of this work is that computing statistics of “graph motifs” in large networks (a popular tool in network analysis) is well justified, since such statistics essentially identify the graph itself.

The final talk of the day was by Oded Goldreich, who talked about sublinear algorithms in graphs of bounded degree. He considered a model in which one can ask for a specific (i.e., the $i$-th) neighbor of a given node, and studied the problem of *finding* (not just testing the existence of) cycles and trees in a given graph. The first result was a (roughly) $\sqrt{N}$-time algorithm for finding cycles in $N$-vertex graphs that are $\epsilon$-far from being cycle-free; this generalizes to finding cycles of length at least $k$, for fixed $k$. He also showed that one cannot do (much) better: roughly $\sqrt{N}$ queries are necessary. In contrast, he gave constant-time algorithms for finding trees of size at least $k$ (for constant $k$).

The algorithms are based on the following observation linking one-sided property testing to finding violating structures: if a one-sided tester rejects a graph, it must hold some “evidence” that the property does not hold. E.g., if we test for bipartiteness, the evidence is a non-bipartite subgraph (an odd cycle). This turns a tester into an algorithm that finds the desired structure.

To obtain an algorithm for finding cycles (or testing cycle-freeness), he used a reduction to bipartiteness testing. The idea is simple and very cute: each edge is replaced by a 2-path with probability 1/2, and kept intact otherwise. Clearly, any cycle-free graph is mapped to a cycle-free (and therefore bipartite) graph. The key observation is that a graph that is far from being cycle-free is mapped to a graph that is far from being bipartite.
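The reduction is easy to state in code. Below is a minimal sketch (my own toy implementation, not from the talk): subdivide each edge with probability 1/2, then run the standard 2-coloring check on the result. A tree always maps to a bipartite graph, while a cycle stays non-bipartite exactly when its new length is odd, which happens with probability 1/2.

```python
import random

def subdivide_randomly(n, edges, rng):
    """Replace each edge by a 2-path with prob. 1/2 (adding a fresh
    vertex), keep it intact otherwise -- the reduction described above."""
    new_edges, next_v = [], n
    for (u, v) in edges:
        if rng.random() < 0.5:
            new_edges.append((u, next_v))
            new_edges.append((next_v, v))
            next_v += 1
        else:
            new_edges.append((u, v))
    return next_v, new_edges

def is_bipartite(n, edges):
    """Standard 2-coloring check via graph search."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    color = [-1] * n
    for s in range(n):
        if color[s] != -1:
            continue
        color[s], stack = 0, [s]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if color[w] == -1:
                    color[w] = 1 - color[u]
                    stack.append(w)
                elif color[w] == color[u]:
                    return False
    return True
```

A cycle-free input (a forest) stays a forest under subdivision, so it always passes; a triangle fails the check for roughly half of the random choices.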

(Piotr Indyk reports from Day 4 at Sublinear Algorithms. Includes contributions from Sariel Har Peled.)

The first session of the day was dedicated to distributed algorithms. It was headlined by Roger Wattenhofer, who gave a very nice overview of the area. Distributed algorithms have recently found applications in the sublinear world; see, e.g., the talks on Monday. Distributed algorithms communicate along the edges of a graph, and their complexity is measured in rounds of communication.

The central character in Roger’s talk was the Maximal Independent Set (MIS) problem. Despite many years of research on this problem, some basic questions are still open (e.g., is there a deterministic algorithm with $\textrm{polylog}(n)$ rounds?). Roger described a randomized algorithm that runs in $O(\log n)$ rounds. The algorithm is local: the decision made at a given node depends only on actions of the nodes in its small neighborhood. He then proceeded to show that any algorithm for MIS requires at least $\log^* n$ rounds. The proof used Ramsey theory, which was a nice coincidence, given that a workshop on that topic was held in parallel. The lower bound in fact holds even for the line graph. Unfortunately (if you are a pessimist), for the line graph there is a matching upper bound of $\log^* n$ rounds, so to improve the lower bound one needs a richer class of instances. Roger showed a construction of graphs for which one can prove a lower bound of $\sqrt{\log n}$ or $\log \Delta$ (for the closely related problem of vertex cover). The graph was highly unusual, in that it was “highly symmetric” and “asymmetric” at the same time.

The next talk, by Pierre Fraigniaud, was on local distributed decision (i.e., decision problems). An example of such a problem is coloring validation: given a coloring, is it legal? Other examples include checking whether there is at most one marked node, consensus, etc. In general, one can define the notion of a distributed language, which is a decidable collection of configurations. A configuration is accepted iff it is accepted by all nodes; in other words, a configuration is rejected if some node rejects it. Pierre presented several results on randomized algorithms for distributed decision, including a strong derandomization result (under certain conditions).

In the next session, Ning Xie talked about Local Computation Algorithms. The idea is that you are given an input, and the problem at hand generates a large output. For example, the input is a graph and the task is to compute a maximal independent set (MIS). Suppose we want to output only one specific bit of the output. The question is whether we can do so in sublinear time, and in a way that is consistent if we decide to read several bits of the output in some arbitrary fashion (i.e., think of the algorithm as a query data structure that outputs the desired bit). Since we need to be consistent across different queries, any randomness we use needs to be compactly represented. To this end, one uses a $k$-wise independent sequence of random bits, which can be specified with a few truly random bits, and which implicitly defines a long sequence that can “cheat” many algorithms. Second, to get a local computation, we need to be able to quickly break up the input into small fragments, so that the fragment determining the desired output bit (i.e., the one containing the query vertex) is small. The main idea is to use Beck’s constructive version of the Lovász Local Lemma. For example, one runs a randomized (parallel) MIS algorithm for a small number of iterations, and then runs a greedy algorithm on each remaining connected component. The randomized algorithm adds a vertex $v$ to the independent set with probability $1/(2d)$ (where $d$ is the degree of the given regular graph), and keeps it if there are no collisions with neighbors. After $O(d \log d)$ iterations, one can prove that each remaining connected component has size $O(\log n)$ with high probability. Thus, given a query vertex, we can simulate this algorithm locally and run the greedy algorithm on the relevant connected component directly, getting a sublinear running time.
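A toy global simulation of the parallel phase just described (my sketch, not Ning's code): shared pseudorandom coins keyed by round and vertex make repeated queries consistent, which is exactly the role of the $k$-wise independent bits above. A genuinely local implementation would explore only the query vertex's neighborhood instead of touching the whole graph.

```python
import hashlib

def coin(seed, round_i, v, d):
    """Pseudorandom shared coin: vertex v proposes itself in round
    round_i with probability 1/(2d), deterministically given the seed."""
    h = hashlib.sha256(f"{seed}:{round_i}:{v}".encode()).digest()
    return int.from_bytes(h[:8], "big") / 2**64 < 1.0 / (2 * d)

def luby_rounds(adj, d, rounds, seed="demo"):
    """Parallel MIS rounds: a vertex joins if it proposes and no live
    neighbor proposes in the same round; its neighbors then drop out.
    'live' is what remains for the greedy clean-up phase."""
    n = len(adj)
    in_mis = [False] * n
    live = set(range(n))
    for r in range(rounds):
        proposed = {v for v in live if coin(seed, r, v, d)}
        winners = {v for v in proposed
                   if not any(u in proposed for u in adj[v])}
        for v in winners:
            in_mis[v] = True
            live.discard(v)
            for u in adj[v]:
                live.discard(u)
    return in_mis, live
```

By construction the chosen set is always independent; the content of the analysis sketched above is that the surviving components in `live` are small with high probability.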

Yuichi Yoshida talked about constant-time approximation algorithms. He started by recalling the result of Raghavendra, who showed that if the Unique Games Conjecture is true, then for every constraint satisfaction problem (CSP), the best approximation ratio is attained by a certain simple semidefinite program together with a rounding scheme for it. Yuichi then showed that similar results hold for constant-time approximation algorithms in the bounded-degree model. He obtained the results in three steps. First, he showed that for every CSP, one can construct an oracle that provides access, in constant time, to a nearly optimal solution of a basic LP relaxation of the CSP. Using the oracle, he then gave a constant-time rounding scheme achieving an approximation ratio that coincides with the integrality gap of the basic LP. Finally, he gave a generic conversion from integrality gaps of basic LPs to hardness results.

The last talk before lunch was by Vladimir Braverman. He focused on the following class of streaming problems: given a function $G$, estimate the statistic equal to the sum, over all $i$, of the terms $G(m_i)$, where $m_i$ is the number of times element $i$ appears in the stream. Alon, Matias, and Szegedy asked for which functions $G$ it is possible to approximate this statistic using a single pass over the data and poly-logarithmic memory. Vladimir showed a precise characterization for all monotonically increasing functions $G$ with $G(0)=0$. He also introduced the method of recursive sketching, which can be used, e.g., for approximating frequency moments.
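To fix notation, here is the statistic itself, computed exactly in one pass with linear memory; the whole question above, of course, is when a polylog-space approximation is possible.

```python
from collections import Counter

def stream_statistic(stream, G):
    """Exact one-pass (but linear-memory) computation of sum_i G(m_i).
    With G(x) = x^2 this is the second frequency moment F_2; with
    G(x) = 1 for x > 0 it is the number of distinct elements."""
    counts = Counter(stream)
    return sum(G(m) for m in counts.values())
```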

(It’s Day 3 at Sublinear Algorithms and today Seshadhri is taking on blogging duties. Thanks Sesh!)

How to win Netflix: The survey of the day was “Compressed sensing to matrix completion and beyond” by Benjamin Recht. We started by discussing the basic matrix completion problem. Think of a matrix with rows being movies and columns being Netflix users; the entries are the respective ratings. Suppose we are given only a few of those ratings. Can we reasonably guess the remaining entries (and thereby win $1,000,000)? Formally, we have some $k \times n$ matrix $M$ with only some entries provided. We assume that the matrix has low rank, meaning that $M = LR$, where $L$ is $k \times r$ and $R$ is $r \times n$. The total number of free variables is then $r(k+n)$, which for small $r$ is much smaller than the trivial bound of $kn$. Ben showed how techniques from compressed sensing can be used to attack this problem. His work gave a strong mathematical basis for heuristics that people had been using to solve such problems.

The importance of being coherent: That segued into the next talk, the only board talk of the workshop. Michael Mahoney talked about “Fast Approximations for Matrix Coherence and Statistical Leverage”. The concept of matrix coherence was introduced by the compressed sensing folks (the related notion of statistical leverage was present in earlier work by Michael). Consider a low-rank matrix $M$ with the SVD $M = U \Sigma V^*$ (so $U$ is an $n \times r$ matrix). The coherence is (up to scaling) the maximum squared length of the projection of a standard basis vector onto the subspace spanned by the $r$ column vectors of $U$. Formally, $\mu(U) = \frac{n}{r}\max_{i \leq n}\| {\bf P}e_i\|^2$, where the $e_i$'s are the standard basis vectors and ${\bf P}$ is the projection matrix onto the column span of $U$. Michael talked about fast algorithms to compute this quantity, involving many linear algebraic manipulations and the recent fast Johnson-Lindenstrauss algorithms.

Big brother is watching you(r net traffic): After that, it was time for more traditional theory CS.
Ely Porat talked about the classic problem of pattern matching in the streaming model. We are given a text $T$ that we stream through, and a fixed pattern $P$ of length $m$. The fingerprinting algorithm of Karp-Rabin solves this problem, but requires $O(m)$ space (since it basically maintains a sliding window of that size). Ely’s aim was to get this down to $\textrm{polylog}(m)$ space, so that it would be appropriate to be blogged about here. He discussed some clever tricks for breaking the pattern into smaller sub-patterns and using recursive fingerprinting schemes. One open problem is to get space lower bounds for this rather fundamental problem. He also mentioned the open problem of arranging his sabbatical for next year! Contact him for details.

How to find your Dacians: Alexandr Andoni gave a talk on “Precision and Sampling Algorithms”. The title was changed in real time to “Sublinear Algorithms via Precision Sampling” (and hence us attendees will never know what his original talk was about). After giving the motivation of the emperor of ancient Rome attempting to estimate the total number of Dacians in various provinces, Alex described a very cute sampling problem. We want to estimate the value of $S = \sum_{1 \leq i \leq n} a_i$, where $a_i \in [0,1]$. Unfortunately, we can only get an estimate $\widetilde{a_i}$ with a precision $u_i$ (so $|\widetilde{a_i} - a_i| \leq u_i$). Our power is that we can choose the precisions $u_i$, but we have to pay a cost of $(1/n)\sum_i 1/u_i$. How do we get an estimate $\widetilde{S}$ satisfying $(1-\epsilon)S - O(1) < \widetilde{S} < (1+\epsilon)S + O(1)$ for constant $\epsilon$? Alex gave a nifty algorithm that has a cost of $O(\epsilon^{-3}\log n)$. This has applications in approximating the edit distance and also in streaming algorithms for large frequency moments.

How to bake your cake and eat it too: Nir Halman talked about “FPTASs for Stochastic Optimization Problems”. He introduced his talk using a baking analogy.
First, he told us why the result (or cake) was going to be tasty, and then gave us the ingredients and recipe. The problem was a stochastic version of the classic discrete knapsack problem. We have $T$ objects, with the $i$th object having volume $v_i$ and profit $p_i$. The volume of the knapsack is $B$, and we can solve this problem using dynamic programming in $O(BT)$ time. This is, of course, not polynomial, and the aim is to get a $(1+\epsilon)$-approximate solution while only paying an overhead of $poly(1/\epsilon)$. The general framework of this result is strong enough to generate FPTASs for various versions of knapsack. One of the important cases Nir discussed was the stochastic version, where the volume $v_i$ comes from some explicitly known distribution. Another interesting case was the non-linear knapsack, where we can choose integer copies of each element.

The Arthur Cayley approach to odd-cycles: Asaf Shapira was next, telling us about “Testing Odd-Cycle Freeness of Boolean Functions”. Given a boolean function $f$, an odd cycle is a set of domain points $x_1, x_2, \ldots, x_t$ such that $t$ is odd, $f(x_1) = \ldots = f(x_t) = 1$, and $\sum_i x_i = 0$. This property falls into the broad class of linear-invariant properties defined by simple constraints but, uncharacteristically, does not have a bounded constraint size. The main result is that it is testable with $O(1/\epsilon^2)$ queries. Previous testers used general characterizations that gave the standard “tower of $(1/\epsilon)$” query complexity (Arnab talked about this result on the previous day). Asaf gave an excellent talk relating this problem to testing bipartiteness of the Cayley graph of $f$, which was really the heart of the result. He gave details on how to combine Fourier-analytic ideas with a spectral approach, connecting the odd cycles in $f$ to the bipartiteness (or lack thereof) of the Cayley graph.
This was done extremely well, so even though he walked us through the technical approach, the main ideas were very clear (a rather tough thing to do, IMHO). Finally, he also explained that there exists a canonical $poly(1/\epsilon)$-query tester for this problem. This tester randomly samples $x_1, \ldots, x_{\log (1/\epsilon)}$ and only queries the subspace spanned by them. It is an interesting open question whether for every linear-invariant property that is testable with $q(\epsilon)$ queries, there exists a canonical tester running in $poly(q(\epsilon))$ time.

After lunch, it was time for the excursion to the ancient town of Ravenna. The highlights were the mosaics of the Basilica of San Vitale and the final resting place of Dante Alighieri. These involved little to no new theoretical results, and hence the blog post shall end here.

(Krzysztof Onak continues with Tuesday’s report for Sublinear Algorithms 2011.)

The second part of the day started with a session devoted to the Johnson-Lindenstrauss lemma. Nir Ailon presented different versions of the problem. In the embedding version, one is given a set of $n$ Euclidean vectors in $\mathbb R^m$ and wants to map them to $\mathbb R^d$, where $d$ is small and the lengths of vectors are preserved up to a factor of $1\pm \epsilon$. It is known that a random projection onto $d=O(\epsilon^{-2} \log n)$ dimensions is likely to produce such vectors. The downsides of using a random projection are that both the running time and the required amount of randomness are $\Omega(md)$. Nir has addressed the challenge of reducing these requirements in his papers [Ailon, Chazelle] and [Ailon, Liberty]. His papers employ the Fourier transform and error-correcting codes. Nir also discussed a more recently discovered connection between this problem and the restricted isometry property in compressed sensing. This connection has resulted in new algorithms for efficiently computing the Johnson-Lindenstrauss transform.
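The baseline random projection that these talks start from is essentially a one-liner; the sketch below (mine, not from the talks) just multiplies by a scaled Gaussian matrix, which exhibits both downsides mentioned above: $md$ random entries and an $\Omega(md)$-time multiplication.

```python
import numpy as np

def jl_project(X, d, rng):
    """Dense Gaussian JL map: rows of X (vectors in R^m) are sent to
    R^d; with d ~ eps^-2 log n, all norms are preserved up to a factor
    of 1 +/- eps with high probability."""
    m = X.shape[1]
    G = rng.standard_normal((m, d)) / np.sqrt(d)
    return X @ G
```

For example, projecting 20 unit vectors from $\mathbb R^{500}$ down to $d = 400$ typically leaves every squared norm within a few percent of 1.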
Jelani Nelson presented his recent paper with Daniel Kane. The goal is to give a distribution over sparse matrices that provides the properties required in the Johnson-Lindenstrauss lemma and can be applied efficiently to sparse vectors. Jelani’s work is a follow-up to the paper of [Dasgupta, Kumar, Sarlos]. Jelani showed two extremely simple methods for constructing such matrices. In the first, one selects a small number of entries in each column and sets each of them at random to $-1$ or $1$. In the other, one partitions the rows into a number of buckets corresponding to the number of desired non-zero entries in each column; then, in each column, one chooses one non-zero entry in each bucket. The main advantage over the approach of Dasgupta et al. is that collisions between non-zero entries are avoided this way. One of the main open questions is to simultaneously derandomize the Johnson-Lindenstrauss transform and make it run fast for sparse vectors. Jelani also mentioned a recent follow-up by [Braverman, Ostrovsky, Rabani].

You can spend the following coffee break admiring one of the well-preserved frescoes:

Sofya Raskhodnikova talked about the problems of testing and reconstructing Lipschitz functions [Jha, Raskhodnikova 2011]. A function $f:X \to \mathbb R$ is $c$-Lipschitz if for every $x,y \in X$, it holds that $|f(x) - f(y)| \le c \cdot |x-y|$. In the testing problem, one wants to verify that a function is approximately Lipschitz (for some fixed $c$). In the reconstruction problem, one wants to design a “local filter” through which queries to the function are asked. If the distance of $f$ to the property of being Lipschitz is $\Delta$, then the filter provides access to a function $f'$ such that $f'$ is Lipschitz and the distance between $f$ and $f'$ is not much larger than $\Delta$. Sofya mentioned applications of this research to program analysis and data privacy. In her presentation, she focused on the case when the domain $X$ is a hypercube.
She gave both positive and negative results, and mentioned that the latter were greatly simplified by techniques from a recent paper of [Blais, Brody, Matulef].

C. Seshadhri talked about his recent result on submod… Well, Sesh eventually decided to talk about approximating the length of the longest increasing subsequence [Saks, Seshadhri]. Sesh showed that an additive $\delta n$-approximation to the length of the longest increasing subsequence can be computed in $(1/\delta)^{O(1/\delta)} \cdot (\log n)^c$ time, where $n$ is the length of the sequence, $\delta$ is an arbitrary parameter, and $c$ is a constant. Before his work, the smallest $\delta$ for which a sublinear algorithm was known was $1/2$. This result also implies a better algorithm for approximating the distance to monotonicity. Sesh showed a simplified version of his algorithm that runs in $(\log n)^{O(1/\delta)}$ time. The algorithm mimics a dynamic program, recursively searches for good splitters, and applies some boosting to obtain a better additive approximation than the trivial $n/2$. Sesh also complained that working on the paper and making everything work was painful. He even explicitly asked: “Was it really worth it?” Sesh, I think it was. This is a really nice result!

At night we experienced earthquakes, with the strongest of magnitude 3.1. It is pretty clear that they were caused by the impression left by today’s talks. A screenshot from www.emsc-csem.org:

(Krzysztof Onak takes over the daily blogging for Sublinear Algorithms 2011. The great photos are also by Krzysztof. Be sure to book him for your next wedding, birthday party, or Bar Mitzvah.)

The day started with Paul Valiant’s talk on distribution testing. The talk covered his recent papers on the topic, co-authored by his brother Greg Valiant (paper 1, paper 2). At the beginning, Paul selected a few volunteers from the audience and made them draw numbers from a pool he had prepared.
We then discussed what distribution is most likely to produce the frequencies the volunteers selected. In one case the selected numbers were 7, 18, and 7. The most likely distribution to produce these numbers is 7 with probability 2/3 and 18 with probability 1/3. I found it interesting that if the labels are removed, then the most likely distribution to produce the same frequencies is, say, 1 with probability 1/2 and 2 with probability 1/2. Paul focused mostly on estimating the support size and the entropy of a distribution, for which they give optimal algorithms with sample complexity roughly $\Theta(n / \log n)$. Paul touched upon topics such as approximating distributions by Poisson and Gaussian distributions and finding the most likely distribution to produce a given set of samples when the labels are removed. Paul also asked whether additional computational power helps. Many estimators statisticians use are computationally inexpensive, with running time often linear. Can we obtain better estimators given the relatively cheap computational power available nowadays?

In the next talk, Rocco Servedio presented his work with Costas Daskalakis and Ilias Diakonikolas on learning and testing $k$-modal distributions. He focused on the learning problem, in which the goal is to obtain an algorithm that outputs a distribution with total variation distance at most $\epsilon$ from the sampled distribution. The algorithm has to use few samples and be computationally efficient. The first approach to the problem is to partition the support into intervals of mass approximately $\epsilon/k$ each, and then run an algorithm for learning monotone distributions on each of them. This gives a suboptimal algorithm, which can be improved by using a testing algorithm to efficiently find the intervals containing the maxima and minima of the distribution. The testing algorithm verifies that the distribution is monotone, provided it is $k$-modal.
The general monotonicity question requires $\Omega(\sqrt{n})$ samples, where $n$ is the support size. The algorithms that Rocco presented use only $\mathrm{poly}(k,\log n,1/\epsilon)$ samples and are not far from the best known lower bound.
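A toy version of the first, suboptimal approach described above (mine, not Rocco's): cut the sample into blocks of roughly equal empirical mass and learn each piece as a flat distribution. The real algorithm learns each piece with a proper monotone-distribution learner and places the cuts more carefully.

```python
def flat_learn(samples, pieces):
    """Sort the integer sample, cut it into 'pieces' blocks of roughly
    equal empirical mass, and spread each block's mass uniformly over
    the integer interval it spans.  Returns a dict value -> probability."""
    xs = sorted(samples)
    n = len(xs)
    chunk = max(1, n // pieces)
    dist = {}
    for start in range(0, n, chunk):
        block = xs[start:start + chunk]
        lo, hi = block[0], block[-1]
        width = hi - lo + 1
        for v in range(lo, hi + 1):
            dist[v] = dist.get(v, 0.0) + (len(block) / n) / width
    return dist
```

By construction the output is a probability distribution; the quality of the approximation is what the interval-placement and testing machinery above is for.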

During the following coffee break, we admired a part of the castle’s exposition, just outside of the lecture room:

Madhu Sudan discussed several of his papers with various coauthors on local testability of affine-invariant properties. Many kinds of property testing can be seen as the study of a specific set of invariants. For instance, the invariants for graph properties are permutations of vertices; for Boolean functions, they are often permutations of variables.
In this talk, Madhu looked at functions mapping $K$ to $F$, where both $F$ and $K$ are fields, and the size of $K$ is $|F|^n$. Affine invariance means that if a function $f$ has the tested property, then so does the function $g_{a,b}(x) = f(ax + b)$. One of the main inspirations for this work is linearity testing [Blum, Luby, Rubinfeld], which has been very useful and resulted in surprising testers that use only a constant number of queries. One of the goals is to discover other such interesting properties that may be just as useful. Among other things, Madhu discussed the important subclass of affine-invariant properties that have “single orbit characterizations”; one can show that these properties are locally testable with one-sided error. Madhu showed a sequence of classes of properties, with this one innermost. Further research goals include understanding the testability of the superclasses in this sequence.
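The BLR linearity test mentioned above is simple enough to state in full; here is a sketch over $\mathbb{F}_2^n$, with points encoded as $n$-bit integers so that XOR plays the role of addition.

```python
import random

def blr_test(f, n, trials, rng):
    """BLR linearity test over GF(2)^n: pick random x, y and check
    f(x) ^ f(y) == f(x ^ y).  Linear (parity) functions always pass;
    functions far from linear fail some trial with high probability."""
    for _ in range(trials):
        x = rng.getrandbits(n)
        y = rng.getrandbits(n)
        if f(x) ^ f(y) != f(x ^ y):
            return False
    return True
```

For example, any parity of a subset of bits passes with certainty, while a majority function on 8 bits is far from linear and gets rejected quickly.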

Arnab Bhattacharyya touched upon a related topic. He focused on Boolean functions on the hypercube. For dense graphs, it has been shown that the properties testable in constant time are exactly the hereditary properties (or “almost hereditary” properties), where hereditary properties are those closed under vertex removal. A similar question arises in the study of Boolean functions. A subspace-hereditary property is a property that still holds when the function is restricted to a subspace. The main conjecture that this line of research tries to resolve is: “All subspace-hereditary linear-invariant properties are testable.” Arnab presented a proof that a large subclass of these properties is testable.

Tali Kaufman talked about her recent research with Irit Dinur. They investigate the relation between locally testable codes and expander graphs. [Sipser, Spielman] showed that expanders can be used to obtain very good error-correcting codes. A natural question that arises in this context is which graphs lead to good codes. It turns out that using graphs with sufficiently good expansion one can obtain locally testable codes, provided the expansion is not too high; with too much expansion the code loses a property called robustness (see [Ben-Sasson, Harsha, Raskhodnikova]). So what can we say about graphs that lead to locally testable codes? Tali showed that the constraint graph of such a code is a small-set expander, that is, all sets of sublinear size expand well.

(Jelani Nelson continues his report from Sublinear Algorithms 2011.)

After lunch, David Woodruff gave a survey talk on norm estimation. A vector $x$ receives coordinate-wise updates, and we want to estimate $\|x\|_p$ for some $p>0$. David started off with some applications (one story: some hedge fund in the 90s needed to be bailed out because they underestimated $\|x\|_4$). There’s been a lot of progress in this area, and David gave a very high-level overview of everything: $p\le 2$ needs just logarithmic space, whereas $p>2$ needs polynomial space. There are also mixed norms, Schatten norms, sum-norms, and matrix norms. One interesting thing is the prevalence of the [Indyk, Woodruff] subsampling/heavy-hitters technique. In fact, David is curious to know just how universal the approach is. Some work in [Braverman, Ostrovsky] begins to shed light on this question.
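For the $p = 2$ case (the logarithmic-space regime), the classic AMS sketch gives the flavor; here is a toy version of mine, with lazily memoized fully random signs standing in for the 4-wise independent hashing of the real construction.

```python
import random
from statistics import median

def ams_f2(stream, means=9, width=100, seed=0):
    """Toy AMS sketch for F_2 = sum_i m_i^2.  Each estimator keeps
    Z = sum_i s(i) * m_i for a random sign function s, so that
    E[Z^2] = F_2; a median of means boosts the confidence."""
    rng = random.Random(seed)
    k = means * width
    sign_tables = [{} for _ in range(k)]
    Z = [0] * k
    for item in stream:
        for j in range(k):
            s = sign_tables[j].setdefault(item, rng.choice((-1, 1)))
            Z[j] += s
    squares = [z * z for z in Z]
    return median(sum(squares[b * width:(b + 1) * width]) / width
                  for b in range(means))
```

A real streaming implementation stores only the $k$ counters and hash seeds, which is where the logarithmic space bound comes from.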

OPEN: Other models (sliding window, time decay, out-of-order streams, read-write streams, annotations, distributed functional monitoring, compressed sensing). Also, what about streaming for EMD? There’s some work by [Andoni, Do Ba, Indyk, Woodruff], but there’s still more to be done.

Next was Amit Chakrabarti on the Gap Hamming problem. The problem: Alice has $x$ in $\{0,1\}^n$, and Bob has $y$ in $\{0,1\}^n$. They must decide whether the Hamming distance between $x$ and $y$ is bigger than $n/2 + \sqrt{n}$, or less than $n/2 - \sqrt{n}$ (they can output anything otherwise). Amit and Oded Regev managed to close this problem by showing that even arbitrary round communication protocols need $\Omega(n)$ communication. This was a big open question, and it implies that many lower bounds we had for one-pass streaming algorithms now hold for an arbitrary number of passes. The technique involved a slight twist on the corruption method and moving to Gaussian space. Some simplified proofs have recently been found [Vidick], [Sherstov].

OPEN: Is Gap Hamming hard even when $x,y$ are uniform? Also, can we get $\Omega(\epsilon^{-2}\log(1/\delta))$ lower bounds for all these streaming problems in an arbitrary number of passes? (Amit’s talk was on constant error probability).

The last talk of the day was by Graham Cormode on mergeable summaries, joint work with Pankaj Agarwal, Zengfeng Huang, Jeff Phillips, Zhewei Wei, and Ke Yi. We would like to sketch data so that sketches can be merged along an arbitrary merge tree. Restricted settings are still interesting: a tree that is a path (streaming), trees where only equal-sized sketches are merged, and depth-1 trees. Graham started off by showing that the classical Misra-Gries frequent-items summary has the merge property, then moved on to quantiles. One neat thing about quantiles is that the sketch size was $O(\epsilon^{-1}\cdot \sqrt{\log(1/\delta)})$, exactly the square root of what you would expect from the Chernoff bound, when we only have to merge equal-sized sketches. There was some neat trickery to get arbitrary-sized merges with space independent of $n$.
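A minimal sketch of Misra-Gries plus its merge rule (add counters, subtract the $(k+1)$-st largest count, drop non-positive entries). The nice fact from the paper is that the undercount stays below $n/(k+1)$ even after merging, where $n$ is the total count summarized.

```python
from collections import Counter

def mg_update(summary, item, k):
    """One Misra-Gries update with at most k counters: increment the
    item's counter if possible, otherwise decrement everything."""
    if item in summary or len(summary) < k:
        summary[item] = summary.get(item, 0) + 1
    else:
        for key in list(summary):
            summary[key] -= 1
            if summary[key] == 0:
                del summary[key]
    return summary

def mg_merge(a, b, k):
    """Merge two summaries: add counters, then subtract the (k+1)-st
    largest count from every entry and drop non-positive ones."""
    merged = Counter(a) + Counter(b)
    if len(merged) > k:
        cut = sorted(merged.values(), reverse=True)[k]
        merged = {key: c - cut for key, c in merged.items() if c - cut > 0}
    return dict(merged)
```

Estimates never exceed true counts, and each item's undercount is bounded as above, which is what makes the summary mergeable without error blow-up.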

OPEN: Weight-based sampling, more complex functions (cascaded aggregates?), lower bounds for mergeable summaries, implementation studies (e.g. Hadoop).

The day ended with a fairly delicious dinner, including potatoes and roast pork.

(Jelani Nelson reports from Sublinear Algorithms 2011.)

It’s the first day of talks at the 2011 Bertinoro workshop on sublinear algorithms (titles and abstracts), about one hour’s drive southeast of Bologna, Italy. The day started off with some introductions. Everyone stated their name, affiliation, and often some of their research interests, while Amit Chakrabarti and Graham Cormode spiced up the routine by introducing each other.

Dana Ron gave the first talk: a survey talk on sublinear time algorithms for approximating graph parameters. She talked about estimating average degree, min-weight MST, vertex cover/max matching, and distance to various graph properties (like $k$-connectivity), and she considered various query models. The [Feige] algorithm for average degree estimation vaguely reminded me of the [Indyk, Woodruff] algorithm for estimating frequency moments: break up the vertices (or coordinates) into groups based on how much they contribute to the statistic at hand, then estimate the sizes of the groups that “matter” (i.e., are big enough). Both even use sampling and argue that groups that matter survive the sampling and can be noticed. Feige gets a $(2+\epsilon)$-approximation using degree queries; [Goldreich, Ron] get a $(1+\epsilon)$-approximation using neighbor queries. Oddly enough, I’d seen this Feige paper before, but for a totally different reason. There’s a disturbingly simple-to-state, but still open, problem left in his paper, which I once failed to solve with a friend one evening:

OPEN: If $X_1,...,X_n$ are independent nonnegative random variables each with expectation $1$, then what’s the largest constant $c$ so that $Pr[\sum_i X_i < n+1] \ge c$? Feige proves $c = 1/13$ works, but he conjectures the right answer is $1/e$.
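Returning to degree estimation: a toy bucketing estimator in the spirit of the Feige / [Indyk, Woodruff] comparison above. This is illustrative only; Feige's actual algorithm and its $(2+\epsilon)$ analysis are different, and this version is only accurate up to a small constant factor.

```python
import random

def avg_degree_estimate(degrees, samples, rng):
    """Sample vertices, classify each into a power-of-two degree
    bucket, estimate each bucket's fraction of vertices, and sum the
    contributions using a representative degree per bucket.
    Assumes all degrees are >= 1."""
    n = len(degrees)
    bucket_hits = {}
    for _ in range(samples):
        d = degrees[rng.randrange(n)]
        b = d.bit_length()               # bucket b holds degrees in [2^(b-1), 2^b)
        bucket_hits[b] = bucket_hits.get(b, 0) + 1
    estimate = 0.0
    for b, hits in bucket_hits.items():
        representative = 0.75 * (1 << b)  # midpoint of [2^(b-1), 2^b)
        estimate += (hits / samples) * representative
    return estimate
```

The hard part that this sketch ignores is the vertices of huge degree in tiny buckets, which uniform sampling misses; that is exactly where Feige's careful argument (and the neighbor queries of [Goldreich, Ron]) come in.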

The next talk was by Krzysztof Onak, who focused more on vertex cover (VC) and max-matching. He focused on getting a $(2, \epsilon n)$-approximation (i.e., output an answer between $x-\epsilon n$ and $2x + \epsilon n$, where $x$ is the true answer). We assume queries ask for the $i$th neighbor of a vertex. [Parnas, Ron] got $d^{O(d)\cdot \mathrm{poly}(1/\epsilon)}$ queries. [Nguyen, Onak] improved this to $2^{O(d)}/\epsilon^2$, followed by $d^4/\epsilon^2$ [Yoshida, Yamamoto, Ito]. [Onak, Ron, Rosen, Rubinfeld] most recently got $O(d/\epsilon^3)$, which is the optimal dependence on $d$. Krzysztof tells me the techniques can also give $O(d^2/\epsilon^2)$. The idea for, say, VC is to first reduce to an oracle that can tell you whether any vertex $v$ is inside a particular VC that is at most two times worse than the optimal one. With such an oracle, you can just sample and use the Hoeffding bound. The trick is then to implement the oracle, and various authors, starting from Parnas and Ron, have been designing and implementing slicker and slicker oracles.
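A toy version of the oracle idea (helper names are mine): fix shared randomness, define the cover as the endpoints of a greedily built maximal matching, and sample vertices against it. A real local algorithm answers each "is $v$ in the cover?" query by exploring only a small neighborhood of $v$, rather than materializing the whole matching as done here.

```python
import random

def greedy_matching_cover(order):
    """Endpoints of a greedy maximal matching over the given edge order
    form a vertex cover of size at most twice the optimum."""
    matched = set()
    for (u, v) in order:
        if u not in matched and v not in matched:
            matched.update((u, v))
    return matched

def vc_size_estimate(n, edges, samples, rng):
    """(2, eps*n)-style estimate: fix one random edge order (the shared
    randomness all queries must agree on), then sample vertices and
    count how many land in the resulting cover."""
    order = edges[:]
    rng.shuffle(order)
    cover = greedy_matching_cover(order)
    hits = sum(rng.randrange(n) in cover for _ in range(samples))
    return n * hits / samples
```

The Hoeffding bound mentioned above controls the sampling error; the factor 2 comes entirely from the matching-based cover.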

OPEN: Is there a good $(1,\epsilon n)$-approximation for maximum matching? Right now the best bound is $d^{O(1/\epsilon^2)}$ queries, but what about $\mathrm{poly}(d/\epsilon)$? How about $\mathrm{poly}(1/\epsilon)$ for planar graphs?

Justin Romberg followed up with a survey on compressed sensing. The idea is to take few linear measurements, e.g., $O(s\log(n/s))$, of an $n$-dimensional vector $x$ such that the best $s$-term approximation of $x$ can be nearly recovered from the measurements. One of the defining papers of the field, by Candes, Romberg, and Tao, shows that such a thing is possible via “$\ell_1$ minimization” as long as the measurement matrix satisfies a certain restricted isometry property (“RIP”), i.e., it nearly preserves the Euclidean norms of all sparse vectors. How does one get such matrices? [Baraniuk, Davenport, DeVore, Wakin] show that sampling from a “Johnson-Lindenstrauss” distribution gives RIP with high probability. And in fact, there’s a really neat paper by [Krahmer, Ward], following work by [Ailon, Liberty], showing that RIP matrices also give JL distributions, so that there’s a near-equivalence; I’d even go as far as to say that [Krahmer, Ward] is my favorite paper of 2010. So, random Gaussian entries work, and so does the Fast JL transform of [Ailon, Chazelle]. It also works to randomly subsample rows of the Fourier matrix, or to take a random partial Toeplitz matrix, but then you need $\mathrm{polylog}(n)$ measurements. Actually, the Toeplitz construction is really neat in that it’s “universal”: you get small error as long as $x$ is (near-)sparse in some basis, which you don’t need to know in advance.
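As a stand-in for the $\ell_1$-minimization decoder (which needs an LP solver), here is greedy orthogonal matching pursuit, a different but closely related recovery routine that, in practice, also recovers $s$-sparse vectors from on the order of $s \log n$ Gaussian measurements. The dimensions and names below are my own illustration.

```python
import numpy as np

def omp(A, b, s):
    """Orthogonal matching pursuit: repeatedly pick the column most
    correlated with the residual, then least-squares refit on the
    selected support."""
    support, residual = [], b.copy()
    for _ in range(s):
        j = int(np.argmax(np.abs(A.T @ residual)))  # most correlated column
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], b, rcond=None)
        residual = b - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x

rng = np.random.default_rng(42)
n, m, s = 60, 30, 3
A = rng.standard_normal((m, n)) / np.sqrt(m)  # JL-style measurement matrix
x_true = np.zeros(n)
x_true[[5, 17, 40]] = [2.0, -1.5, 3.0]
x_hat = omp(A, A @ x_true, s)
```

With these dimensions exact recovery of `x_true` is the typical outcome, though unlike the RIP-based $\ell_1$ guarantee it is not certain for any fixed draw of $A$.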

OPEN: How do you check that a matrix satisfies RIP? Can we do away with RIP and get a computable guarantee for sparse recovery?

Next was Piotr Indyk, who gave a talk on adaptive compressed sensing (joint work with Eric Price and David Woodruff). The question is: if we can look at the output of previous measurements before deciding what to measure next, can we do better? It turns out that you can! Whereas [Do Ba, Indyk, Price, Woodruff] and [Foucart, Pajor, Rauhut, Ullrich] showed that $\Omega(s\log(n/s))$ is a lower bound in the non-adaptive setting, in the adaptive case it turns out $O(s\log\log(n/s))$ measurements is possible. In fact, only $O(\log^{\#} s\log\log(n/s))$ rounds of adaptiveness are required, where the # is yet-to-be-determined ($\log^*$ is at least doable for now). The algorithm was nifty in that, as in other linear sketches, you do some hashing and random signs, but at some point you actually take what’s in some counters and round to the nearest integer. Kind of funky.

OPEN: Lower bounds? What if there’s noise after the measurements? And, deterministic schemes?

Off to lunch…