You are currently browsing the monthly archive for March 2010.

Following on from the last post on maximum weighted matchings in the stream model, I wanted to briefly mention a result that appeared in STACS last week. The set-up is the same as before: there’s a stream of undirected weighted edges \langle (e_1,w_{e_i}), \ldots, (e_m,w_{e_m})\rangle where w_{e} is the weight of edge e and we want to find an approximate maximum weighted matching. Last time, we showed a simple algorithm that gave a 5.828 approximation that was “preemptive online”, i.e., at any point in time the algorithm remembered no information other than the feasible matching found so far. The follow-up work achieved a 5.585 approximation but didn’t have this property. This begs numerous questions including if 5-ish is the “right answer” for this problem and whether there is a separation between preemptive-online algorithms and stream algorithms that use roughly the same amount of memory.

Which brings us to [Epstein, Levin, Mestre, Segev]. They showed a lower bound of 4.967 for any deterministic preemptive online algorithm. Furthermore, they show that there exists a deterministic stream algorithm that breaks through this bound and achieves a 4.911 approximation. As a commenter pointed out, the algorithm is nice and simple. Furthermore, it makes a good teaching example of how algorithms don’t necessary appear from thin air, but rather can be developed in a series of less mysterious steps. Here’s what they do in the paper:

  • Design a simple deterministic 8-approximation: Geometrically group edge weights by rounding edge weight to the nearest power of 2 and ignore the “very light” edges. Greedily find a maximal matching among each group of edges. At the end of the stream, find a maximum matching among the stored edges.
  • Randomize the algorithm to get a 4.911-approximation: Instead of grouping to the nearest power of 2, do randomized geometric grouping with the appropriate parameters.
  • Derandomize!

Thought… It might be interesting to investigate which other problems exhibit a separation between the preemptive online model and the stream model. You may need to be flexible with your definition of “preemptive online.”

Some nifty stream results in the accepted list from PODS:

  • Optimal Sampling From Distributed Streams [Cormode, Yi, Zhang, Muthukrishnan]
  • Fast Manhattan Sketches in Data Streams [Nelson, Woodruff]
  • An Optimal Algorithm for the Distinct Elements Problem [Kane, Nelson, Woodruff]

I also like the look of the PODS tutorial on “Information Complexity” that will be presented by T. S. Jayram. This is one of the most important techniques for proving stream and communication lower bounds. Check out the abstract here.

UPDATE: “An Optimal Algorithm for the Distinct Elements Problem” received the PODS Best Paper Award. Congrats!

[FWIW, I also had a paper accepted but it was on differential privacy rather than data streams. Perhaps I should add “secrecy” as a semi-suitable synonym to subjoin to the blog’s “streams, sketches, samples, sensing” tagline. Or would that be silly?]


A research blog about data streams and related topics.

Recently Tweeted