Data Streaming in Dortmund: Day 1

This week, I’m at the Workshop on Algorithms for Data Streams in Dortmund, Germany. It’s a continuation in spirit of the great Kanpur workshops from 2006 and 2009.

The first day went very well despite the widespread jet lag (if only jet lag from those traveling from the east could cancel out with those traveling from the west.) Sudipto Guha kicked things off with a talk on combinatorial optimization problems in the (multiple-pass) data stream model. There was a nice parallel between Sudipto’s talk and a later talk by David Woodruff and both were representative of a growing number of papers that have used ideas developed in the context of data streams to design more efficient algorithms in the usual RAM model. In the case of Sudipto’s talk, this was a faster algorithm to approximate $b$ -matchings while David’s result was a faster algorithm for least-squares regression.

Other talks included Christiane Lammersen presenting a new result for facility location in data streams; Melanie Schmidt talking about constant-size coresets for $k$ -means and projective clustering; and Dan Feldman discussing the data stream challenges that arise when trying to transform real-time GPS data from your smart-phone into a human-readable diary of your life. I spoke about work on constructing a combinatorial sparsifier for an $n^2$ -dimensional graph via a single random linear projection into roughly $n$ dimensions. Rina Panigrahy wrapped things up with an exploration of different distance measures in social networks, i.e., how to quantify how closely-connected you are to your favorite celebrity. This included proposing a new measure based on the probability that two individuals remained connected if every edge was deleted with some probability. He then related this to electrical resistance and spectral sparsification. He refused to be drawn on which of his co-authors had the closest connection to the Kardashians.