To Bristol, UK, England, for a 2 day workshop of stringy, streamy, energetic and generally algorithmic talks and chat. This is a “Feasibility workshop” aimed at “Bridging the gaps” which I learn is UK-funding speak for “let’s throw a bunch of people together from different areas, and see what they come up with”. So not altogether different from most workshops, except that the funding agency wants some credit if any interesting collaborations emerge.
The morning session was the most streamy: I talked about efforts on streaming interactive proofs, Patrick McSharry gave a stats-y view of streams, and Mark Feblowitz talked streaming systems, from an IBM perspective. I think there’s lots of room to bring these different views closer together. The stats view outlined by Patrick has many familiar notions. A core concept is ensembles of simple classifiers for prediction: these are easy to train and quick to extract answers from, compared to trying to make one monolithic model to rule them all. What I don’t yet get is how to best learn the weights to apply to the different classifiers. Familiar tricks keep the sufficient statistics fast to update: rolling windows and exponential decay. I wonder, how much of this is convenience rather than being “the right thing” to do?
InfosphereStreams is the IBM name for a system that reminds me a lot of early research systems (Stream, Aurora, Borealis): impressive looking operator graphs that transform and reduce streams, spread across multiple machines. A high-level view makes it seem like this is a solved problem, but I feel sure that there are many algorithmic challenges (scheduling, operator placement) lurking under the surface. Moreover, there is a temptation with such systems to achieve higher performance by throwing more hardware at the problem instead of looking for an alternate algorithm — what can we do to identify such options, or at least argue for them?
After lunch, a survey of string matching in streaming data, with hard time constraints. Most intriguing to me was discussion of the Porat and Porat FOCS09 paper, which I have so far found inaccessible: for the most part, since it is not possible to find a copy on the web (even behind IEEE’s paywall), but also because it is reputedly tricky to digest. Rumours swirl at lunch that there is a cleaner, almost “book” level proof in preparation, possibly with improvements. I hope that this shows up soon.
BONUS! A game… The final talk had nothing to do with streams, but everything to do with Flood-It, an addictive flash game which unsurprisingly turns out to be NP hard. A bunch of Bristolians have taken it apart and shown some great analysis — it’s a lot of FUN. Check it out here.