You are currently browsing the category archive for the ‘Uncategorized’ category.

Two of my students, Michael Crouch and Daniel Stubbs, are graduating this year. What does this mean?

**Michael’s looking for a post doc position.**Michael has done some really nice work in the field of data stream algorithms including designing the first graph algorithms for the sliding window model; analyzing space-sample trade-offs in stochastic streams; estimating stream periodicity via linear sketches; and improving the best known results for approximate maximum weighted matching (to appear). Prior to this, he worked in descriptive complexity and built a system for automatically finding reductions between problems. Michael gives great lectures and also has a particular strength when it comes to helping and mentoring more junior students. Which brings me to…**Dan’s applying to grad school.**Dan worked with Michael on the above graph algorithms and separately has a really neat result on sketching the earth-mover distance on graph metrics. Of particular note is that he has just been declared:Winner of the 2014 CRA Outstanding Undergraduate Research Awards!

This award “recognizes undergraduate students in North American colleges and universities who show outstanding research potential in an area of computing research.” Except, in Dan’s case, I’d skip the word “potential”.

**I’m looking for new students!**If you’re applying to grad school and are interested in data stream algorithms, or algorithms in general, please consider submitting your application to UMass Amherst. Details are here and our deadline is December 15th. For more details about my research interests see my website. And check-out the UMass Theory Group.

Email me if you’d like to follow-up on any of the above points.

**New Blog:** Eric Blais, Sourav Chakraborty, and C. Seshadhri have started a new blog on property testing at http://ptreview.sublinear.info/. Also check out Moritz Hardt’s newish blog at http://mrtz.org/blog/.

**This Blog:** Things have been quieter at the polylogblog over the last few months but this will soon be rectified. This semester, I’m visiting the Simons Institute for the Theory of Computing in Berkeley. The program on Theoretical Foundations of Big Data Analysis has been great so far and the second workshop starts tomorrow: tune in live here for everything you’ve always wanted to know about “Succinct Data Representations and Applications”. However, with so much happening, I’ve been rather derelict in my blogging duties. Fortunately, numerous other bloggers have picked up the slack:

- Moritz Hardt: http://mrtz.org/blog/what-should-a-theory-of-big-data-do/
- Sebastian Bubeck: https://blogs.princeton.edu/imabandit/2013/09/02/first-week-of-activity-at-the-simons-institute/
- Muthu Muthukrishnan: http://mysliceofpizza.blogspot.com/2013/09/simons-big-data-boot-camp.html
- Suresh Venkatasubramanian: http://geomblog.blogspot.com/2013/08/on-theory-of-big-data.html and this, and that.
- Gil Kalai: http://gilkalai.wordpress.com/2013/09/09/simonsucberkeley/

The exciting news sweeping the blogosphere (see here and here) is that SPARC 2013 is on its way. Specifically, Atri asked me to post the following:

Efficient and effective transmission, storage, and retrieval of information on a large-scale are among the core technical problems in the modern digital revolution. The massive volume of data necessitates the quest for mathematical and algorithmic methods for efficiently describing, summarizing, synthesizing, and, increasingly more critical, deciding when and how to discard data before storing or transmitting it. Such methods have been developed in two areas: coding theory, and sparse approximation (SA) (and its variants called compressive sensing (CS) and streaming algorithms).

Coding theory and computational complexity are both well established fields that enjoy fruitful interactions with one another. On the other hand, while significant progress on the SA/CS problem has been made, much of that progress is concentrated on the feasibility of the problems, including a number of algorithmic innovations that leverage coding theory techniques, but a systematic computational complexity treatment of these problems is sorely lacking. The workshop organizers aim to develop a general computational theory of SA and CS (as well as related areas such as group testing) and its relationship to coding theory. This goal can be achieved only by bringing together researchers from a variety of areas.

The Coding, Complexity and Sparsity workshop (SPARC 13) will be held in Ann Arbor, MI on Aug 5-7.

These will be hour-long lectures designed to give students an introduction to coding theory, complexity theory/pseudo-randomness, and compressive sensing/streaming algorithms. We will have a poster session during the workshop and everyone is welcome to bring a poster but graduate students and postdocs are especially encouraged to give a poster presentation.

This is the third incarnation of the workshop and the previous two workshops were also held in Ann Arbor in August of 2011 and 2012.

Confirmed speakers:

- Jin Yi Cai (University of Wisconsin, Madison)
- Shafi Goldwasser (MIT)
- Piotr Indyk (MIT)
- Swastik Kopparty (Rutgers University)
- Dick Lipton (Georgia Tech)
- Andrew McGregor (University of Massachusetts, Amherst)
- Raghu Meka (IAS)
- Jelani Nelson (Harvard)
- Eric Price (MIT)
- Christopher Ré (University of Wisconsin, Madison)
- Shubhangi Saraf (Rutgers University)
- Suresh Venkatasubramanian (University of Utah)
- David Woodruff (IBM)
- Mary Wootters (Michigan)
- Shuheng Zhou (Michigan)
We have some funding for graduate students and postdocs with preference given to those who will be presenting posters. For registration and other details, please look at the workshop webpage.

Ely Porat asked me remind everyone that the deadline for the 20th String Processing and Information Retrieval Symposium (SPIRE) is 2nd May, about a month from now. More details at websrv.cs.biu.ac.il/spire2013/.

**Update:** The deadline has been extended to 9th May.

A guest post from Krzysztof Onak:

A few recent workshops on sublinear algorithms compiled lists of open problems suggested by participants. During the last of them, in July in Dortmund, we realized that it would be great to have a single repository with all those problems. After followup discussions (with Alex Andoni, Piotr Indyk, and Andrew McGregor), we created a wiki page at http://sublinear.info/. Currently, it only contains open problems from the aforementioned workshops, but we invite submissions of inspiring problems from all areas of sublinear algorithms (sublinear time, sublinear space, etc.). Additionally, we want to compile a list of books, surveys, lecture notes, and slides that can be useful for learning about different areas of sublinear algorithms. We hope that this wiki will not serve only spambots, which have already been raiding it for a while, but it will also be a great source of inspiration for the whole community.

After a very successful hiring season last year, the department is now focusing on hiring in theory, NLP, robotics, and vision (that’s four separate searches rather than one extreme interdisciplinary position). So please apply! The official ad is here and note that, unlike previous years, we’re able to hire in theory at either the assistant or associate level. We’ll start reviewing applications December 3.

As promised, here are the slides from the STOC Workshop on Algorithms for Distributed and Streaming Data. The workshop was standing-room only so here’s your chance to review the slides while sitting down. More generally, all the workshops seemed to be a great success and I’m happy to see that the experiment will be repeated at FOCS. Deadline for proposals is 20 June.

- Sergei Vassilvitskii, Google: Distributed and Parallel Models
- Andrew McGregor, UMass Amherst: Data Streams and Linear Sketches
- John Langford, Microsoft Research: Fun Machine Learning Problems on Big Data
- Piotr Indyk, MIT: CS on CS: Computer Science Insights into Compressive Sensing (and vice versa)
- Ashish Goel, Stanford and Twitter: Challenges in Industry and Education

Thanks again to the speakers and everyone who came along.

While on the topic of STOC, I also wanted to mention a STOC workshop on “Algorithms for Distributed and Streaming Data” that will hopefully be of interest. It will take place Saturday afternoon, 19th May in NYU. The schedule can be found here.

So here’s the pitch: At this point it’s readily apparent that big data has become big news (e.g., see here and here). But what does this mean for the STOC/FOCS/SODA community? What are the algorithmic problems we could be solving? What are the appropriate computational models? Are there opportunities for industrial impact? What should we be teaching our undergraduate and graduate students? To address the relevant topics, we’ve lined-up a great set of speakers including Sergei Vassilvitskii, John Langford, Piotr Indyk, and Ashish Goel. Hope to see you there.

Piotr Indyk asked if I could post the following announcement and I’m happy to oblige.

**Piotr’s Post-Doc Position:** Applications are invited for a Postdoctoral Research Assistant position for the MIT-Shell-Draper Lab research project

“Applications of compressive sensing and sparse structure exploitation in hydrocarbon reservoir exploration and surveillance”

The goal of the project is to develop novel compressive sensing algorithms for geoscience problems in the area of hydrocarbon field exploration and surveillance. The appointment is initially for one-year, with the possibility of renewal for up to 3 years. The appointment should start either during the summer (the preferred option) or the fall of 2012.

Duties and Responsibilities:

- Provide expertise on and contribute to the development of compressive sensing and sparse recovery algorithms for geoscience applications
- Help MIT faculty in coordination of research projects, including periodic reporting and documentation as required by the program timeline
- Frequent travel to Shell facilities in Houston

Minimum Qualifications

- Ph.D. in Computer Science, Electrical Engineering, Mathematics or related disciplines

Preferred Qualifications

- Expertise in sparse recovery, compressive sensing, streaming/sketching, machine learning and statistical inference algorithms
- Experience and/or interest in research projects of interdisciplinary nature
- Programming experience (MATLAB)

Applications (including CV and three reference letters) should be submitted to

ideally by April 15, 2012. However, applications will be considered until the position is filled.

It’s the last day of Sublinear Algorithms 2011. Thanks to Artur Czumaj, Piotr Indyk, Ronitt Rubinfeld, and Robi Krauthgamer for organizing such a great workshop. Thanks also to all the guest bloggers (Jelani Nelson, Krzysztof Onak, Seshadhri, Piotr Indyk, Sariel Har-Peled) this week for doing such a great job. My only concern is that readers of this blog might start to expect posts of such a high quality. To minimize this risk, I’ll finish up the workshop blogging.

The day started with two property testing talks. First, Oded Lachish talked about “Testing a Language Accepted by a Fixed Boolean Formula”. Here we assume full knowledge of a formula and are given oracle access to an assignment of the variables. By querying only a few of the variable assignments we wish to distinguish between the cases when the assignment satisfies and the case when is far from doing so, i.e., all satisfying assignments differ in at least positions. Oded presented quasi-polynomial algorithms for a variety of cases including binary, read-once formula and binary, monotone and read-k-times formula.

Next up was Gilad Tsur who talked about “Approximating the Number of Relevant Variables in a Function” [Ron, Tsur]. Here there’s also a boolean formula but in contrast to the previous talk we don’t know . Rather, we can query the value of on inputs of our choosing. The goal is to multiplicatively estimate how many variables in the input actually play a role in determining the function. Unfortunately this is hard in general: consider a function that always a evaluates to 0 except on one specific input. However, Gilad showed it was possible for certain families of functions such as linear functions. He then considered the relaxation to distinguishing between the case when the number of relevant variables is at most , and the case in which it is far from any function in which the number of relevant variable is more than for some parameter .

After a short break, I talked about “Data Streams, Dyck Languages, and Detecting Dubious Data Structures” [Chakrabarti, Cormode, Kondapally, McGregor]. Here’s the problem… You are watching your friend interact with a priority queue: at each step she either inserts a new value into the queue or asks for the minimum value currently in the queue to be extracted. However, you’re suspicious that the priority queue may not always be extracting the correct minimum. Can you help your friend recognize when this happens without having to remember all the values she has inserted? In the talk we showed how this was possible. We also discussed recognizing whether a string of brackets was balanced and concluded that reading backwards can be a very useful skill. Slides are here.

## Recent Comments