Gödel’s Lost Letter recently mentioned one of my favourite papers from the prehistory of data streams: Selection and Sorting with Limited Storage [Munro, Paterson]. It’s a great paper that prefigured many interesting research directions including multi-pass algorithms and processing streams in random order. Here’s one of the nice “bite-sized” results.
We consider a stream formed by arranging a set of distinct values
in random order, i.e.,
for a permutation
of
chosen uniformly at random. The goal is to process the stream in small-ish space and yet find the exact median with high probability (over the random ordering).
Here’s a simple algorithm that maintains a set of
elements and two counters
and
which are initialized to zero. Store the first
values from the stream in
and as each new element
arrives, do one of the following:
- If
Increment
- If
Increment
- If
and
: Add
to
, remove smallest element from
, and increment
- If
and
: Add
to
, remove largest element from
, and increment
At the end of the stream, if then return the
-th smallest element in
. Otherwise fail.
Why does the algorithm make sense? It’s easy to show that at any point, the set of elements that has arrived thus far consists of and
values less than
and
values greater that
. Hence, if at the end of the stream,
has a
-th smallest element, this element is the median. All that remains is to show that the algorithm fails with small probability. This follows (after a bit of analysis) by considering
and noting that, for some constant
,
- Probability a non-zero
is incremented at any step is at most 1/2 if
is either incremented or decremented at each step
- If
at the final step, then
as required
Hence, we have a simple “random-walk-like” set-up and there’s a that ensures that, with high probability, the algorithm doesn’t fail.
Of course you could ask about finding approximate medians of randomly ordered streams in one or more passes. If you do, check out [Guha, McGregor], [Chakrabarti, Jayram, Patrascu], and [Chakrabarti, Cormode, McGregor].

1 comment
Comments feed for this article
August 31, 2009 at 7:29 pm
rjlipton
Enjoyed your explaining the details of the paper. It is a classic paper.