Today, we consider estimating the median of a set of values given a single pass over a stream formed by randomly permuting the set. Seems like a pretty simple problem, right?
It’s known that we can find an -approximate median, i.e., an element with rank
, in polylog space, with probability 9/10 where the probability is taken over both the coin flips of the algorithm and the random permutation. However, the best known lower bound states that finding a
-approximate median requires
space [Guha, McGregor]. If this bound is optimal, it would be possible to find a
-approximate median in polylogarithmic space. Does there exist such an algorithm or can the lower bound be improved?
Background. The exact median can be found in space [Munro, Paterson]. It seems reasonable to conjecture that
space is necessary and sufficient to find a
-approximate median.
(Disclaimer: I’ve suppressed some poly-log and poly-constant terms above. And, actually, you can shave-off one of the terms I didn’t mention in the lower bound if you consider some public random bits in the argument.)


4 comments
Comments feed for this article
October 7, 2010 at 10:58 am
Andrew
At some point I convinced myself of the following: if you partitioned the set between Alice and Bob such that Alice received each value with an arbitrary probability p, then with a O(log n) bit message from Alice, Bob can find a
-approximate median for some constant
. This suggests that any improved lower bound via communication complexity may need to involve more than two players.
October 7, 2010 at 8:40 pm
Mihai
Isn’t it true that for p = constant bounded away from 0 and 1, you get an O(n^{1/4}) upper bound?
If Bob receives n^{2/3} samples, it seems you obtain the Omega(n^{1/3}) approximation lower bound that you speak of.
October 7, 2010 at 9:05 pm
Andrew
Yes, for constant p that sounds about right. So my claim was that even for non-constant p, Alice and Bob can always do better than an n^{1/2}-approximate median (with only log bits of communication). But the communication protocols I considered didn’t give rise to a stream algorithm that did better than an n^{1/2}-approximation.
And yes, Bob receiving about n^{2/3} samples was the situation analyzed to prove the existing lower bound (or rather to determine the point where the lower bound became non-trivial).
September 18, 2011 at 10:26 pm
Andrew
Update: We’ve recently made some progress on this problem. See http://www.cs.umass.edu/~mcgregor/papers/11-median.pdf.