Today, we consider estimating the median of a set of values given a single pass over a stream formed by randomly permuting the set. Seems like a pretty simple problem, right?

It’s known that we can find an -approximate median, i.e., an element with rank , in polylog space, with probability 9/10 where the probability is taken over both the coin flips of the algorithm and the random permutation. However, the best known lower bound states that finding a -approximate median requires space [Guha, McGregor]. If this bound is optimal, it would be possible to find a -approximate median in polylogarithmic space. Does there exist such an algorithm or can the lower bound be improved?

**Background.** The exact median can be found in space [Munro, Paterson]. It seems reasonable to conjecture that space is necessary and sufficient to find a -approximate median.

(Disclaimer: I’ve suppressed some poly-log and poly-constant terms above. And, actually, you can shave-off one of the terms I didn’t mention in the lower bound if you consider some public random bits in the argument.)

### Like this:

Like Loading...

*Related*

## 4 comments

Comments feed for this article

October 7, 2010 at 10:58 am

AndrewAt some point I convinced myself of the following: if you partitioned the set between Alice and Bob such that Alice received each value with an arbitrary probability p, then with a O(log n) bit message from Alice, Bob can find a -approximate median for some constant . This suggests that any improved lower bound via communication complexity may need to involve more than two players.

October 7, 2010 at 8:40 pm

MihaiIsn’t it true that for p = constant bounded away from 0 and 1, you get an O(n^{1/4}) upper bound?

If Bob receives n^{2/3} samples, it seems you obtain the Omega(n^{1/3}) approximation lower bound that you speak of.

October 7, 2010 at 9:05 pm

AndrewYes, for constant p that sounds about right. So my claim was that even for non-constant p, Alice and Bob can always do better than an n^{1/2}-approximate median (with only log bits of communication). But the communication protocols I considered didn’t give rise to a stream algorithm that did better than an n^{1/2}-approximation.

And yes, Bob receiving about n^{2/3} samples was the situation analyzed to prove the existing lower bound (or rather to determine the point where the lower bound became non-trivial).

September 18, 2011 at 10:26 pm

AndrewUpdate: We’ve recently made some progress on this problem. See http://www.cs.umass.edu/~mcgregor/papers/11-median.pdf.