You are currently browsing the monthly archive for October 2010.

Today, we consider estimating the median of a set of $n$ values given a single pass over a stream formed by randomly permuting the set. Seems like a pretty simple problem, right?

It’s known that we can find an $O(\sqrt{n})$-approximate median, i.e., an element with rank $n/2 \pm O(\sqrt{n})$, in polylog space, with probability 9/10 where the probability is taken over both the coin flips of the algorithm and the random permutation. However, the best known lower bound states that finding a $n^\gamma$-approximate median requires $\Omega(n^{1/2-3\gamma/2})$ space [Guha, McGregor]. If this bound is optimal, it would be possible to find a $n^{1/3}$-approximate median in polylogarithmic space. Does there exist such an algorithm or can the lower bound be improved?

Background. The exact median can be found in $O(\sqrt{n})$ space [Munro, Paterson]. It seems reasonable to conjecture that $\Theta(n^{1/2-\gamma})$ space is necessary and sufficient to find a $O(n^\gamma)$-approximate median.

(Disclaimer: I’ve suppressed some poly-log and poly-constant terms above. And, actually, you can shave-off one of the $O(\log n)$ terms I didn’t mention in the lower bound if you consider some public random bits in the argument.) ### About

A research blog about data streams and related topics.