We’re presented with a numerical stream A=\langle a_1, \ldots, a_m\rangle\in [n]^m and we want to estimate a quantity of the form \sum_{i\in [n]} g(f_i) where f_i=|\{j:a_j=i\}| is the frequency of i and g is an arbitrary function with g(0)=0. One of the many neat ideas in [Alon, Matias, Szegedy] is to use the following basic estimator for this type of problem:

  1. Pick J uniformly at random from [m]
  2. Compute R=|\{j\geq J:a_j=a_J\}|
  3. Let X=m(g(R)-g(R-1))

Note that X can be computed using O(\log m+\log n) space as the data arrives. It’s not hard to show that E[X]=\sum_{i\in [n]} g(f_i) because,
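A minimal sketch of the single-pass computation in Python (the function name and interface are my own, not from the paper); it assumes the stream length m is known in advance so that J can be drawn up front:

```python
import random

def basic_estimator(stream, g):
    """One AMS basic estimator for sum_i g(f_i).

    Assumes m = len(stream) is known so J can be drawn up front; a
    reservoir sample of J would remove that assumption. The only state
    kept while scanning is (J, a_J, R): O(log m + log n) bits.
    """
    m = len(stream)
    J = random.randrange(m)            # step 1: uniform position in [m]
    a_J = None
    R = 0
    for j, a in enumerate(stream):     # single pass, as the data arrives
        if j == J:
            a_J = a                    # remember the sampled symbol
        if j >= J and a == a_J:
            R += 1                     # step 2: count occurrences from J on
    return m * (g(R) - g(R - 1))       # step 3
```

Since R \geq 1 always, the convention g(0)=0 is only used when the sampled symbol occurs exactly once after position J.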

E[X]= \sum_{i} E[X|a_J=i] \Pr[a_J=i]=\sum_{i} E[g(R)-g(R-1)|a_J=i] f_i\ ,

since \Pr[a_J=i]=f_i/m and X=m(g(R)-g(R-1)). Conditioned on a_J=i, R is uniform over \{1,\ldots,f_i\}, so the sum telescopes:

E[g(R)-g(R-1)|a_J=i] = \frac{g(1)-g(0)}{f_i}+\ldots + \frac{g(f_i)-g(f_i-1)}{f_i}=\frac{g(f_i)}{f_i}\ .

Hence, running multiple basic estimators in parallel and averaging appropriately gives a good estimate of \sum_{i\in [n]} g(f_i). Exactly how many basic estimators are required depends on the variance of X, and this determines the total space used by the algorithm.
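The averaging step can be sketched as follows (illustrative code of my own; it uses the plain mean of t copies, whose error shrinks like \sqrt{\mathrm{Var}[X]/t} by Chebyshev, and a median of several such means would then boost the success probability to 1-\delta):

```python
import random

def estimate(stream, g, t, seed=None):
    """Average t independent basic estimators of sum_i g(f_i)."""
    rng = random.Random(seed)
    m = len(stream)

    def one_estimator():
        J = rng.randrange(m)                                  # uniform J
        a_J = stream[J]
        R = sum(1 for j in range(J, m) if stream[j] == a_J)   # count from J on
        return m * (g(R) - g(R - 1))

    return sum(one_estimator() for _ in range(t)) / t

# e.g. g(x) = x**2 estimates the second frequency moment F_2:
# a stream with frequencies (3, 2, 1) has F_2 = 9 + 4 + 1 = 14.
```

Each copy of the basic estimator keeps its own (J, a_J, R) state, so t copies run in a single pass using O(t(\log m+\log n)) bits.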

Using this approach, it was shown that the k-th order frequency moment, F_k=\sum_{i\in [n]} f_i^k, can be approximated up to a factor (1+\epsilon) with probability (1-\delta) using only O(\epsilon^{-2} n^{1-1/k} \log \delta^{-1} (\log m+\log n)) bits of space. This was subsequently improved to an (essentially) optimal result in [Indyk, Woodruff].