We’re presented with a numerical stream and we want to compute some function where is the frequency of and is an arbitrary function with . One of the many neat ideas in [Alon, Matias, Szegedy] is to use the following *basic estimator* for this type of problem:

- Pick uniformly at random from
- Compute
- Let

Note that can be computed using space as the data arrives. It’s not hard to show that because,

and

Hence, computing multiple basic estimators in parallel and averaging appropriately will give a good estimate of . Quite how many basic estimators are required depends on the variance of and this determines the total space used by the algorithm.

Using this approach, it was shown that the -th order frequency moment, , can be approximated up to a factor with probability using only bits of space. This was subsequently improved to an (essentially) optimal result in [Indyk, Woodruff].

## Leave a comment

Comments feed for this article