We’re presented with a numerical stream and we want to compute some small-space approximation of where is the frequency of . In particular, we want to be able to return estimates that satisfy
where is not known ahead of time. CR-Precis [Ganguly, Majumder] is a neat little deterministic stream algorithm that achieves this in space. (Also check out [pg. 31, Gasieniec, Muthukrisnan] for a special case.)
Let be the first prime numbers and define “hash” functions . Based on these functions, the CR-Precis algorithm maintains counters which are initially zero. When we observe in the stream we increment each of for .
Say we’re interested in the frequency of . At the end of the day, we end up with overestimates for , i.e., . Hence, it makes sense to return . How much of an overestimate can this be?
Note that for any , there are at most functions under which collide, i.e., . This follows from the Chinese Remainder Theorem. Hence, we deduce that,
and therefore . Setting gives the desired result.
But wait, there’s more! There are actually other deterministic algorithms for this problem that use only space: e.g. “Misra-Gries”, “Space-Saving”, and “Lossy-Counting” algorithms (see [Cormode, Hadjieleftheriou] for a survey and experimental comparison). What makes CR-Precis more useful is that it’s a sketch algorithm, i.e., it’s possible to transform the hash functions into a matrix in a natural way such that the algorithm described above is simply computing . This confers numerous advantages to the algorithm including the ability to handle “deletes”, i.e., we consider a stream
Assuming that all , the above algorithm is easily adapted to this new setting by decrementing the appropriate counters when processing and incrementing when processing .