We’re presented with a numerical stream and we want to compute some small-space approximation of
where
is the frequency of
. In particular, we want to be able to return estimates
that satisfy
where is not known ahead of time. CR-Precis [Ganguly, Majumder] is a neat little deterministic stream algorithm that achieves this in
space. (Also check out [pg. 31, Gasieniec, Muthukrisnan] for a special case.)
Let be the first
prime numbers and define “hash” functions
. Based on these functions, the CR-Precis algorithm maintains
counters
which are initially zero. When we observe
in the stream we increment each of
for
.
Say we’re interested in the frequency of . At the end of the day, we end up with
overestimates for
, i.e.,
. Hence, it makes sense to return
. How much of an overestimate can this be?
Note that for any , there are at most
functions
under which
collide, i.e.,
. This follows from the Chinese Remainder Theorem. Hence, we deduce that,
and therefore . Setting
gives the desired result.
But wait, there’s more! There are actually other deterministic algorithms for this problem that use only space: e.g. “Misra-Gries”, “Space-Saving”, and “Lossy-Counting” algorithms (see [Cormode, Hadjieleftheriou] for a survey and experimental comparison). What makes CR-Precis more useful is that it’s a sketch algorithm, i.e., it’s possible to transform the hash functions into a
matrix
in a natural way such that the algorithm described above is simply computing
. This confers numerous advantages to the algorithm including the ability to handle “deletes”, i.e., we consider a stream
and define
.
Assuming that all , the above algorithm is easily adapted to this new setting by decrementing the appropriate counters when processing
and incrementing when processing
.

8 comments
Comments feed for this article
September 22, 2009 at 1:41 pm
Graham Cormode
I like the construction, it’s very neat. However, it’s also worth mentioning that there are also randomized structures that meet the “sketch” requirement and have the eps^{-1} space usage. In this context, I tend to think of the CR-precis as a derandomization of results which can use sketches (giving deterministic streaming results for point queries, quantiles, etc.). The various overheads mean that you are unlikely to want to ever actually implement it, unless you really really want a deterministic solution.
September 22, 2009 at 1:49 pm
polylogblog
Quite right. We’ll be discussing some of those sketches soon :)
September 24, 2009 at 12:16 am
Piotr Indyk
Alternatively, one can take a look at a recent article in Communications of the ACM , describing some of those algorithms.
September 24, 2009 at 1:05 am
polylogblog
Cool. I knew the CACM article had been written but I hadn’t realized that it had appeared.
Graham: Should I replace the link to your VLDB paper with a link to your CACM paper?
September 24, 2009 at 12:23 am
Amit Chakrabarti
The step where you mentioned CRT doesn’t actually need CRT; it’s just a straighforward consequence of the fact that a natural number $N$ has at most $\lg N$ prime factors.
September 24, 2009 at 12:54 am
polylogblog
Good point Amit. Of course, since the CR is CR-precis stands for “Chinese Remainder” perhaps we need a new name for this sketch. Can anyone suggest a backronym for “GM-sketch”?
September 24, 2009 at 1:50 pm
Piotr Indyk
Btw: Andrew, welcome to the Planet Blog! (not that I live on that planet :)
October 14, 2009 at 5:52 am
metoo
Concur with AC. All you need is a natural number $N$ has at most $\lg N$ prime factors. This is how the proof is phrased in [Pg 31] you refer to.
Also, let me point out that these prime number based constructions have been used in “Improved Bounds for a Deterministic Sublinear-Time Sparse Fourier Algorithm” by Mark Iwen and C.V. Spencer. http://www.ima.umn.edu/~iwen/WebPage/improved08.pdf and other places.
Finally, when I was thinking about that Pg31 result, I had in mind a bigger technical problem: coding and decoding in sublinear space deterministically for suitable codes (Group testing, k-selectors, Reed-Muller, ….). Progress on that problem is here: Efficiently Decodable Non-adaptive Group Testing. Piotr Indyk, Hung Q. Ngo and Atri Rudra
To Appear in SODA 2010.
- Metoo