MassDAL

MassDAL Public Code Bank : Sketches and Frequent Items

This page is a library of routines in C for data streaming and other massive data set analysis. Currently, these are derived from internal testing routines ("proof of concept" implementations), and so are distributed with no guarantees. In particular, there is relatively little error checking (parameters in range, memory allocation errors) in these implementations. For support and advice, please contact graham a dimacs dot rutgers dot edu.

Count-Min Sketch

countmin.h
countmin.c
Count-Min sketches (Cormode, Muthukrishnan '04). They support:

Combinatorial Group Testing

cgt.h
cgt.c
Described in "What's Hot and What's Not: Tracking Frequent Items Dynamically" They support:

CCFC Sketches

ccfc.h
ccfc.c
Count sketches from Charikar, Chen, Farach-Colton '02. They support:

AMS Sketches

ams.h
ams.c
"Tug of War" sketches due to Alon, Matias and Szegedy 96, and Alon Gibbon Matias and Szegedy, 99. Some hashing tricks used for faster implementation. They support:

Flajolet-Martin Sketches

fm.h
fm.c
Flajolet-Martin sketches for estimating the number of distinct items in a stream. They support finding the number of distinct items (L0 norm).

Stable Distributions and sketches

stable.h
stable.c
Sketches based on stable distributions. Based on Indyk'00, Cormode, Indyk, Koudas, Muthukrishnan '02, Cormode, Data, Indyk, Muthukrishnan, '02. They support

Library Routines

These are some general routines which are made use of by various of the other code available here.

Random number generators prng.h
prng.c
Timing Routines timer.h
timer.c

Frequent Item Algorithms

These are routines for finding frequent items in data streams. They are derived from code used in the paper "What's Hot and What's Not: Tracking Frequent Items Dynamically"

MethodFilesDescription
Frequent frequent.h
frequent.c
Algorithm Frequent for insert only streams
LossyCounting lossycount.h
lossycount.c
Lossy Counting algorithm due to Manku and Motwani

Usage

Each of the .h files exports various methods:

Sample Usage

Code Bundle

massdalsketches.zip All the above .c and .h files together in one easy to download .zip file.
The following licence applies to all code distributed here. Creative Commons
License This work is licensed under a Creative Commons License.