Data stream analysis frequently relies on identifying correlations and posing conditional queries on the data after it has been seen. Correlated aggregates form an important example of such queries, which ask for an aggregation over one dimension of stream elements which satisfy a predicate on another dimension. Since recent events are typically more important than older ones, time decay should also be applied to downweight less significant values. We present space-efficient algorithms as well as space lower bounds for the time-decayed correlated sum, a problem at the heart of many related aggregations. By considering different fundamental classes of decay functions, we separate cases where efficient relative error or additive error is possible, from other cases where linear space is necessary to approximate. In particular, we show that no efficient algorithms are possible for the popular sliding window and exponential decay models, resolving an open problem. The results are surprising, since efficient approximations are known for other data stream problems under these decay models. This is a step towards better understanding which sophisticated queries can be answered on massive streams using limited memory and computation.
[ bib | .pdf ] Back
This file was generated by bibtex2html 1.92.