Due to the overwhelming flow of information in many data stream applications, data outsourcing is a natural and effective paradigm for individual businesses to address the issue of scale. In the standard data outsourcing model, the data owner outsources streaming data to one or more third-party servers, which answer queries posed by a potentially large number of clients on the data owner's behalf. Data outsourcing intrinsically raises issues of trust, making outsourced query assurance on data streams a problem with important practical implications. Existing solutions proposed in this model all build upon cryptographic primitives such as signatures and collision-resistant hash functions, which only work for certain types of queries, e.g., simple selection/aggregation queries.
In this paper, we consider another common type of queries, namely, “ GROUP BY, SUM” queries, which previous techniques fail to support. Our new solutions are not based on cryptographic primitives, but instead use algebraic and probabilistic techniques to compute a small synopsis on the true query result, which is then communicated to the client so as to verify the correctness of the query result returned by the server. The synopsis uses a constant amount of space irrespective of the result size, has an extremely small probability of failure, and can be maintained using no extra space when the query result changes as elements stream by. We then generalize our synopsis to allow some tolerance on the number of erroneous groups, in order to support semantic load shedding on the server. When the number of erroneous groups is indeed tolerable, the synopsis can be strengthened so that we can locate and even correct these errors. Finally, we implement our techniques and perform an empirical evaluation using live network traffic.
[ bib | http | .pdf ] Back
This file was generated by bibtex2html 1.92.