G. Cormode and M. Garofalakis. Join sizes, frequency moments, and applications. In M. Garofalakis, J. Gehrke, and R. Rastogi, editors, Data Stream Management: Processing High-Speed Data Streams. Springer, 2016.

In this chapter, we focus on a set of problems chiefly inspired by the problem of estimating the join size between two database relations. This problem is at the heart of a wide variety of other problems, both in databases and beyond. Given two relations, R and S, and an attribute a common to both relations, the equi-join between the two relations, R and S, consists of one tuple for each r inR and s inS pair such that r.a = s.a. Estimating the joinsize between pairs of relations is a key component in planning how to answer a complex SQL query that may contain arbitrary combinations of selection, projection and join tasks. The result can also be used to answer some queries approximately. This applies directly in a streaming context, and additionally within a traditional database management system that is operating on the stream of updates generated by tuple insertions and deletions. Knowing the join size is also a important problem at the heart of a variety of other problems, such as building histogram and wavelet representations of data, and can be applied to problems such as finding frequent items and quantiles, and modelling spatial and high-dimensional data.

bib | Alternate Version | .pdf ] Back


This file was generated by bibtex2html 1.92.