We have two users, A and B, who hold documents x and y respectively. Neither of the users has any information about the other's document. They exchange messages so that B computes x; it may be required that A compute y as well. Our goal is to design communication protocols with the main objective of minimizing the total number of bits they exchange; other objectives are minimizing the number of rounds and the complexity of internal computations. An important notion which determines the efficiency of the protocols is how one measures the distance between x and y. We consider several metrics for measuring this distance, namely the Hamming metric, the Levenshtein metric (edit distance), and a new LZ metric, which is introduced in this paper. We show how to estimate the distance between x and y using a single message of logarithmic size. For each metric, we present the first communication-efficient protocols, which often match the corresponding lower bounds. A consequence of these are error-correcting codes for these error models which correct up to d errors in n characters using O(d logn) bits. Our most interesting methods use a new histogram transformation that we introduce to convert edit distance to L1 distance.
[ bib | Alternate Version | slides | .pdf ] Back
This file was generated by bibtex2html 1.92.