The edit distance between two stringsSandRis defined to be the minimum number of character inserts, deletes and changes needed to convertRtoS. Given a text stringtof lengthn, and a pattern stringpof lengthm, informally, the string edit distance matching problem is to compute the smallest edit distance betweenpand substrings oft. A well known dynamic programming algorithm takes timeO(nm) to solve this problem, and it is an important open problem in Combinatorial Pattern Matching to significantly improve this bound.We relax the problem so that (a) we allow an additional operation, namely,

substring moves, and (b) we approximate the string edit distance upto a factor ofO(lognlog^{*}n). (log^{*}nis the number of times log function is applied tonto produce a constant.) Our result is a near linear time deterministic algorithm for this version of the problem. This is the first known significantly subquadratic algorithm for a string edit distance problem in which the distance involves nontrivial alignments. Our results are obtained by embedding strings intoL_{1}vector space using a simplified parsing technique we callEdit Sensitive Parsing(ESP). This embedding is approximately distance preserving, and we show many applications of this embedding to string proximity problems including nearest neighbors, outliers, and streaming computations with strings.

[ bib | http | Alternate Version | slides | .pdf ] Back

*This file was generated by
bibtex2html 1.92.*