Distance Geometry in Data Science (tutorial)

June 26, 2019, 1:20 PM - 2:40 PM

Location:

DIMACS Center

Rutgers University

CoRE Building

96 Frelinghuysen Road

Piscataway, NJ 08854

Leo Liberti, CNRS and Ecole Polytechnique

Many problems in data science are addressed by mapping entities of various kinds to vectors in a Euclidean space of some dimension. Most of these methods (e.g. Multidimensional Scaling, Principal Component Analysis, k-means clustering, random projections) are based on the proximity of pairs of vectors. For the results of these methods to make sense, the proximity of entities in the original problem must be well approximated by distances in the Euclidean space. If proximity were known for each pair of original entities, finding this mapping would be a good example of isometric embedding. Usually, however, this is not the case, as data are partial, noisy, or wrong. I shall survey some of the methods above from the point of view of Distance Geometry.
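As a rough illustration of the idealized case mentioned in the abstract, where every pairwise distance is known exactly, the sketch below uses classical Multidimensional Scaling to recover points from a complete Euclidean distance matrix. It is a minimal example using NumPy and is not taken from the tutorial itself; the function names `classical_mds` and `pairwise_distances`, the sample size, and the random test data are assumptions made for illustration.

```python
import numpy as np

def pairwise_distances(X):
    """Return the n x n matrix of Euclidean distances between rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(D, dim):
    """Embed n points in R^dim from a complete matrix D of pairwise distances.

    Illustrative sketch of classical MDS: double-center the squared
    distances to obtain a Gram matrix, then keep its top `dim` eigenpairs.
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    G = -0.5 * J @ (D ** 2) @ J               # Gram matrix of the centered points
    evals, evecs = np.linalg.eigh(G)          # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:dim]       # indices of the largest `dim` eigenvalues
    lam = np.clip(evals[idx], 0.0, None)      # guard against tiny negative round-off
    return evecs[:, idx] * np.sqrt(lam)

# Hypothetical test data: 6 random points in the plane.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
D = pairwise_distances(X)

# Recover an embedding from the distances alone and measure the distortion.
Y = classical_mds(D, dim=2)
max_error = float(np.abs(pairwise_distances(Y) - D).max())
print(f"largest distance distortion: {max_error:.2e}")
```

The recovered points `Y` generally differ from `X` by a rotation, reflection, or translation, since distances determine a configuration only up to such congruences. With missing or noisy entries in `D`, this exact recovery no longer applies, which is the situation the abstract describes as the usual one.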