Technology

Friday, December 28, 2012

Efficient Clustering of High-Dimensional Data Sets with Application to Reference Matching

- Canopies can be applied to many domains and used with a variety of clustering approaches, including Greedy Agglomerative Clustering, K-means, and Expectation-Maximization.

- We do not calculate the exact distance between two points that never appear in the same canopy; that is, we assume their distance to be infinite.
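
The two points above can be illustrated with a minimal sketch. It assumes the usual two-threshold canopy construction (a loose threshold `t1` and a tight threshold `t2`, with `t2 <= t1`) and uses Euclidean distance for both the cheap and the exact metric purely for illustration; in practice the cheap metric would be something much faster than the exact one. The function names `build_canopies` and `canopy_distance`, and the choice to always pick the lowest remaining index as the next center, are assumptions made for this example, not something taken from the paper.

```python
import math


def build_canopies(points, t1, t2, cheap_dist=math.dist):
    """Return a list of canopies (sets of point indices); canopies may overlap.

    t1 is the loose threshold, t2 the tight one (0 <= t2 <= t1).
    cheap_dist stands in for an inexpensive, approximate distance.
    """
    if not 0 <= t2 <= t1:
        raise ValueError("thresholds must satisfy 0 <= t2 <= t1")
    remaining = set(range(len(points)))  # points that may still become centers
    canopies = []
    while remaining:
        center = min(remaining)  # arbitrary but deterministic choice (assumption)
        dists = [cheap_dist(points[center], p) for p in points]
        # Every point within t1 of the center joins this canopy.
        canopies.append({i for i, d in enumerate(dists) if d <= t1})
        # Points within t2 can no longer start a canopy of their own.
        remaining -= {i for i, d in enumerate(dists) if d <= t2}
    return canopies


def canopy_distance(i, j, points, canopies, exact_dist=math.dist):
    """Exact distance if i and j share a canopy, otherwise infinity."""
    if any(i in c and j in c for c in canopies):
        return exact_dist(points[i], points[j])
    return math.inf  # never computed: the pair is treated as infinitely far apart
```

A downstream clusterer (Greedy Agglomerative Clustering, K-means, or Expectation-Maximization) would then call something like `canopy_distance` in place of its usual distance function, so the expensive metric is only evaluated for pairs that share at least one canopy.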