Thursday, September 13, 2012

MinHash clustering


- Also known as the "min-wise independent permutations locality-sensitive hashing scheme", MinHash is a technique for quickly estimating how similar two sets are.

- Jaccard similarity and minimum hash values
       The Jaccard similarity coefficient of two sets A and B is defined as
        J(A,B) = |A ∩ B| / |A ∪ B|

- It is a number between 0 and 1, inclusive.
- 0 means the two sets are disjoint; 1 means the two sets are equal.
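
To make the link between Jaccard similarity and minimum hash values concrete, here is a minimal Python sketch. It relies on the standard MinHash property, which the notes above do not spell out: for a random hash function, the probability that two sets share the same minimum hash value equals their Jaccard similarity. The function names, the BLAKE2b-based seeded hash, and the default of 128 hash functions are all assumptions made for illustration, not part of any particular library.

```python
import hashlib


def jaccard(a, b):
    """Exact Jaccard similarity |A ∩ B| / |A ∪ B| of two sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumed convention: two empty sets are treated as identical.
        return 1.0
    return len(a & b) / len(union)


def seeded_hash(item, seed):
    """Hypothetical helper: a 64-bit hash of item, varied by seed."""
    data = f"{seed}:{item}".encode("utf-8")
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items, num_hashes=128):
    """Keep the minimum hash value of a non-empty set for each seed."""
    return [min(seeded_hash(x, seed) for x in items)
            for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of positions where the two signatures agree."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


a = {1, 2, 3}
b = {2, 3, 4}
print(jaccard(a, b))  # 0.5: two shared items out of four in total
print(estimate_jaccard(minhash_signature(a), minhash_signature(b)))
```

The estimate only approximates the exact value. More hash functions reduce the error at the cost of longer signatures, and the signatures can be compared without looking at the original sets again.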






