MinHash clustering
- Also known as "min-wise independent permutations locality sensitive hashing scheme"
- A technique for quickly estimating how similar two sets are.
- Jaccard similarity and minimum hash values
The Jaccard similarity coefficient of two sets A and B is defined to be J(A,B) = |A ∩ B| / |A ∪ B|
- It is a number between 0 and 1
- 0 means the two sets are disjoint; 1 means the two sets are equal
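
The definition above can be computed directly with Python's built-in sets. This is a minimal sketch, not part of the original notes: the function name jaccard is hypothetical, and returning 1.0 for two empty sets is an assumed convention, since the formula is undefined when the union is empty.

```python
def jaccard(a, b):
    """Return |a ∩ b| / |a ∪ b| for two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: two empty sets count as identical, so similarity is 1.0.
        return 1.0
    return len(a & b) / len(union)


print(jaccard({1, 2, 3}, {2, 3, 4}))  # 2 shared elements out of 4 distinct
print(jaccard({1, 2}, {3, 4}))        # disjoint sets
print(jaccard({1, 2}, {2, 1}))        # equal sets
```

Computing this exactly requires both full sets, which is the cost MinHash avoids by estimating the same quantity from short signatures.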