Thursday, September 13, 2012

MinHash clustering


- Also known as the "min-wise independent permutations locality sensitive hashing scheme"
- A technique for quickly estimating how similar two sets are

- Jaccard similarity and minimum hash values
       The Jaccard similarity coefficient of two sets A and B is defined to be
        J(A,B) = |A ∩ B| / |A ∪ B|

- J(A,B) is a number between 0 and 1
- 0 means the two sets are disjoint; 1 means the two sets are equal
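
The link between Jaccard similarity and minimum hash values is that, for a random hash function, the probability that two sets share the same minimum hash value equals J(A,B). Below is a minimal sketch of that idea in Python, not a production implementation. The function names, the linear hash family (a*x + b) mod p, the CRC32 pre-hash, and the signature length k are all assumptions chosen for illustration.

```python
import random
import zlib

PRIME = 2_147_483_647  # the Mersenne prime 2**31 - 1 (assumed modulus)


def jaccard(a, b):
    """Exact Jaccard similarity |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumed convention: two empty sets count as equal
    return len(a & b) / len(a | b)


def make_hash_params(k, seed=42):
    """Draw k random (a, b) pairs, one per hash function h(x) = (a*x + b) % PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]


def _stable_int(x):
    """Map an element to an integer that is the same on every run."""
    return zlib.crc32(repr(x).encode("utf-8"))


def minhash_signature(items, params):
    """For each hash function, keep the minimum hash value over the set."""
    values = [_stable_int(x) for x in set(items)]
    if not values:
        raise ValueError("cannot build a signature for an empty set")
    return [min((a * v + b) % PRIME for v in values) for a, b in params]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of positions where the two signatures agree."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


if __name__ == "__main__":
    params = make_hash_params(k=256)
    A = set(range(0, 1500))
    B = set(range(500, 2000))
    print("exact    :", jaccard(A, B))
    print("estimated:", estimate_jaccard(minhash_signature(A, params),
                                         minhash_signature(B, params)))
```

The estimate gets closer to the exact value as k grows, at the cost of longer signatures. The benefit is that each set is reduced to k numbers, so two large sets can be compared without touching their elements again.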







Collaborative Filtering 

- It is a technique used by some recommender systems. 
- The term has two senses:
         - narrow one 
         - more general one 
Collaborative filtering explores techniques for matching people with similar interests and making recommendations on this basis.

- Collaborative filtering algorithms often require 
   (1) users’ active participation, 
   (2) an easy way to represent users’ interests to the system, and 
   (3) algorithms that are able to match people with similar interests.

- Methodology
      - User-based collaborative filtering 
      - Item-based collaborative filtering 

- Types 
      - Memory based 
              This mechanism uses user rating data to compute the similarity between users or items.
              Common similarity measures are Pearson correlation and vector cosine.
              A popular method for finding similar users is locality-sensitive hashing, which implements an approximate nearest-neighbor mechanism in linear time.

      - Model based
              Models are developed using data mining and machine learning algorithms to find patterns in training data.
              Model-based CF algorithms include:
                    - Bayesian networks
                    - clustering models
                    - latent semantic models such as singular value decomposition, probabilistic latent semantic analysis, multiple multiplicative factor, and latent Dirichlet allocation
                    - Markov decision process based models
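
The memory-based, user-based approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: ratings are stored as a dict of dicts, cosine similarity treats a missing rating as 0, Pearson correlation is computed over co-rated items only, and the prediction is a similarity-weighted average over users with positive similarity. All function names and the sample data are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Vector cosine between two rating dicts; unrated items count as 0."""
    dot = sum(u[i] * v[i] for i in set(u) & set(v))
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pearson_correlation(u, v):
    """Pearson correlation over the items both users have rated."""
    common = set(u) & set(v)
    n = len(common)
    if n < 2:
        return 0.0  # not enough overlap to measure correlation
    mean_u = sum(u[i] for i in common) / n
    mean_v = sum(v[i] for i in common) / n
    num = sum((u[i] - mean_u) * (v[i] - mean_v) for i in common)
    den = math.sqrt(sum((u[i] - mean_u) ** 2 for i in common)) * \
          math.sqrt(sum((v[i] - mean_v) ** 2 for i in common))
    return num / den if den else 0.0


def predict_rating(ratings, user, item, similarity=cosine_similarity):
    """Similarity-weighted average of other users' ratings for `item`."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue  # ignore dissimilar users in this sketch
        weighted_sum += sim * other_ratings[item]
        weight_total += sim
    return weighted_sum / weight_total if weight_total else None


if __name__ == "__main__":
    # Hypothetical ratings on a 1 to 5 scale.
    ratings = {
        "alice": {"a": 5, "b": 3},
        "bob": {"a": 5, "b": 3, "c": 4},
        "carol": {"a": 1, "b": 5, "c": 2},
    }
    print(predict_rating(ratings, "alice", "c"))
```

Because bob's ratings resemble alice's more than carol's do, bob's rating of item "c" carries more weight in the prediction. Swapping the roles of users and items in the same functions gives the item-based variant.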