Monday, July 9, 2012

How to run Mahout MinHash Clustering


1. Convert the directory of Reuters text files into Hadoop SequenceFiles (UTF-8 input, 5 MB chunks):
   mahout seqdirectory -i /home/venkat/Desktop/minhash4/reuters-out -o /home/venkat/Desktop/minhash4/reuters-out-seqdir -c UTF-8 -chunk 5

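2. Turn the SequenceFiles into sparse TF-IDF vectors, dropping terms that occur in more than 85% of the documents and keeping the document names on the vectors:
   mahout seq2sparse -i /home/venkat/Desktop/minhash4/reuters-out-seqdir/ -o /home/venkat/Desktop/minhash4/reuters-out-seqdir-sparse-minhash --maxDFPercent 85 --namedVector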

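3. Run the MinHash driver on the tfidf-vectors directory produced by step 2 (--overwrite replaces any existing output directory):
   mahout org.apache.mahout.clustering.minhash.MinHashDriver -i /home/venkat/Desktop/minhash4/reuters-out-seqdir-sparse-minhash/tfidf-vectors -o /home/venkat/Desktop/minhash4/reuters-minhash --overwrite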
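
Since all three steps share the same base directory, they can be chained in a small wrapper script. The following is only a sketch, assuming mahout is on the PATH; BASE, DRY_RUN, run and minhash_pipeline are names made up for this example and are not Mahout options. By default it only prints the commands, so it is safe to try before running anything for real.

```shell
#!/bin/sh
# Sketch: the three steps above with the shared base path factored out.
# BASE and DRY_RUN are hypothetical names for this example, not Mahout settings.
BASE="${BASE:-/home/venkat/Desktop/minhash4}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands; 0 = print and run them

run() {
    # Print the command line; execute it only when DRY_RUN is 0.
    echo "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

minhash_pipeline() {
    # Each step feeds the next, so stop at the first failure.
    run mahout seqdirectory -i "$BASE/reuters-out" -o "$BASE/reuters-out-seqdir" -c UTF-8 -chunk 5 &&
    run mahout seq2sparse -i "$BASE/reuters-out-seqdir/" -o "$BASE/reuters-out-seqdir-sparse-minhash" --maxDFPercent 85 --namedVector &&
    run mahout org.apache.mahout.clustering.minhash.MinHashDriver -i "$BASE/reuters-out-seqdir-sparse-minhash/tfidf-vectors" -o "$BASE/reuters-minhash" --overwrite
}

minhash_pipeline
```

Setting DRY_RUN=0 in the environment before running the script executes the commands instead of only printing them.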

