Sunday, July 1, 2012

Mahout Clustering from the Command Line


Convert raw text into SequenceFiles

./bin/mahout seqdirectory \
  -i ./examples/bin/work/reuters-out/ \
  -o ./examples/bin/work/reuters-out-seqdir \
  -c UTF-8 \
  -chunk 5

Create TF-IDF vectors


./bin/mahout seq2sparse \
  -i ./examples/bin/work/reuters-out-seqdir/ \
  -o ./examples/bin/work/reuters-out-seqdir-sparse
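
For reference, here is the common textbook form of the TF-IDF weight that this step assigns to each term in each document. This is only a sketch: Mahout's own weighting may apply a different scaling or smoothing, so the exact formula below is an assumption. The symbols N (total number of documents), tf (count of a term in a document) and df (number of documents containing the term) are introduced here for illustration.

```latex
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \ln\frac{N}{\mathrm{df}(t)}

% Worked example: a term occurring 3 times in a document,
% and appearing in 10 of N = 1000 documents:
\mathrm{tfidf} = 3 \cdot \ln\frac{1000}{10} = 3 \cdot \ln 100 \approx 13.8
```

Terms that occur in nearly every document get a weight close to zero, so they have little influence on the distances used in the clustering step.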

Run the k-means algorithm

./bin/mahout kmeans \
  -i ./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/ \
  -c ./examples/bin/work/clusters \
  -o ./examples/bin/work/reuters-kmeans \
  -x 10 \
  -k 2 \
  -ow
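
The three steps can be chained in one small script, sketched below. The MAHOUT, WORK and DRY_RUN variables and the run and run_pipeline helpers are hypothetical names made up for this example and are not part of Mahout; the Mahout flags are the ones used in the commands above. By default the script only prints the commands, which makes it easy to check the paths first.

```shell
#!/bin/sh
# Sketch only: MAHOUT, WORK, DRY_RUN, run and run_pipeline are
# illustrative names, not something Mahout provides.
MAHOUT="${MAHOUT:-./bin/mahout}"
WORK="${WORK:-./examples/bin/work}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = also execute them

# Print a command; execute it too unless DRY_RUN is 1.
# Stops the script if the command fails.
run() {
  echo "$*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@" || exit 1
  fi
}

run_pipeline() {
  # 1. Directory of text files -> SequenceFiles (UTF-8, chunk size 5 MB)
  run "$MAHOUT" seqdirectory -i "$WORK/reuters-out/" \
      -o "$WORK/reuters-out-seqdir" -c UTF-8 -chunk 5
  # 2. SequenceFiles -> sparse vectors (written under tfidf-vectors/)
  run "$MAHOUT" seq2sparse -i "$WORK/reuters-out-seqdir/" \
      -o "$WORK/reuters-out-seqdir-sparse"
  # 3. k-means: 2 clusters (-k), at most 10 iterations (-x),
  #    overwrite any previous output (-ow)
  run "$MAHOUT" kmeans -i "$WORK/reuters-out-seqdir-sparse/tfidf-vectors/" \
      -c "$WORK/clusters" -o "$WORK/reuters-kmeans" -x 10 -k 2 -ow
}

run_pipeline
```

Run it with DRY_RUN=0 once the printed commands look right.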





