Technology

Monday, July 16, 2012

	Hadoop

	Hadoop Distributed File System

1	A scalable, fault-tolerant, high-performance distributed file system (storage)
2	Asynchronous replication
3	Write-once, read-many (WORM)
4	Data compression (BZIP2)
5	Hadoop cluster with a minimum of 3 nodes
6	Data is divided into multiple 64 MB blocks
7	Each block is replicated 3 times (default)
8	No RAID required
9	Access via REST, Java, or FUSE
10	NameNode holds the filesystem metadata
11	Files are broken up and spread over the DataNodes
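
The block and replication points above can be made concrete with a small sketch. It splits a file into 64 MB blocks and picks 3 DataNodes for each block. The function name plan_blocks, the node names, and the round-robin placement are assumptions for illustration only; real HDFS placement is rack-aware and decided by the NameNode.

```python
BLOCK_SIZE = 64 * 1024 * 1024   # 64 MB blocks, as in the list above
REPLICATION = 3                 # default replication factor


def plan_blocks(file_size, datanodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return a list of (block_index, block_length, replica_nodes) tuples.

    Placement is plain round-robin (an illustrative assumption, not the
    actual HDFS placement policy).
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    # Integer ceiling division: the last block may be shorter than block_size.
    num_blocks = (file_size + block_size - 1) // block_size
    plan = []
    for i in range(num_blocks):
        length = min(block_size, file_size - i * block_size)
        # Pick `replication` distinct nodes, starting at a rotating offset.
        replicas = [datanodes[(i + r) % len(datanodes)] for r in range(replication)]
        plan.append((i, length, replicas))
    return plan


# A 200 MB file on a hypothetical 4-node cluster.
for index, length, replicas in plan_blocks(200 * 1024 * 1024, ["dn1", "dn2", "dn3", "dn4"]):
    print(index, length, replicas)
```

A 200 MB file yields three full blocks plus one 8 MB block, and each block lands on three different nodes, which is why losing a single DataNode does not lose data.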

	Map Reduce

1	Software framework for distributed computation
2	Input | map() | copy/sort | reduce() | output
3	JobTracker, running on the NameNode server, schedules and manages jobs
4	TaskTracker executes individual map() and reduce() tasks on each DataNode

5	Map Phase – Raw data is analyzed and converted to key/value pairs
6	Shuffle Phase – All key/value pairs are sorted and grouped by their keys
7	Reduce Phase – All values associated with a key are processed for results
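
The three phases above can be sketched as a word count in plain Python, running in a single process. This is not the Hadoop API; the function names map_phase, shuffle_phase, reduce_phase and word_count are made up for illustration, and in a real cluster the map and reduce calls would run as separate tasks on different nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: turn one raw input line into (key, value) pairs."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all values by key and sort the groups by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Run every line through map, then shuffle, then reduce each key group.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle_phase(pairs))


counts = word_count(["the quick brown fox", "the lazy dog", "the fox"])
print(counts)
```

Each reduce call only ever sees the values for one key, which is what lets the reduce phase be spread over many machines.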

8	NameNode & JobTracker on the same server
9	DataNode & TaskTracker on the same server
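
Putting the TaskTracker on the same server as the DataNode lets a map task read its block from local disk instead of over the network. A rough sketch of that idea follows, assuming a simple greedy scheduler: the function assign_map_tasks, the block ids, and the slot counts are hypothetical, and the real JobTracker hands out tasks in response to TaskTracker heartbeats rather than in one pass.

```python
def assign_map_tasks(block_locations, free_slots):
    """Assign one map task per block, preferring a node that stores the block.

    block_locations: {block_id: [node, ...]}  nodes holding a replica
    free_slots:      {node: int}              free map slots per TaskTracker
    Returns {block_id: (node, is_local)}.
    """
    slots = dict(free_slots)  # work on a copy so the caller's dict is untouched
    assignment = {}
    for block_id, replicas in block_locations.items():
        # First choice: a node that already holds a replica and has a free slot.
        local = [node for node in replicas if slots.get(node, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            # Fallback: any node with a free slot; the block is read remotely.
            remote = [node for node, free in slots.items() if free > 0]
            if not remote:
                raise RuntimeError("no free task slots left")
            node, is_local = remote[0], False
        slots[node] -= 1
        assignment[block_id] = (node, is_local)
    return assignment


blocks = {"b0": ["dn1", "dn2"], "b1": ["dn1", "dn2"], "b2": ["dn1"]}
print(assign_map_tasks(blocks, {"dn1": 1, "dn2": 1, "dn3": 1}))
```

In this example b0 and b1 run where their data lives, while b2 falls back to dn3 because dn1 has no free slot left.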
