Monday, July 16, 2012



Hadoop



Hadoop Distributed File System


1 A scalable, fault-tolerant, high-performance distributed file system (storage)
2 Asynchronous replication
3 Write-once, read-many (WORM)
4 Data compression (bzip2)
5 Hadoop cluster with 3 nodes minimum
6 Data is divided into multiple 64 MB blocks (default block size)
7 Each block is replicated 3 times (default)
8 No RAID required
9 Access via REST, Java API, or FUSE
10 NameNode holds filesystem metadata
11 Files are broken up and spread over the DataNodes
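
The block and replication figures above can be turned into a quick storage estimate. The following is a minimal sketch, assuming the 64 MB block size and replication factor of 3 from the list; the function name `hdfs_footprint` is made up for illustration and is not part of any Hadoop API.

```python
import math

BLOCK_SIZE_MB = 64   # block size from the list above (configurable in practice)
REPLICATION = 3      # default replication factor from the list above


def hdfs_footprint(file_size_mb):
    """Return (block_count, replica_count, raw_storage_mb) for one file.

    Hypothetical helper for illustration only.
    """
    # A file is cut into fixed-size blocks; the last block may be partial.
    blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    # Every block is stored REPLICATION times across the DataNodes.
    replicas = blocks * REPLICATION
    # A partial last block only occupies the space of the data it holds,
    # so raw storage is the file size times the replication factor.
    raw_storage_mb = file_size_mb * REPLICATION
    return blocks, replicas, raw_storage_mb


print(hdfs_footprint(200))  # (4, 12, 600)
```

So a 200 MB file becomes 4 blocks, stored as 12 block replicas, and takes roughly 600 MB of raw disk across the cluster. This is why RAID is not needed: redundancy comes from the replicas.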



MapReduce


1 Software framework for distributed computation
2 input | map() | copy/sort | reduce() | output
3 JobTracker, running on the NameNode server, schedules and manages jobs
4 TaskTracker executes individual map() and reduce() tasks on each DataNode


5 Map Phase – Raw data is analyzed and converted to key/value pairs
6 Shuffle Phase – All key/value pairs are sorted and grouped by their keys
7 Reduce Phase – All values associated with a key are processed for results


8 NameNode & JobTracker on the same server
9 DataNode & TaskTracker on the same server
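
The three phases listed above can be imitated in a few lines of plain Python. This is only an in-memory sketch of the idea using a word count, not Hadoop's API: it runs in a single process, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `run_job`) are invented for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: turn one raw input line into (key, value) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: sort the pairs by key and group the values for each key."""
    grouped = defaultdict(list)
    for key, value in sorted(pairs):
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: process all values for one key into a single result."""
    return key, sum(values)


def run_job(lines):
    """Chain the phases: input | map | shuffle | reduce | output."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


result = run_job(["Hadoop stores data", "Hadoop processes data"])
print(result)  # {'data': 2, 'hadoop': 2, 'processes': 1, 'stores': 1}
```

In a real cluster the map and reduce calls would run as separate tasks on different machines, and the shuffle would copy data between them over the network; here everything is just a list in memory.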

