Hadoop | |
Hadoop Distributed File System | |
1 | A scalable, fault-tolerant, high-performance distributed file system (storage) |
2 | Asynchronous replication |
3 | Write-once, read-many (WORM) |
4 | Data compression (e.g., BZIP2) |
5 | Hadoop cluster with 3 nodes minimum |
6 | Data is divided into multiple 64 MB blocks (default block size) |
7 | Each block is replicated 3 times (default) |
8 | No RAID required |
9 | Access via REST (WebHDFS), the Java API, or FUSE |
10 | NameNode holds the filesystem metadata |
11 | Files are broken up and spread over the DataNodes |
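The block and replication rows above can be illustrated with a small Python sketch. It assumes the 64 MB block size and replication factor of 3 listed in the table, and uses simple round-robin placement in place of HDFS's real rack-aware policy. The function names are hypothetical, not Hadoop APIs. It also shows why a replication factor of 3 needs at least 3 DataNodes.

```python
# Hypothetical sketch of HDFS-style block splitting and replica placement.
# split_into_blocks, place_replicas and build_metadata are illustrative names,
# not part of any Hadoop API.

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB default block size (Hadoop 1.x)
REPLICATION = 3                # default replication factor


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs; only the last block may be shorter."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Assign each block to `replication` distinct DataNodes, round-robin.

    Real HDFS uses a rack-aware placement policy; this is a simplification.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    return {
        block_id: [datanodes[(block_id + r) % len(datanodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


def build_metadata(path, file_size, datanodes):
    """Model the NameNode's view: file path -> list of block records."""
    blocks = split_into_blocks(file_size)
    placement = place_replicas(len(blocks), datanodes)
    return {
        path: [
            {"block_id": i, "offset": off, "length": ln, "replicas": placement[i]}
            for i, (off, ln) in enumerate(blocks)
        ]
    }


if __name__ == "__main__":
    # A 200 MB file becomes three full 64 MB blocks plus one 8 MB block.
    meta = build_metadata("/logs/app.log", 200 * 1024 * 1024, ["dn1", "dn2", "dn3", "dn4"])
    for rec in meta["/logs/app.log"]:
        print(rec["block_id"], rec["length"] // (1024 * 1024), "MB", rec["replicas"])
    # 0 64 MB ['dn1', 'dn2', 'dn3']
    # 1 64 MB ['dn2', 'dn3', 'dn4']
    # 2 64 MB ['dn3', 'dn4', 'dn1']
    # 3 8 MB ['dn4', 'dn1', 'dn2']
```

Losing any single DataNode here still leaves two copies of every block, which is why HDFS can skip RAID on the individual disks.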
MapReduce | |
1 | Software framework for distributed computation |
2 | Input → map() → copy/sort → reduce() → output |
3 | JobTracker schedules and manages jobs; it runs on the NameNode server |
4 | TaskTracker executes individual map() and reduce() tasks on each DataNode |
5 | Map Phase – Raw input data is parsed and converted into key/value pairs |
6 | Shuffle Phase – All key/value pairs are sorted and grouped by key |
7 | Reduce Phase – All values associated with each key are processed to produce the results |
8 | NameNode & JobTracker run on the same server |
9 | DataNode & TaskTracker run on the same server |
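The map, shuffle, and reduce phases listed above can be traced with a single-process Python word count. This is only a sketch of the data flow: in Hadoop the map and reduce tasks would run in parallel under TaskTrackers, and the framework itself performs the copy/sort step. The function names are hypothetical.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: convert raw text into (word, 1) key/value pairs."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)


def shuffle_phase(pairs):
    """Shuffle: group all values by key, then order the groups by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    """Reduce: process all values for each key (here, sum the counts)."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(lines):
    """Input -> map() -> copy/sort -> reduce() -> output, in one process."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The end"]
    print(run_job(docs))
    # {'brown': 1, 'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

Word count is a convenient example because each phase is easy to see: map emits `(word, 1)`, shuffle collects the 1s for each word, and reduce sums them.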
Monday, July 16, 2012