| Hadoop | |
| Hadoop Distributed File System | |
| 1 | A scalable, fault-tolerant, high-performance distributed file system (storage) |
| 2 | Asynchronous replication |
| 3 | Write-once, read-many (WORM) |
| 4 | Data compression (bzip2) |
| 5 | Hadoop cluster with 3 nodes minimum |
| 6 | Data is divided into 64 MB blocks (default block size) |
| 7 | Each block is replicated 3 times (default) |
| 8 | No RAID required |
| 9 | Access via REST, the Java API, and FUSE |
| 10 | NameNode holds filesystem metadata |
| 11 | Files are broken up and spread over the DataNodes |
| Map Reduce | |
| 1 | Software framework for distributed computation |
| 2 | Input → map() → copy/sort → reduce() → output |
| 3 | JobTracker schedules and manages jobs; it runs on the NameNode server |
| 4 | TaskTracker executes individual map() and reduce() tasks on each DataNode |
| 5 | Map Phase – Raw data is analyzed and converted to key/value pairs |
| 6 | Shuffle Phase – All key/value pairs are sorted and grouped by their keys |
| 7 | Reduce Phase – All values associated with a key are processed for results |
| 8 | NameNode & JobTracker on the same server |
| 9 | DataNode & TaskTracker on the same server |
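
The map, shuffle, and reduce phases listed above can be sketched in a few lines of plain Python. This is only an in-memory illustration of the data flow, not Hadoop code: it does not use the Hadoop API, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the word-count task are assumptions chosen for the example.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: convert each raw input line into (key, value) pairs."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)


def shuffle_phase(pairs):
    """Shuffle: sort the pairs and group the values by their key."""
    grouped = defaultdict(list)
    for key, value in sorted(pairs):
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: process all values associated with a key into one result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Chain the three phases: input -> map -> shuffle -> reduce -> output."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["Hadoop stores data", "Hadoop processes data"]
    print(word_count(sample))
```

In a real cluster each phase runs in parallel across many TaskTrackers and the shuffle moves data over the network; here everything runs in one process so the grouping by key is easy to follow.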
Monday, July 16, 2012