How Hadoop Works

Hadoop provides the Hadoop Distributed File System (HDFS). When we upload data to HDFS, it is partitioned into blocks and spread across the nodes of the cluster (with multiple copies of each block kept in case of hardware failure). Hadoop can then deploy our code to the machines that already hold the data it is meant to operate on, rather than moving the data to the code.
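
As a concrete example, a client can copy a local file into HDFS through the Hadoop FileSystem API. The sketch below is only an illustration: the fs.defaultFS address and both file paths are placeholder values, not taken from this article.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class UploadToHdfs {
        public static void main(String[] args) throws Exception {
            // Placeholder cluster address and paths, for illustration only.
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode-host:8020");

            FileSystem fs = FileSystem.get(conf);
            // Copy a local file into HDFS; HDFS splits it into blocks and
            // replicates each block across the cluster behind the scenes.
            fs.copyFromLocalFile(new Path("/tmp/input.txt"),
                                 new Path("/user/demo/input.txt"));
            fs.close();
        }
    }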

Rather than organizing data relationally, Hadoop works with key/value pairs: each piece of data has a key and a value associated with that key. For example, when MapReduce reads a plain text file, each line becomes a value whose key is the line's byte offset in the file.

The components that make up Hadoop are:

HDFS: A distributed file system that stores large amounts of data across multiple nodes in a cluster.

MapReduce application: MapReduce is a functional programming paradigm for analyzing the records stored in HDFS, one record at a time. The mapper is responsible for the data-processing step; the reducer receives the mapper's output, grouped by key, and combines the values that belong to the same key. A minimal sketch of both is shown below.
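
As an illustration, here is a minimal word-count sketch built on the standard org.apache.hadoop.mapreduce Mapper and Reducer classes. The class names are chosen for the example, and the Job driver that wires the two together and submits the job is omitted.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class WordCount {

        // Mapper: each call receives one line of text (the value) keyed by
        // its byte offset in the file (the key) and emits (word, 1) pairs.
        public static class TokenMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                for (String token : value.toString().split("\\s+")) {
                    if (!token.isEmpty()) {
                        word.set(token);
                        context.write(word, ONE);
                    }
                }
            }
        }

        // Reducer: the framework groups the mapper output by key, so each
        // call receives one word together with all of its counts.
        public static class SumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values,
                                  Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable count : values) {
                    sum += count.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }
    }

The framework sorts and groups the mapper's (word, 1) pairs by key before calling the reducer, which is what lets the reducer simply sum the counts for each word.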


An HDFS cluster consists of a Name-Node and Data-Nodes: the Name-Node manages the cluster metadata, and the Data-Nodes store the data.

Hadoop Architecture

The content of a file is split into large blocks (typically 128 megabytes), and each block is independently replicated on multiple Data-Nodes. The blocks are stored on the local file systems of the Data-Nodes.
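
From the client side, the block size, replication factor, and the Data-Nodes holding each block of a file can be inspected through the FileSystem API. In the sketch below the file path is a placeholder.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockInfo {
        public static void main(String[] args) throws Exception {
            // Placeholder path, for illustration only.
            FileSystem fs = FileSystem.get(new Configuration());
            Path file = new Path("/user/demo/input.txt");

            FileStatus status = fs.getFileStatus(file);
            System.out.println("Block size:  " + status.getBlockSize());   // e.g. 134217728 (128 MB)
            System.out.println("Replication: " + status.getReplication()); // e.g. 3

            // Which Data-Nodes hold each block of the file?
            for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
                System.out.println("Block at offset " + block.getOffset()
                        + " stored on " + String.join(", ", block.getHosts()));
            }
            fs.close();
        }
    }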

Functions of the Name-Node:

It actively monitors the number of replicas of each block. When a replica is lost due to a Data-Node failure or disk failure, the Name-Node creates another replica of the block (see the sketch after this list).

It maintains the namespace tree and the mapping of blocks to Data-Nodes, holding the entire namespace image in RAM.

It does not directly send requests to Data-Nodes. It sends instructions to the Data-Nodes by replying to heartbeats sent by those Data-Nodes.

The instructions include commands to: replicate blocks to other nodes, remove local block replicas, re-register and send an immediate block report, or shut down the node.
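
The replica count the Name-Node maintains can also be changed per file from a client. In this sketch (the file path is again a placeholder), raising or lowering the target replication factor causes the Name-Node, through its heartbeat replies, to have Data-Nodes create or remove block replicas until the target is met.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class AdjustReplication {
        public static void main(String[] args) throws Exception {
            // Placeholder path, for illustration only.
            FileSystem fs = FileSystem.get(new Configuration());
            Path file = new Path("/user/demo/input.txt");

            // Request a new target replication factor for this file. The
            // Name-Node notices the gap between target and actual replicas
            // and instructs Data-Nodes (via heartbeat replies) to copy or
            // delete block replicas until the target is reached.
            fs.setReplication(file, (short) 2);

            System.out.println("New replication: "
                    + fs.getFileStatus(file).getReplication());
            fs.close();
        }
    }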
