Hadoop Installation in UNIX

Prerequisites for Hadoop Installation:


Hadoop requires Java 1.5 or above to run, but Java 1.6 is recommended.
So the first thing you need on your machine is Java 1.6.

Step 1. Check whether Java 1.6 is installed.


Run the command $ java -version and press Enter.

Step 2. If it is not installed, you can install it with the command below:

$ sudo apt-get install openjdk-6-jre

 


After installing Java, check that it is installed properly by running the $ java -version command again.
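
The output should look roughly like the following (this is just an illustrative sample; the exact version and build strings will vary with your system):

java version "1.6.0_27"
OpenJDK Runtime Environment (IcedTea6 ...)
OpenJDK 64-Bit Server VM (build ..., mixed mode)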

 


 

If output similar to the above appears, Java is installed properly on your system.

 

You can also check the installation directory under /usr/lib/jvm/
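
For example (the exact directory name is system-dependent; with the openjdk-6-jre package it is typically something like java-6-openjdk):

$ ls /usr/lib/jvm/

Make a note of the directory name here; you will need it later when setting JAVA_HOME.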

 

 

Step 3. Add a dedicated system user:

 

 
It helps to separate the Hadoop installation from other software applications and from other user accounts running on the single node.


So, to create a separate group and user, you can use the commands below:

$ sudo addgroup hadoop

where hadoop is the group name.



Add a user to the hadoop group:


$ sudo adduser --ingroup hadoop hduser 


 

This will add the user hduser, with primary group hadoop, to your local machine.
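
You can verify the new account with the id command (the numeric uid/gid values shown here are only illustrative):

$ id hduser
uid=1001(hduser) gid=1001(hadoop) groups=1001(hadoop)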


Step 4. Configure SSH (Secure Shell) access to localhost:


Hadoop requires SSH access to manage its nodes, so for this single-node installation of Hadoop we need to configure SSH access to localhost.

We will be creating this access for the hduser we created in the previous step. First, install the OpenSSH server:

$ sudo apt-get install openssh-server
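
If you want to confirm that the SSH server is running after installation (assuming the standard Ubuntu service name), you can check:

$ sudo service ssh status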



Step 5. Switch to the user hduser using the command 'su':
 
$ su - hduser



Step 6. After the SSH server installation, we have to generate an SSH key for hduser:


  $ ssh-keygen -t rsa -P ""
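
Here -P "" creates the key with an empty passphrase, so Hadoop can open SSH connections without prompting for one. The output will look roughly like this (file paths and fingerprint will vary):

Generating public/private rsa key pair.
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.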
 




Step 7. Now that the key pair is generated, we have to enable SSH access to the local machine with this newly created key. For that, run the command below.

hduser@ubuntu:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
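
Depending on your system's default permissions, you may also need to restrict access to this file, since the SSH daemon refuses authorized_keys files that are writable by group or others:

$ chmod 600 $HOME/.ssh/authorized_keys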
 
 
 
 
Step 8. Finally, you can test the SSH setup using the command
$ ssh localhost
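
On the very first connection, SSH will ask you to confirm the host's fingerprint; type yes to add localhost to the list of known hosts (the fingerprint itself differs on every machine):

The authenticity of host 'localhost (127.0.0.1)' can't be established.
RSA key fingerprint is ...
Are you sure you want to continue connecting (yes/no)? yes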
 
 

Hadoop Installation:

 

Download & Extract Hadoop :

 
So if you have all the above prerequisites on your machine, you are good to go with the Hadoop installation.
First download Hadoop from http://www.apache.org/dyn/closer.cgi/hadoop/core and extract it at any location; I kept it at /usr/local. You also need to change the owner of all files to hduser and the group to hadoop.

$ cd /usr/local
$ sudo tar xzf hadoop-1.2.1.tar.gz
$ sudo mv hadoop-1.2.1 hadoop
$ sudo chown -R hduser:hadoop hadoop

Update $HOME/.bashrc

Add the following lines at the end of the $HOME/.bashrc file of user hduser. If you are using a shell other than bash, update the appropriate configuration file instead.
# Hadoop installation directory
export HADOOP_HOME=/usr/local/hadoop

# Java installation directory (adjust to the directory name you found under /usr/lib/jvm/)
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk

# Handy aliases for HDFS commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"

# Preview the first 1000 lines of an LZO-compressed file stored in HDFS
lzohead () {
    hadoop fs -cat "$1" | lzop -dc | head -1000 | less
}

# Put the Hadoop binaries on the PATH
export PATH=$PATH:$HADOOP_HOME/bin
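
Reload the file so that the new settings take effect in your current shell:

$ source $HOME/.bashrc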

 

Configuration File Setup


By now we are almost done with the Hadoop installation. What remains is to change a few properties in the configuration files provided in Hadoop's conf folder.
But before that we have to create a directory where our single-node cluster will store its data; this is where HDFS will keep its files.

So let's create the directory and set the required ownership and permissions.

$ sudo mkdir /tmp/hadoop_data
$ sudo chown hduser:hadoop /tmp/hadoop_data
$ sudo chmod 777 /tmp/hadoop_data

Now let's change the required configuration files.

Note: you will find all these configuration files inside the conf directory of your Hadoop installation. In my case that is /usr/local/hadoop/conf.

hadoop-env.sh

Open the hadoop-env.sh file and change the only environment variable required for a local machine installation: JAVA_HOME. Just uncomment the line below and set JAVA_HOME to your JDK/JRE directory (the same path you used in .bashrc above).

# The java implementation to use.  Required.
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk

core-site.xml

In between <configuration> ... </configuration> put the below code:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop_data</value>
  <description>Base directory for Hadoop's local data.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>URI of the default file system (the HDFS NameNode).</description>
</property>

mapred-site.xml

Again, put the code below between <configuration> ... </configuration>:

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce JobTracker runs at.</description>
</property>



hdfs-site.xml

Finally, put the code below between <configuration> ... </configuration>:

<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication; 1 is enough for a single node.</description>
</property>


Formatting and Starting the Single-Node Cluster

So if you have succeeded this far, you are done with the installation part. Now we just have to format the NameNode and start the cluster.

hduser@ubuntu:~$ /usr/local/hadoop/bin/hadoop namenode -format
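
If formatting succeeds, the log output should end with lines roughly like these (timestamps, hostnames and paths will differ on your machine):

INFO common.Storage: Storage directory /tmp/hadoop_data/dfs/name has been successfully formatted.
INFO namenode.NameNode: SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1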

 

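After formatting, start the single-node cluster and check that all five Hadoop daemons have come up. jps ships with the JDK, and the process IDs in its output will differ:

hduser@ubuntu:~$ /usr/local/hadoop/bin/start-all.sh
hduser@ubuntu:~$ jps

jps should list NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker. Once they are up, you can try a quick HDFS command, for example the fs alias defined in .bashrc earlier:

$ fs -ls /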


