Wednesday, November 13, 2013

Setting up a multi-node Hadoop 2.0 cluster configuration


Installing Hadoop 2.x.x – multi-node cluster configuration
Environment
OS : Debian / BOSS / Ubuntu
Hadoop : Hadoop 2.2.0

Find here: common mistakes while installing Hadoop 1.x.x.

Find here: Hadoop 2.2.0 single-node cluster setup. A multi-node Hadoop cluster can be set up in either of two ways: 1) do the single-node cluster setup on every machine and then change the Hadoop configuration files, or 2) just follow the steps below (except 5.b and 5.c, which apply only to the master) on the Master and on every slave node.

  1. Prerequisites: (for the Master and all the slaves)
    1. Java 6 or above needs to be installed
      Ensure that a JDK is already installed on your machine; otherwise install one.
      Download the jdk1.* archive and extract it.
      root@solaiv[~]#vi /etc/profile
      Add : JAVA_HOME=/usr/local/jdk1.6.0_18
      Append : PATH=$PATH:$JAVA_HOME/bin
      Add : export JAVA_HOME
      Run /etc/profile to reflect the changes, then check the Java version
      root@solaiv[~]#. /etc/profile (or) source /etc/profile
      root@solaiv[~]# java -version
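      For reference, the block appended to /etc/profile could look like this (a minimal sketch; the path /usr/local/jdk1.6.0_18 is this guide's example and should match wherever you extracted the JDK):
      export JAVA_HOME=/usr/local/jdk1.6.0_18
      export PATH=$PATH:$JAVA_HOME/bin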

    2. Create a dedicated user/group for hadoop (optional)
      Create the group, create the user inside that group, then switch to the new user.
      root@solaiv[~]#addgroup hadoop
      root@solaiv[~]#adduser --ingroup hadoop hduser
      root@solaiv[~]#su hduser

    3. Password-less SSH configuration for localhost; later we will do the same for the slaves (optional: if you skip this, you will have to type a password for each process started by ./start-*.sh)
      Generate an SSH key for the hduser user, then enable password-less SSH access to your local machine with the newly created key.
      hduser@solaiv[~]#ssh-keygen -t rsa -P ""
      hduser@solaiv[~]#cat /home/hduser/.ssh/id_rsa.pub >> /home/hduser/.ssh/authorized_keys
      hduser@solaiv[~]#ssh localhost

  2. Steps to install Hadoop 2.x.x (for the Master and all the slaves)
    1. Download Hadoop 2.x.x
    2. Extract hadoop-2.2.0 and move it to /opt/hadoop-2.2.0 (a command sketch follows the .bashrc block below)
    3. Add the following lines to the .bashrc file
      hduser@solaiv[~]#cd ~
      hduser@solaiv[~]#vi .bashrc

      Copy and paste the following lines at the end of the file
      #copy start here
      export HADOOP_HOME=/opt/hadoop-2.2.0
      export HADOOP_MAPRED_HOME=$HADOOP_HOME 
      export HADOOP_COMMON_HOME=$HADOOP_HOME 
      export HADOOP_HDFS_HOME=$HADOOP_HOME 
      export YARN_HOME=$HADOOP_HOME 
      export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
      export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop 
      #copy end here
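
      A possible end-to-end sequence for steps 1–3 (a sketch; the archive URL is an assumption, so substitute the Apache mirror you actually download from):
      hduser@solaiv[~]#wget http://archive.apache.org/dist/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz
      hduser@solaiv[~]#tar -xzf hadoop-2.2.0.tar.gz
      hduser@solaiv[~]#mv hadoop-2.2.0 /opt/hadoop-2.2.0
      After editing .bashrc, reload it so the new variables take effect in the current shell:
      hduser@solaiv[~]#source ~/.bashrc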


  3. Modify Hadoop environment files (for the Master and all the slaves)
    1. Add JAVA_HOME to libexec/hadoop-config.sh at the beginning of the file
      hduser@solaiv[~]#vi /opt/hadoop-2.2.0/libexec/hadoop-config.sh
      ….
      export JAVA_HOME=/usr/local/jdk1.6.0_18
      ….
    2. Add JAVA_HOME to hadoop/hadoop-env.sh at the beginning of the file
      hduser@solaiv[~]#vi /opt/hadoop-2.2.0/etc/hadoop/hadoop-env.sh
      ….
      export JAVA_HOME=/usr/local/jdk1.6.0_18
      ….
    3. Check the Hadoop installation
      hduser@solaiv[~]#cd /opt/hadoop-2.2.0/bin
      hduser@solaiv[bin]#./hadoop version
      Hadoop 2.2.0
      ..
      At this point, Hadoop is installed on your node.

  4. Create a folder for tmp (for the Master and all the slaves)
      hduser@solaiv[~]#mkdir -p $HADOOP_HOME/tmp

  5. Configuration : Multi-node setup
    1. Add the IP addresses of the Master and all Slaves to /etc/hosts (for the Master and all the slave nodes)
      Add the association between the hostnames and the IP addresses of the master and the slaves to /etc/hosts on every node. Make sure that all the nodes in the cluster are able to ping each other.
      hduser@boss:/opt/hadoop-2.2.0/bin#vi /etc/hosts
      10.184.39.67 master
      10.184.36.134 slave
      In my case there is only one slave; if you have more slave nodes, name them slave1, slave2, etc.
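      For example, with two slaves the file might read (the IP addresses below are placeholders):
      10.184.39.67 master
      10.184.36.134 slave1
      10.184.36.135 slave2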
    2. Password-less SSH from master to slave (optional; only at the Master node)
      hduser@boss:[~]#ssh-keygen -t rsa -P ""
      hduser@boss:[~]#ssh-copy-id -i /home/hduser/.ssh/id_rsa.pub hduser@slave
      hduser@boss:[~]#ssh slave
    [Note : If you skip this step, you will have to provide a password for every slave whenever the Master starts the processes via ./start-*.sh. If you have configured more slaves in /etc/hosts, repeat the ssh-copy-id line above for each of them: hduser@slave1, hduser@slave2, etc. See the loop sketch below.]
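      With several slaves, a small shell loop saves the repetition (a sketch assuming slaves named slave1 and slave2, as in /etc/hosts):
      hduser@boss:[~]#for h in slave1 slave2; do ssh-copy-id -i /home/hduser/.ssh/id_rsa.pub hduser@$h; done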
    3. Add the Slave entries to $HADOOP_CONF_DIR/slaves (only at the Master node)
      Add all the slave entries to the slaves file on the Master node. This tells Hadoop which nodes should run a DataNode and a NodeManager. If you do not want the master itself to act as a DataNode, simply leave it out of this file.
      hduser@boss:[~]#vi /opt/hadoop-2.2.0/etc/hadoop/slaves
       slave
      Note : In my case there is only one slave; if you have more slave nodes, add every slave hostname, one per line, exactly as written in /etc/hosts.
  6. Hadoop Configuration (for the Master and all the slaves)
    Add the following properties to the Hadoop configuration files, which are available under $HADOOP_CONF_DIR.
    1. core-site.xml
      hduser@solaiv[~]#cd /opt/hadoop-2.2.0/etc/hadoop
      hduser@solaiv[hadoop]#vi core-site.xml
      #Paste the following between the <configuration> tags
      <property>
        <name>fs.default.name</name>
        <value>hdfs://master:9000</value>
      </property>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop-2.2.0/tmp</value>
      </property>

    2. hdfs-site.xml
      hduser@solaiv[hadoop]#vi hdfs-site.xml
      #Paste the following between the <configuration> tags
      <property>
        <name>dfs.replication</name>
        <value>2</value>
      </property>
      <property>
        <name>dfs.permissions</name>
        <value>false</value>
      </property>
Note : Here I have only one slave plus the master, so I set the replication value to 2. If you have more slaves, set the replication value accordingly.
    3. mapred-site.xml
      hduser@solaiv[hadoop]#vi mapred-site.xml
      #Paste the following between the <configuration> tags
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
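      Note : A fresh hadoop-2.2.0 tree ships only mapred-site.xml.template under etc/hadoop; if mapred-site.xml is missing, create it from the template first:
      hduser@solaiv[hadoop]#cp mapred-site.xml.template mapred-site.xml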

    4. yarn-site.xml
      hduser@solaiv[hadoop]#vi yarn-site.xml
      #Paste the following between the <configuration> tags
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
      </property>
      <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8025</value>
      </property>
      <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
      </property>
      <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8040</value>
      </property>
      Note : From Hadoop 2.2.0 onwards the aux-service name must be mapreduce_shuffle (with an underscore); the older mapreduce.shuffle spelling is rejected because service names may not contain dots.

  7. Format the namenode (only at the Master node)
      hduser@boss:/opt/hadoop-2.2.0/bin#cd /opt/hadoop-2.2.0/bin
      hduser@boss:/opt/hadoop-2.2.0/bin# ./hdfs namenode -format

  8. Administering Hadoop - Start & Stop (only at the Master node)
    Just start the processes at the Master; the slave nodes start up automatically.
    1. start-dfs.sh : to start the NameNode, SecondaryNameNode and DataNodes
      hduser@boss:[~]# cd /opt/hadoop-2.2.0/sbin
      hduser@boss:[sbin]# ./start-dfs.sh
    Check at the Master
      hduser@boss:[sbin]#jps
      17675 Jps
      17578 SecondaryNameNode
      17409 NameNode
    Check at the Slave
      hduser@boss:[sbin]#jps
      9317 Jps
      9250 DataNode

    2. start-yarn.sh : to start the ResourceManager and NodeManagers
      hduser@boss:[sbin]# ./start-yarn.sh
    Check at the Master
      hduser@boss:[sbin]#jps
      17578 SecondaryNameNode
      17917 ResourceManager
      17409 NameNode
      18153 Jps
    Check at the Slave
      hduser@boss:[sbin]#jps
      9317 Jps
      9250 DataNode
      9357 NodeManager
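    Once all the daemons are up, the cluster can also be verified from a browser: by default in Hadoop 2.2.0, the NameNode web UI listens at http://master:50070 and the ResourceManager UI at http://master:8088.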

  9. Working in the Hadoop multi-node environment
    1. Execute these commands at the master
      hduser@boss:/opt/hadoop-2.2.0/bin# ./hdfs dfs -mkdir -p /user/hadoop2
      hduser@boss:/opt/hadoop-2.2.0/bin# ./hdfs dfs -put /root/Desktop/test.html /user/hadoop2
      hduser@boss:/opt/hadoop-2.2.0/bin# ./hdfs dfs -ls /user/hadoop2
      Found 1 items
      -rw-r--r-- 2 root supergroup 225 2013-11-11 20:19 /user/hadoop2/test.html
    2. Check at the slave node

      hduser@boss:/opt/hadoop-2.2.0/bin# ./hdfs dfs -ls /user/hadoop2/
      Found 1 items
      -rw-r--r-- 2 root supergroup 225 2013-11-11 20:19 /user/hadoop2/test.html
      hduser@boss:/opt/hadoop-2.2.0/bin# ./hdfs dfs -cat /user/hadoop2/test.html
      test file. Welcome to Hadoop2.2.0 Installation. !!!!!!!!!!!
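    To confirm that YARN jobs actually run across the cluster, try the bundled WordCount example (a sketch: the examples jar ships with Hadoop 2.2.0, while the input/output paths are just this guide's examples):
      hduser@boss:/opt/hadoop-2.2.0/bin# ./yarn jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /user/hadoop2 /user/hadoop2-out
      hduser@boss:/opt/hadoop-2.2.0/bin# ./hdfs dfs -cat /user/hadoop2-out/part-r-00000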


Installing Hadoop 2.x.x – Single-node cluster configuration
Environment
OS : Debian / BOSS / Ubuntu
Hadoop : Hadoop 2.2.0

Find here: common mistakes while installing Hadoop 1.x.x.

  1. Prerequisites:
    1. Java 6 or above needs to be installed
      Ensure that a JDK is already installed on your machine; otherwise install one.
      Download the jdk1.* archive and extract it.
      root@solaiv[~]#vi /etc/profile
      Add : JAVA_HOME=/usr/local/jdk1.6.0_18
      Append : PATH=$PATH:$JAVA_HOME/bin
      Add : export JAVA_HOME
      Run /etc/profile to reflect the changes, then check the Java version
      root@solaiv[~]#. /etc/profile (or) source /etc/profile
      root@solaiv[~]# java -version
    2. Create a dedicated user/group for hadoop (optional)
      Create the group, create the user inside that group, then switch to the new user.
      root@solaiv[~]#addgroup hadoop
      root@solaiv[~]#adduser --ingroup hadoop hduser
      root@solaiv[~]#su hduser
    3. Password-less SSH configuration for localhost (optional: if you skip this, you will have to type a password for each process started by ./start-*.sh)
      Generate an SSH key for the hduser user, then enable password-less SSH access to your local machine with the newly created key.
      hduser@solaiv[~]#ssh-keygen -t rsa -P ""
      hduser@solaiv[~]#cat /home/hduser/.ssh/id_rsa.pub >> /home/hduser/.ssh/authorized_keys
      hduser@solaiv[~]#ssh localhost
  2. Steps to install Hadoop 2.x.x
    1. Download Hadoop 2.x.x
    2. Extract hadoop-2.2.0 and move it to /opt/hadoop-2.2.0
    3. Add the following lines to the .bashrc file
      hduser@solaiv[~]#cd ~
      hduser@solaiv[~]#vi .bashrc
      Copy and paste the following lines at the end of the file
      #copy start here
      export HADOOP_HOME=/opt/hadoop-2.2.0
      export HADOOP_MAPRED_HOME=$HADOOP_HOME 
      export HADOOP_COMMON_HOME=$HADOOP_HOME 
      export HADOOP_HDFS_HOME=$HADOOP_HOME 
      export YARN_HOME=$HADOOP_HOME 
      export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
      export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop 
      #copy end here
  3. Modify Hadoop environment files
    1. Add JAVA_HOME to libexec/hadoop-config.sh at the beginning of the file
      hduser@solaiv[~]#vi /opt/hadoop-2.2.0/libexec/hadoop-config.sh
      ….
      export JAVA_HOME=/usr/local/jdk1.6.0_18
      ….
    2. Add JAVA_HOME to hadoop/hadoop-env.sh at the beginning of the file
      hduser@solaiv[~]#vi /opt/hadoop-2.2.0/etc/hadoop/hadoop-env.sh
      ….
      export JAVA_HOME=/usr/local/jdk1.6.0_18
      ….
    3. Check the Hadoop installation
      hduser@solaiv[~]#cd /opt/hadoop-2.2.0/bin
      hduser@solaiv[bin]#./hadoop version
      Hadoop 2.2.0
      ..
      At this point, Hadoop is installed on your node.
  4. Create a folder for tmp
      hduser@solaiv[~]#mkdir -p $HADOOP_HOME/tmp

  5. Hadoop Configuration
    Add the following properties to the Hadoop configuration files, which are available under $HADOOP_CONF_DIR.
    1. core-site.xml
      hduser@solaiv[~]#cd /opt/hadoop-2.2.0/etc/hadoop
      hduser@solaiv[hadoop]#vi core-site.xml
      #Paste the following between the <configuration> tags
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
      </property>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop-2.2.0/tmp</value>
      </property>
    2. hdfs-site.xml
      hduser@solaiv[hadoop]#vi hdfs-site.xml
      #Paste the following between the <configuration> tags
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
      <property>
        <name>dfs.permissions</name>
        <value>false</value>
      </property>
    3. mapred-site.xml
      hduser@solaiv[hadoop]#vi mapred-site.xml
      #Paste the following between the <configuration> tags
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
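      Note : As in the multi-node setup, 2.2.0 ships only mapred-site.xml.template; if mapred-site.xml is missing, create it from the template first:
      hduser@solaiv[hadoop]#cp mapred-site.xml.template mapred-site.xml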
    4. yarn-site.xml
      hduser@solaiv[hadoop]#vi yarn-site.xml
      #Paste the following between the <configuration> tags
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
      </property>
      <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>localhost:8025</value>
      </property>
      <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>localhost:8030</value>
      </property>
      <property>
        <name>yarn.resourcemanager.address</name>
        <value>localhost:8040</value>
      </property>

  6. Format the namenode
      root@boss:/opt/hadoop-2.2.0/bin#cd /opt/hadoop-2.2.0/bin
      root@boss:/opt/hadoop-2.2.0/bin# ./hdfs namenode -format

  7. Start Hadoop services
      root@boss:/opt/hadoop-2.2.0/bin# cd /opt/hadoop-2.2.0/sbin/
      root@boss:/opt/hadoop-2.2.0/sbin# ./start-dfs.sh
      root@boss:/opt/hadoop-2.2.0/sbin# jps
      21422 Jps
      21154 DataNode
      21070 NameNode
      21322 SecondaryNameNode
      root@boss:/opt/hadoop-2.2.0/sbin# ./start-yarn.sh
      root@boss:/opt/hadoop-2.2.0/sbin# jps
      21563 NodeManager
      21888 Jps
      21154 DataNode
      21070 NameNode
      21322 SecondaryNameNode
      21475 ResourceManager
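
      With all six daemons running, a quick smoke test is to run the bundled Pi estimator on YARN (a sketch; the examples jar ships with Hadoop 2.2.0, and the arguments 2 maps / 5 samples are arbitrary small values):
      root@boss:/opt/hadoop-2.2.0/sbin# cd ../bin
      root@boss:/opt/hadoop-2.2.0/bin# ./yarn jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5
      To shut everything down, use the matching stop scripts in sbin:
      root@boss:/opt/hadoop-2.2.0/sbin# ./stop-yarn.sh
      root@boss:/opt/hadoop-2.2.0/sbin# ./stop-dfs.sh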