Thursday, November 27, 2014

Upgrade Hadoop to the Latest Version - Simple Steps

Upgrade the Hadoop NameNode to the Latest Version - Simple Steps

Here I've listed a few simple steps to upgrade the Hadoop NameNode without losing the existing data in the cluster.

It's advisable to take a backup of the Hadoop metadata placed under : OR dir




3) Download and configure the latest version of Hadoop

4) cd $HADOOP_PREFIX/etc/hadoop
    in hdfs-site.xml,
       change the and (in the case of a pseudo-distributed node) to point to the path used by the old version of Hadoop

5) ./sbin/ start namenode -upgrade

6) You will see the following message in the web UI at namenodeIP:50070: "Upgrade in progress. Not yet finalized." SafeMode will also be ON.

7) ./bin/hdfs dfsadmin -finalizeUpgrade

8) Check the NameNode log, which should contain this information:
Upgrade of local storage directories.
   old LV = -57; old CTime = 0.
   new LV = -57; new CTime = 1417064332016

9) SafeMode will turn off automatically once you complete all of these steps.

10) Start the DFS
    ./sbin/ --config $HADOOP_PREFIX/etc/hadoop

11) Start YARN
    ./sbin/ --config $HADOOP_PREFIX/etc/hadoop

Friday, October 17, 2014

Errors and Solutions: Detailed step-by-step instructions on Spark over YARN - Part 2

Exceptions in Apache Spark deployment
This is a continuation post; find Spark issues part 1 here

I have a Hadoop cluster set up and decided to deploy Apache Spark over YARN.
As a test, I tried different options for submitting a Spark job.
Here I discuss a few exceptions and issues I ran into during
Spark deployment on YARN.

Error 1)

:19: error: value saveAsTextFile is not a member of Array[(String, Int)] arr.saveAsTextFile("hdfs://localhost:9000/sparkhadoop/sp1")

Steps to reproduce

val file = sc.textFile("hdfs://master:9000/sparkdata/file2.txt")

val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)

val arr = counts.collect()

arr.saveAsTextFile("hdfs://localhost:9000/sparkhadoop/sp1")



The error is caused by the arr.saveAsTextFile call above, which tries to store a plain Scala array in HDFS. saveAsTextFile is a method of RDD (Resilient Distributed Dataset), but collect() returns an ordinary Array, which has no Spark-related methods. In this case, just convert the array back into an RDD before saving (for example with sc.parallelize(arr)), or call saveAsTextFile on counts directly.
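
Here is a minimal sketch of the fix, assuming it is run inside spark-shell, where sc (the SparkContext) is already defined. The input and first output paths are the ones from the steps above. The second output path, sp2, is a hypothetical name that I picked only because saveAsTextFile refuses to write into a directory that already exists.

```scala
// Run inside spark-shell, where `sc` is predefined.
val file = sc.textFile("hdfs://master:9000/sparkdata/file2.txt")
val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)

// Option 1: save the RDD directly, without collecting it to the driver.
counts.saveAsTextFile("hdfs://localhost:9000/sparkhadoop/sp1")

// Option 2: if the collected Array is really needed, turn it back into an RDD first.
// "sp2" is a hypothetical output directory; it must not exist yet.
val arr = counts.collect()
sc.parallelize(arr).saveAsTextFile("hdfs://localhost:9000/sparkhadoop/sp2")
```

Option 1 is usually preferable, since collect() pulls the whole result into the driver's memory before it is written back out.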

Error 2)

When I ran the above word count example, I got this error too:
WARN TaskSetManager: Lost task 1.1 in stage 5.0 (TID 47, boss): org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1474416393- file=sparkdata/file2.txt


I was reading the data from HDFS, and my DataNode was down. I just started the DataNode on its own with:

root@boss:/opt/hadoop-2.2.0# ./sbin/ start datanode

Error 3)
My NodeManager kept going down. I tried many times to start it with:

root@solaiv[hadoop-2.5.1]# ./sbin/ start nodemanager

FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager java.lang.NoClassDefFoundError: org/apache/hadoop/http/HttpServer2$Builder

     I checked the Hadoop classpath:
root@boss:/opt/hadoop-2.5.1# ./bin/hadoop classpath
A few JAR files were still referring to the old version of Hadoop, i.e. hadoop-2.2.0. I corrected this by
pointing HADOOP_HOME to the latest version, hadoop-2.5.1.
