Saturday, December 27, 2014

Apache Flink setup on Ubuntu

    Apache Flink

  • Combines features from RDBMS (query optimization capabilities)
    and MapReduce (scalability)
  • Write like a programming language, execute like a database
  • Like Spark, Flink has an execution engine that aggressively uses
    in-memory execution, but degrades gracefully to
    disk-based execution when memory is not enough
  • Flink supports these data sources: HDFS, HBase, local FS, S3, JDBC.
  • Runs in local, cluster and YARN modes
In this post we will see how to set up Apache Flink in local mode.
Once that is done, we will run a Flink job on files stored in HDFS.

#Download the latest Flink and un-tar the file.
bdalab@bdalabsys:/$ tar -xvzf flink-0.8-incubating-SNAPSHOT-bin-hadoop2.tgz
#rename the folder
bdalab@bdalabsys:/$ mv flink-0.8-incubating-SNAPSHOT/ flink-0.8
#change the working directory to the Flink home
bdalab@bdalabsys:/$ cd flink-0.8
#start Flink in local mode
bdalab@bdalabsys:flink-0.8/$ ./bin/start-local.sh
#The above command starts the JobManager. Check its status with jps
bdalab@bdalabsys:flink-0.8/$ jps
6740 Jps
6725 JobManager
#The JobManager web UI starts on port 8081 by default.
Now that everything is up and running, let's run a job.
WordCount is the familiar first example in distributed
computing, so let's begin with WordCount in Flink.

#The *-WordCount.jar file is available under $FLINK_HOME/examples
bdalab@bdalabsys:flink-0.8/$ bin/flink run examples/flink-java-examples-0.8-incubating-SNAPSHOT-WordCount.jar /home/ipPath /home/flinkop
The above command reads its input from the local file system and stores
the result back to the local file system.
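For a small input, the result can be roughly cross-checked with standard Unix tools. This is only a sketch: it assumes the bundled example lowercases the text and splits it on non-word characters, which may not hold for every Flink version, and local_wordcount is a hypothetical helper name, not part of Flink.

```shell
# Hypothetical helper (not part of Flink): reads text on stdin and
# prints one "word count" pair per line, sorted by word.
local_wordcount() {
  # lowercase everything, then turn every run of non-word characters into a newline
  tr '[:upper:]' '[:lower:]' |
    tr -cs '[:alnum:]_' '\n' |
    # skip empty lines, count each remaining word
    awk 'NF { count[$0]++ } END { for (w in count) print w, count[w] }' |
    sort
}

# Example usage with the input path from the command above:
#   local_wordcount < /home/ipPath
```

Comparing a few of these counts with the lines written under /home/flinkop is usually enough to confirm that the job read the intended file.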

#To process the same input from HDFS instead
bdalab@bdalabsys:flink-0.8/$ bin/flink run examples/flink-java-examples-0.8-incubating-SNAPSHOT-WordCount.jar hdfs://localhost:9000/ip/tvvote hdfs://localhost:9000/op/
Make sure the HDFS daemons are up and running, otherwise the job will fail with an error.
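One way to check this before submitting the job is to look for the HDFS daemons in the jps output, the same way the JobManager was checked earlier. The sketch below assumes a single-node setup where NameNode and DataNode run on this machine; missing_daemons is a hypothetical helper name.

```shell
# Hypothetical helper: reads `jps` output on stdin and prints every expected
# daemon name that is NOT listed. Exits non-zero if anything is missing.
missing_daemons() {
  awk -v want="$*" '
    BEGIN { n = split(want, names, " ") }
    { seen[$2] = 1 }                      # second column of jps is the process name
    END {
      missing = 0
      for (i = 1; i <= n; i++)
        if (!(names[i] in seen)) { print names[i]; missing = 1 }
      exit missing
    }'
}

# Example usage:
#   jps | missing_daemons NameNode DataNode || echo "start HDFS before running the job"
```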
#bin/flink has 4 major actions:
  • run #runs a program
  • info #displays information about a program
  • list #lists running and finished programs; -r shows running jobs, -s shows scheduled jobs
  • cancel #cancels a running program; pass the JobID with -i
#Display the JobIDs of running and scheduled jobs with
bdalab@bdalabsys:flink-0.8/$ bin/flink list -r -s
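To pass a JobID on to cancel, it can be pulled out of the list output. The sketch below does not rely on the exact layout of that output; it only assumes that a JobID is printed as a 32-character lowercase hexadecimal string, which you should confirm against your own output. extract_job_ids is a hypothetical helper name.

```shell
# Hypothetical helper: reads text on stdin and prints every distinct
# 32-character lowercase hex string, which is assumed to be a JobID.
extract_job_ids() {
  grep -oE '[0-9a-f]{32}' | sort -u
}

# Example usage (run from the Flink home directory):
#   bin/flink list -r | extract_job_ids
#   bin/flink cancel -i <JobID printed above>
```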


In the next post I will explain how to set up Flink in cluster mode.