Saturday, December 27, 2014

Apache Flink setup on ubuntu

Apache Flink setup on ubuntu
    Apache Flink

  • Compines feature from RDBMS ( query optimization capabilities)
    and MapReduce (scalability)
  • Write like a programming language, execute like a database
  • Like Spark, Flink execution engine that aggressively uses
    in-memory execution, but very gracefully degrades to
    disk-based execution when memory is not enough
  • Flink support filesystems : HDFS, HBase, Local FS, S3, JDBC.
  • Run on Local, Cluster and YARN
In this blog will see how to Setup Apache Flink on local mode,
once it's done will Execute / Run Flink job on the files which is stored in HDFS.

#Download the latest Flink and un-tar the file.
bdalab@bdalabsys:/$ tar -xvzf flink-0.8-incubating-SNAPSHOT-bin-hadoop2.tgz
#rename the folder
bdalab@bdalabsys:/$ mv flink-0.8-incubating-SNAPSHOT/ flink-0.8
#move the working dir into flink_home
bdalab@bdalabsys:/$ cd flink-0.8
#start Flink on local mode
bdalab@bdalabsys:flink-0.8/$ ./bin/
#JobManager will started by above command. check the status by
bdalab@bdalabsys:flink-0.8/$ jps
6740 Jps
6725 JobManager
#JobManager web UI will started by default on port 8081 Now we have everything up & running. will try to Run job.
as we all are aware a familier WordCount example in distributed
computing, lets begin with WordCount in Flink

#*-WordCount.jar file available under $FLINK_HOME/examples
bdalab@bdalabsys:flink-0.8/$ bin/flink run examples/flink-java-examples-0.8-incubating-SNAPSHOT-WordCount.jar /home/ipPath /home/flinkop
Above command, will run on file from local and store the result back to
local file system.

#If we want to process the same in HDFS
bdalab@bdalabsys:flink-0.8/$ bin/flink run examples/flink-java-examples-0.8-incubating-SNAPSHOT-WordCount.jar hdfs://localhost:9000/ip/tvvote hdfs://localhost:9000/op/
make sure HDFS daemons are up&running . else will get an error.
#bin/flink has 4 major Action.
  • run #runs a program
  • info #displays information about a program.
  • list #lists running and finished programs. -r & -s
  • cancel #cancels a running program. -i
#Display the running JobID by
bdalab@bdalabsys:flink-0.8/$bin/flink list -r -s

In Next blog will explain you the Setup Flink on Cluster mode

Tuesday, December 16, 2014

Simple way to configuring mysql or Postgresql RDBMS as Hive metastore

simple way to configuring mysql or Postgresql RDBMS as Hive metastore

Hive will store the metadata information (i.e like RDBMS will stores the table
and column information) out of HDFS and it will process the data available in HDFS.

By default Hive store its metastore into Derby a lightweight database.
which will serve single instance at a time. If you try to start mutltiple instance of Hive, you will get error like
"Another instance of Derby may have already booted the database".

In this will see how we can configure other RDBMS (MySQL & PostgreSQL) as Hive metastore.

Create / rename hive-default.xml.template TO hive-site.xml under $HIVE_HOME/conf
hadoop@solai# vim.tiny $HIVE_HOME/conf/hive-default.xml

change the value of the following property
















download and plcae the "mysql-connector-java-5.x.xx-bin.jar" to the $HIVE_HOME/lib
hadoop@solai# mv /home/hadoop/Downloads/mysql-connector-java-5.1.31.tar.gz $HIVE_HOME/lib
In Mysql create database "hivedb" and load the hive schema to the database "hivedb"
mysql> create database hivedb;
mysql> use hivedb;

## following will create hive schema in mysql database.
mysql> SOURCE $HIVE_HOME/scripts/metastore/upgrade/mysql/hive-schema-0.12.0.mysql.sql

its important to restrict user to alter / delete hivedb.
mysql> CREATE USER 'mysqlroot'@'hivedb' IDENTIFIED BY 'hive@123';

mysql> REVOKE ALL PRIVILEGES, GRANT OPTION FROM 'mysqlroot'@'hivedb';

mysql> GRANT SELECT,INSERT,UPDATE,DELETE,LOCK TABLES,EXECUTE ON hivedb.* TO 'mysqlroot'@'localhost';


mysql> quit;
Enter Hive CLI, for create table

hive> create table testHiveMysql(uname string, uplace string);
enter into mysql to check the schema information created in hive environment. following lines will return the table and column information.
mysql> select * from TBLS;
mysql> select * from COLUMNS_V2;
mysql> show tables;
show tables, will return all the tables pertaining to the Hive schema