Wednesday, June 7, 2017

Install, Configure, and Use Clush



Clush (part of ClusterShell) is an open-source tool that executes commands in parallel across the nodes of a cluster. This post describes how to install and configure clush, then use it to run commands on multiple nodes in parallel.

The clush utility needs to be installed on only one node, usually the primary node (10.10.72.200 in this cluster), from which we will run commands in parallel and gather stats.


Step 1: Install clustershell package
[root@VM200 ~]# yum --enablerepo=epel install clustershell

To list all cluster nodes and the services running on each in a MapR cluster:

[root@VM200 ~]# maprcli node list -columns svc
service                                                                       hostname  ip
fileserver,historyserver,webserver,nodemanager,hoststats                      VM200     10.10.72.200
fileserver,hivemeta,webserver,nodemanager,hs2,hoststats                       VM201     10.10.72.201
nodemanager,spark-historyserver,cldb,fileserver,hoststats,hue                 VM202     10.10.72.202
tasktracker,nodemanager,cldb,fileserver,resourcemanager,hoststats,jobtracker  VM203     10.10.72.203


Step 2: Create a "groups" file under "/etc/clustershell/" and add all cluster nodes.
[root@VM200 ~]# vi /etc/clustershell/groups
all:10.10.72.200,10.10.72.201,10.10.72.202,10.10.72.203
[root@VM200 ~]#
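
ClusterShell also supports node ranges and multiple named groups in the same file; a sketch, assuming the service layout from the maprcli output above:

all:10.10.72.[200-203]
cldb:10.10.72.202,10.10.72.203

A group other than "all" is then selected with -g, for example: clush -g cldb date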



Now run a command across all nodes with clush -a. Without passwordless SSH in place, it fails with host-key errors:

[root@VM204 ~]# clush -a date
10.10.72.200: Host key verification failed.
clush: 10.10.72.200: exited with exit code 255
10.10.72.201: Host key verification failed.
clush: 10.10.72.201: exited with exit code 255
10.10.72.202: Host key verification failed.
clush: 10.10.72.202: exited with exit code 255
10.10.72.203: Host key verification failed.
clush: 10.10.72.203: exited with exit code 255

To avoid the "Host key verification failed." error, set up passwordless SSH as follows.
########## Generate the key; don't enter a passphrase at the prompt

# ssh-keygen 


Example:

[root@VM204 ~]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
4a:a0:74:96:51:e2:fa:4f:9d:4d:80:20:1a:96:11:5a root@VM204.mapr.com
The key's randomart image is:
+--[ RSA 2048]----+
....
...


-----------------------------------------------------------------
#### Copy the key to all the nodes to enable passwordless SSH

# ssh-copy-id 10.10.72.200
# ssh-copy-id 10.10.72.201
# ssh-copy-id 10.10.72.202
# ssh-copy-id 10.10.72.203
# ssh-copy-id 10.10.72.204

Example:

[root@VM200 ~]# ssh-copy-id 10.10.72.200
The authenticity of host '10.10.72.200 (10.10.72.200)' can't be established.
RSA key fingerprint is 20:d4:f6:6e:d5:4b:af:80:bc:21:2b:f6:21:51:19:65.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.10.72.200' (RSA) to the list of known hosts.
root@10.10.72.200's password:
Now try logging into the machine, with "ssh '10.10.72.200'", and check in:

  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

[root@VM200 ~]# ssh-copy-id 10.10.72.201
[root@VM200 ~]# ssh-copy-id 10.10.72.202
[root@VM200 ~]# ssh-copy-id 10.10.72.203
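
The same copies can be scripted in a single loop; a sketch for this cluster's IPs (you will still be prompted for each node's password):

# for i in 200 201 202 203 204; do ssh-copy-id 10.10.72.$i; done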



[root@VM200 ~]# clush -a date
10.10.72.200: Wed Jun  7 17:16:58 IST 2017
10.10.72.201: Wed Jun  7 17:16:43 IST 2017
10.10.72.203: Wed Jun  7 17:18:07 IST 2017
10.10.72.202: Wed Jun  7 17:16:59 IST 2017
[root@VM200 ~]#
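
Once passwordless SSH works, a few more clush invocations are handy; a sketch using standard ClusterShell options:

# Run on a subset of nodes instead of the whole "all" group
[root@VM200 ~]# clush -w 10.10.72.202,10.10.72.203 uptime

# Gather identical output together instead of printing it per node
[root@VM200 ~]# clush -a -b cat /etc/redhat-release

# Copy a file to the same path on every node
[root@VM200 ~]# clush -a --copy /etc/hosts --dest /etc/hosts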

Tuesday, February 7, 2017

Hue-MariaDB Integration on CentOS 7.x



Issue:
If launching the Hue web UI fails on CentOS 7/RHEL 7 on the MapR 5.2 platform with the error shown below, follow these steps.
raise errorclass, errorvalue
ProgrammingError: (1146, "Table 'hue.auth_user' doesn't exist")

To overcome this issue, use the steps below:

Environment:
[root@VM206 mapr]# rpm -qa| grep mapr
mapr-core-internal-5.2
mapr-core-5.2
mapr-hue-base-3.9.0

Steps:
Step 1:
Run the following commands to install MariaDB and the Redhat 6 compatibility library:
#yum install mariadb
#ver=$(rpm -qa mariadb|cut -d- -f2)
#rpm -ivh --nodeps http://yum.mariadb.org/$ver/rhel7-amd64/rpms/MariaDB-$ver-centos7-x86_64-compat.rpm

Log in to MariaDB:
[root@VM206 mapr]# mysql -u root -p
Enter password: <pwd>   (here: puli)
MariaDB [(none)]> create database hue;
MariaDB [(none)]> grant all on hue.* to 'hue'@'10.10.72.206' identified by 'puli';

MariaDB [(none)]> flush privileges;
MariaDB [(none)]> use hue;
MariaDB [hue]> show tables;
Empty set (0.00 sec)
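
Hue must also be pointed at this new database before the sync step; a minimal sketch of the [[database]] section in hue.ini (the path and values below are assumed for this setup):

# vi /opt/mapr/hue/hue-3.9.0/desktop/conf/hue.ini

[desktop]
  [[database]]
    engine=mysql
    host=10.10.72.206
    port=3306
    user=hue
    password=puli
    name=hue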

Step 2:
Run the following command to create a symlink for the Cyrus SASL library:

#ln -s /lib64/libsasl2.so.3.0.0 /lib64/libsasl2.so.2

Step 3:
Run the following command to reconfigure Hue:
#bash -c "source /opt/mapr/hue/hue-3.9.0/build/env/bin/activate;
      /opt/mapr/hue/hue-3.9.0/build/env/bin/hue syncdb --noinput;
      /opt/mapr/hue/hue-3.9.0/build/env/bin/hue migrate"
    
Step 4:
Run the following command to restart Hue:
      # maprcli node services -name hue -action restart -nodes `hostname`
      
Verification:

Open the Hue web UI on the Hue host:
http://10.10.72.206:8888/

The Hue web UI should now launch without any issues.

Check in MariaDB:

[root@VM206 mapr]# mysql -u root -p
Enter password:<pwd>

MariaDB [(none)]> use hue;

MariaDB [hue]> show tables;
+--------------------------------+
| Tables_in_hue                  |
+--------------------------------+
| auth_group                     |
| auth_group_permissions         |
| auth_permission                |
| auth_user                      |
| auth_user_groups               |
.......
......
(You will see many tables here; around 74 in total.)
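
To confirm the table count, a quick check against information_schema (a sketch):

MariaDB [hue]> select count(*) from information_schema.tables where table_schema = 'hue';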

Tuesday, November 15, 2016

Loading Data into HBase from HDFS (MapR-FS)


Step 1: HDFS file data

[root@localhost mapr]# hadoop fs -cat /user/mapr/sample.txt
100,sachin,Mum,India,Cricket
200,Samantha,Chennai,india,Actress
300,Modi,Ahd,Ind,politician
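
If the file is not yet in the cluster, it can be copied up first; a sketch, with the local path assumed:

[root@localhost mapr]# hadoop fs -put /tmp/sample.txt /user/mapr/sample.txt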

Step 2: Create a table in HBase with one column family

hbase(main):006:0> create 'blkimport','cf'
0 row(s) in 1.2990 seconds

=> Hbase::Table - blkimport
hbase(main):007:0> list
blkimport


hbase(main):010:0> scan 'blkimport'
ROW                                   COLUMN+CELL
0 row(s) in 0.0180 seconds

Step 3: Import the data from HDFS into HBase using ImportTsv

[mapr@localhost hbase-1.1.1]$ /opt/mapr/hbase/hbase-1.1.1/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' -Dimporttsv.columns='HBASE_ROW_KEY,cf:pname,cf:loc,cf:country,cf:prefession'   blkimport /user/mapr/sample.txt

Here:
  /user/mapr/sample.txt        --> HDFS source file
  blkimport                    --> HBase table
  cf                           --> column family
  HBASE_ROW_KEY                --> row key (the first field)
  pname,loc,country,prefession --> column names

Step 4: Scan the HBase table

hbase(main):011:0> scan 'blkimport'
ROW                                   COLUMN+CELL
 100                                  column=cf:country, timestamp=1477509673574, value=India
 100                                  column=cf:loc, timestamp=1477509673574, value=Mum
 100                                  column=cf:pname, timestamp=1477509673574, value=sachin
 100                                  column=cf:prefession, timestamp=1477509673574, value=Cricket
 200                                  column=cf:country, timestamp=1477509673574, value=india
 200                                  column=cf:loc, timestamp=1477509673574, value=Chennai
 200                                  column=cf:pname, timestamp=1477509673574, value=Samantha
 200                                  column=cf:prefession, timestamp=1477509673574, value=Actress
 300                                  column=cf:country, timestamp=1477509673574, value=Ind
 300                                  column=cf:loc, timestamp=1477509673574, value=Ahd
 300                                  column=cf:pname, timestamp=1477509673574, value=Modi
 300                                  column=cf:prefession, timestamp=1477509673574, value=politician
3 row(s) in 0.0570 seconds
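
For larger files, ImportTsv can also write HFiles instead of issuing puts row by row, followed by a separate bulk load; a sketch, with the output directory /user/mapr/hfiles assumed:

[mapr@localhost hbase-1.1.1]$ /opt/mapr/hbase/hbase-1.1.1/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' -Dimporttsv.columns='HBASE_ROW_KEY,cf:pname,cf:loc,cf:country,cf:prefession' -Dimporttsv.bulk.output=/user/mapr/hfiles blkimport /user/mapr/sample.txt
[mapr@localhost hbase-1.1.1]$ /opt/mapr/hbase/hbase-1.1.1/bin/hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /user/mapr/hfiles blkimport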



Spark-Hive integration



1.Connecting using HiveThriftServer(Port-10000)
2.Connecting using SparkThriftServer(Port-10001)

Environment:

# rpm -qa | grep mapr
mapr-hivemetastore-1.2.201609261225-1.noarch
mapr-spark-historyserver-1.6.1.201609271200-1.noarch
mapr-hiveserver2-1.2.201609261225-1.noarch
mapr-spark-1.6.1.201609271200-1.noarch
mapr-mapreduce2-2.7.0.37549.GA-1.x86_64
mapr-hive-1.2.201609261225-1.noarch

Configuration steps:

Add the properties below to the "spark-defaults.conf" file.
# vi /opt/mapr/spark/spark-1.6.1/conf/spark-defaults.conf
# Spark SQL and Hive integration
spark.yarn.dist.files=/opt/mapr/hive/hive-1.2/conf/hive-site.xml,/opt/mapr/hive/hive-1.2/lib/datanucleus-api-jdo-4.2.1.jar,/opt/mapr/hive/hive-1.2/lib/datanucleus-core-4.1.6.jar,/opt/mapr/hive/hive-1.2/lib/datanucleus-rdbms-4.1.7.jar
spark.sql.hive.metastore.version=1.2.0
spark.sql.hive.metastore.jars=/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/yarn/*:/opt/mapr/hive/hive-1.2/lib/accumulo-core-1.6.0.jar:/opt/mapr/hive/hive-1.2/lib/hive-contrib-1.2.0-mapr-1609.jar:/opt/mapr/hive/hive-1.2/lib/*




Then add the export line below to the spark-env.sh file:
#vi /opt/mapr/spark/spark-1.6.1/conf/spark-env.sh
export HIVE_SERVER2_THRIFT_PORT=10001

Start the Spark Thrift Server on port 10001 with the command below:

[root@VM204 spark-1.6.1]# /opt/mapr/spark/spark-1.6.1/sbin/start-thriftserver.sh --hive-conf hive.server2.thrift.port=10001

(or, since HIVE_SERVER2_THRIFT_PORT is already exported in spark-env.sh, simply:)

[root@VM204 spark-1.6.1]# /opt/mapr/spark/spark-1.6.1/sbin/start-thriftserver.sh

Check whether the port is listening:

[root@VM204 spark-1.6.1]# netstat -plant | grep 10001
tcp        0      0 :::10001                    :::*                        LISTEN      13588/java

Check the databases, tables, and data from the Hive CLI:

[mapr@VM204 spark-1.6.1]$ hive

Logging initialized using configuration in file:/opt/mapr/hive/hive-1.2/conf/hive-log4j.properties
hive> show databases;
OK
default
dummy
Time taken: 1.57 seconds, Fetched: 2 row(s)

hive> use default;
OK
Time taken: 0.05 seconds

hive> show tables;
OK
b
b_ext
demotest1
test
Time taken: 0.049 seconds, Fetched: 4 row(s)

hive> select * from b;
OK
10      kuamr
20      nitin
30      shishir
Time taken: 1.138 seconds, Fetched: 3 row(s)
hive>

Check the same data from Spark's Beeline:

1.Connecting using HiveThriftServer(Port-10000)

[root@VM204 spark-1.6.1]# pwd
/opt/mapr/spark/spark-1.6.1

[mapr@VM204 spark-1.6.1]$ ./bin/beeline
Beeline version 1.6.1-mapr-1609 by Apache Hive

beeline> !connect jdbc:hive2://10.10.72.204:10000
Connecting to jdbc:hive2://10.10.72.204:10000
Enter username for jdbc:hive2://10.10.72.204:10000: mapr
Enter password for jdbc:hive2://10.10.72.204:10000: ****   <mapr>
Connected to: Apache Hive (version 1.2.0-mapr-1609)
Driver: Spark Project Core (version 1.6.1-mapr-1609)
Transaction isolation: TRANSACTION_REPEATABLE_READ

0: jdbc:hive2://10.10.72.204:10000> show databases;

+----------------+--+
| database_name  |
+----------------+--+
| default        |
| dummy          |
+----------------+--+

2.Connecting using SparkThriftServer(Port-10001)
[mapr@VM204 spark-1.6.1]$ ./bin/beeline
Beeline version 1.6.1-mapr-1609 by Apache Hive

beeline> !connect jdbc:hive2://10.10.72.204:10001
Connecting to jdbc:hive2://10.10.72.204:10001
Enter username for jdbc:hive2://10.10.72.204:10001: mapr
Enter password for jdbc:hive2://10.10.72.204:10001: ****    <mapr>
Connected to: Spark SQL (version 1.6.1)
Driver: Spark Project Core (version 1.6.1-mapr-1609)
Transaction isolation: TRANSACTION_REPEATABLE_READ

0: jdbc:hive2://10.10.72.204:10001> show databases;
+----------+--+
|  result  |
+----------+--+
| default  |
| dummy    |
+----------+--+
2 rows selected (4.289 seconds)
0: jdbc:hive2://10.10.72.204:10001>

0: jdbc:hive2://10.10.72.204:10001> use default;
+---------+--+
| result  |
+---------+--+
+---------+--+
No rows selected (0.134 seconds)

0: jdbc:hive2://10.10.72.204:10001> show tables;
+------------+--------------+--+
| tableName  | isTemporary  |
+------------+--------------+--+
| b          | false        |
| b_ext      | false        |
| test       | false        |
+------------+--------------+--+
3 rows selected (0.077 seconds)

0: jdbc:hive2://10.10.72.204:10001> select * from b;
+-----+----------+--+
|  a  |    b     |
+-----+----------+--+
| 10  | kuamr    |
| 20  | nitin    |
| 30  | shishir  |
+-----+----------+--+
3 rows selected (1.916 seconds)
0: jdbc:hive2://10.10.72.204:10001>
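
The same Hive data is also reachable from spark-shell; a minimal sketch for Spark 1.6, assuming the integration configured above (sqlContext is Hive-enabled in this build):

[mapr@VM204 spark-1.6.1]$ ./bin/spark-shell
scala> sqlContext.sql("show databases").show()
scala> sqlContext.sql("select * from default.b").show()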



Loading Data into MapR-DB Tables Using "importtsv"


Log in to the HBase shell:

[mapr@6VM201 root]$ hbase shell

Create a MapR-DB table named "/user/mapr/blk":
hbase(main):018:0>  create '/user/mapr/blk','cf'
0 row(s) in 0.0390 seconds

=> Hbase::Table - /user/mapr/blk

File "/user/test.txt"  need to be load into MapRDB table
[root@6VM201 ~]# hadoop fs -cat /user/test.txt
761377,kuamr,hyd,india
761344,nitin,mum,india

Command to load the file into the MapR-DB table using "importtsv":

[mapr@6VM201 hbase-1.1.1]$ /opt/mapr/hbase/hbase-1.1.1/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:name,cf:loc,cf:country -Dimporttsv.separator=',' -Dimporttsv.bulk.output=/user/mapr/nitintab3 /user/mapr/blk /user/test.txt


Here:
  /user/mapr/blk       --> MapR-DB table
  /user/test.txt       --> source file to load into the MapR-DB table
  /user/mapr/nitintab3 --> directory where importtsv writes its bulk-load output (importtsv.bulk.output)

[mapr@6VM201 root]$ hbase shell

Check the data in the MapR-DB table:
hbase(main):019:0> scan '/user/mapr/blk'
ROW                                               COLUMN+CELL
 761344                                           column=cf:country, timestamp=1475197667463, value=india
 761344                                           column=cf:loc, timestamp=1475197667463, value=mum
 761344                                           column=cf:name, timestamp=1475197667463, value=nitin
 761377                                           column=cf:country, timestamp=1475197667463, value=india
 761377                                           column=cf:loc, timestamp=1475197667463, value=hyd
 761377                                           column=cf:name, timestamp=1475197667463, value=kuamr
2 row(s) in 0.0510 seconds
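
Because a MapR-DB table is a path in MapR-FS rather than an entry in an HBase namespace, it can also be inspected with maprcli; a sketch:

[mapr@6VM201 root]$ maprcli table info -path /user/mapr/blk -json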

Monday, November 14, 2016

MapR Sqoop2: Oracle Data Import Steps


Steps:
1. Creating a link for maprfs
2. Creating a link for Oracle
3. Creating a job
4. Starting the job
5. Checking the status of the job

Environment:
mapr-sqoop2-server-2.0.0.201607271151-1.noarch
mapr-sqoop2-client-2.0.0.201607271151-1.noarch

[mapr@VM207 sqoop-2.0.0]$ ./bin/sqoop2-shell

sqoop:000> show link
0 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
+----+------+--------------+----------------+---------+
| Id | Name | Connector Id | Connector Name | Enabled |
+----+------+--------------+----------------+---------+
+----+------+--------------+----------------+---------+


sqoop:000> show connector
+----+------------------------+------------------+------------------------------------------------------+----------------------+
| Id | Name                   | Version          | Class                                                | Supported Directions |
+----+------------------------+------------------+------------------------------------------------------+----------------------+
| 1  | kite-connector         | 1.99.6-mapr-1607 | org.apache.sqoop.connector.kite.KiteConnector        | FROM/TO              |
| 2  | kafka-connector        | 1.99.6-mapr-1607 | org.apache.sqoop.connector.kafka.KafkaConnector      | TO                   |
| 3  | hdfs-connector         | 1.99.6-mapr-1607 | org.apache.sqoop.connector.hdfs.HdfsConnector        | FROM/TO              |
| 4  | generic-jdbc-connector | 1.99.6-mapr-1607 | org.apache.sqoop.connector.jdbc.GenericJdbcConnector | FROM/TO              |
+----+------------------------+------------------+------------------------------------------------------+----------------------+
sqoop:000>


1. Creating a link for maprfs
sqoop:000> create link -c 3
Creating link for connector with id 3
Please fill following values to create new link object
Name: maprfs
Link configuration
HDFS URI: maprfs://10.10.72.207:7222
Hadoop conf directory: /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop
New link was successfully created with validation status OK and persistent id 2
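
Before creating the Oracle link in the next step, the Oracle JDBC driver jar must be on the Sqoop2 server's classpath; a sketch, with the jar name and lib path assumed for this layout:

# cp ojdbc6.jar /opt/mapr/sqoop/sqoop-2.0.0/server/webapps/sqoop/WEB-INF/lib/
# /opt/mapr/sqoop/sqoop-2.0.0/bin/sqoop2-server stop
# /opt/mapr/sqoop/sqoop-2.0.0/bin/sqoop2-server start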


2. Creating a link for the Oracle JDBC connection
sqoop:000> create link -c 4
Creating link for connector with id 4
Please fill following values to create new link object
Name: oraclenew
Link configuration
JDBC Driver Class: oracle.jdbc.driver.OracleDriver
JDBC Connection String: jdbc:oracle:thin:@10.10.70.142:1521:VIJAYDB
Username: scott
Password: *****
JDBC Connection Properties:
There are currently 0 values in the map:
entry#
New link was successfully created with validation status OK and persistent id 5
sqoop:000> show link
+----+-----------+--------------+------------------------+---------+
| Id | Name      | Connector Id | Connector Name         | Enabled |
+----+-----------+--------------+------------------------+---------+
| 2  | maprfs    | 3            | hdfs-connector         | true    |
| 5  | oraclenew | 4            | generic-jdbc-connector | true    |
+----+-----------+--------------+------------------------+---------+


3. Creating a job
sqoop:000> create job --from 5 --to 2
Creating job for links with from id 5 and to id 2
Please fill following values to create new job object
Name: newjob
From database configuration
Schema name:
Table name: TEST_TSS_ORDER_HEADERS_F_V
Table SQL statement:
Table column names:
Partition column name: ORDER_STATUS_LKP_KEY
Null value allowed for the partition column: false
Boundary query:
Incremental read
Check column:
Last value:
To HDFS configuration
Override null value:
Null value:
Output format:
0 : TEXT_FILE
1 : SEQUENCE_FILE
Choose: 0
Compression format:
0 : NONE
1 : DEFAULT
2 : DEFLATE
3 : GZIP
4 : BZIP2
5 : LZO
6 : LZ4
7 : SNAPPY
8 : CUSTOM
Choose: 0
Custom compression format:
Output directory: /user/mapr/ora4
Append mode:
Throttling resources
Extractors:
Loaders:
New job was successfully created with validation status OK and persistent id 11
sqoop:000> show job
+----+--------+----------------+--------------+---------+
| Id | Name   | From Connector | To Connector | Enabled |
+----+--------+----------------+--------------+---------+
| 11 | newjob | 4              | 3            | true    |
+----+--------+----------------+--------------+---------+


4. Starting the job
sqoop:000> start job -j 11
Submission details
Job ID: 11
Server URL: http://VM207:12000/sqoop/
Created by: mapr
Creation date: 2016-10-27 12:45:48 IST
Lastly updated by: mapr
External ID: job_1476676093162_0026
http://VM203:8088/proxy/application_1476676093162_0026/
Source Connector schema: Schema{name=TEST_TSS_ORDER_HEADERS_F_V,columns=[
Decimal{name=BL_ORDER_KEY,nullable=true,type=DECIMAL,precision=0,scale=-127},
Decimal{name=INCIDENT_ID,nullable=true,type=DECIMAL,precision=0,scale=-127},
Decimal{name=HEADER_ID,nullable=true,type=DECIMAL,precision=0,scale=-127},
Decimal{name=RMA_NUMBER,nullable=true,type=DECIMAL,precision=0,scale=-127},
Text{name=CUST_SERIAL_NUMBER,nullable=true,type=TEXT,charSize=null},
Date{name=ORDER_CREATION_DATE,nullable=true,type=DATE_TIME,hasFraction=true,hasTimezone=false},
Date{name=ORDER_CLOSE_DATE,nullable=true,type=DATE_TIME,hasFraction=true,hasTimezone=false},
Decimal{name=ORDER_STATUS_LKP_KEY,nullable=true,type=DECIMAL,precision=0,scale=-127}]}
2016-10-27 12:45:48 IST: BOOTING - Progress is not available


5. Checking the status of the running job:
sqoop:000> status job -j 11
Submission details
Job ID: 11
Server URL: http://VM207:12000/sqoop/
Created by: mapr
Creation date: 2016-10-27 12:45:48 IST
Lastly updated by: mapr
External ID: job_1476676093162_0026
http://VM203:8088/proxy/application_1476676093162_0026/
2016-10-27 12:46:26 IST: RUNNING - 15.00 %
sqoop:000> status job -j 11
Submission details
Job ID: 11
Server URL: http://VM207:12000/sqoop/
Created by: mapr
Creation date: 2016-10-27 12:45:48 IST
Lastly updated by: mapr
External ID: job_1476676093162_0026
http://VM203:8088/proxy/application_1476676093162_0026/
2016-10-27 12:47:36 IST: RUNNING - 45.00 %


Check the loaded file in HDFS:
[root@VM204 hadoop]# hadoop fs -cat /user/mapr/ora4/0119d883-fd76-4431-af85-cffa77f9c9e5.txt
3,3,3,3,'cust1','2014-07-03 06:13:00.000','2014-07-03 06:13:00.000',3
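
To list everything the job wrote to the output directory, a quick check:

[root@VM204 hadoop]# hadoop fs -ls /user/mapr/ora4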


MapR Sqoop2: MySQL Data Import


Environment:
mapr-sqoop2-server-2.0.0.201607271151-1.noarch
mapr-sqoop2-client-2.0.0.201607271151-1.noarch


Steps:
1. Checking available links and connectors
2. Creating a link for MySQL
3. Creating a link for maprfs
4. Creating a job
5. Starting the job
6. Checking the status of the job


Step 1:

Check available links

sqoop:000> show link
+----+------+--------------+----------------+---------+
| Id | Name | Connector Id | Connector Name | Enabled |
+----+------+--------------+----------------+---------+
+----+------+--------------+----------------+---------+

Check available Connectors.

sqoop:000> show connector
+----+------------------------+------------------+------------------------------------------------------+----------------------+
| Id | Name                   | Version          | Class                                                | Supported Directions |
+----+------------------------+------------------+------------------------------------------------------+----------------------+
| 1  | kite-connector         | 1.99.6-mapr-1507 | org.apache.sqoop.connector.kite.KiteConnector        | FROM/TO              |
| 2  | kafka-connector        | 1.99.6-mapr-1507 | org.apache.sqoop.connector.kafka.KafkaConnector      | TO                   |
| 3  | hdfs-connector         | 1.99.6-mapr-1507 | org.apache.sqoop.connector.hdfs.HdfsConnector        | FROM/TO              |
| 4  | generic-jdbc-connector | 1.99.6-mapr-1507 | org.apache.sqoop.connector.jdbc.GenericJdbcConnector | FROM/TO              |
+----+------------------------+------------------+------------------------------------------------------+----------------------+


Step 2:
Create a link for the RDBMS from which we want to import data.
NOTE: Provide the connector Id shown for the generic-jdbc-connector; here, Id 4 is passed to the -c argument.

Creating a link for MySQL:

sqoop:000> create link -c 4
Creating link for connector with id 4
Please fill following values to create new link object
Name: <mysql>
Link configuration
JDBC Driver Class: com.mysql.jdbc.Driver
JDBC Connection String: jdbc:mysql://<DB HostName>/<Database>
Username: <sqoop>
Password: <*****>
JDBC Connection Properties:<Optional>
There are currently 0 values in the map:
entry#
New link was successfully created with validation status OK and persistent id 2

sqoop:000> show link
+----+-------+--------------+------------------------+---------+
| Id | Name  | Connector Id | Connector Name         | Enabled |
+----+-------+--------------+------------------------+---------+
| 2  | mysql | 4            | generic-jdbc-connector | true    |
+----+-------+--------------+------------------------+---------+
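
For the link above to validate and the job to run, the MySQL user must exist and be able to read the source tables; a sketch on the MySQL side, with names assumed:

mysql> create user 'sqoop'@'%' identified by '<password>';
mysql> grant select on <Database>.* to 'sqoop'@'%';
mysql> flush privileges;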


Step 3:
Create a link for the import destination, i.e., the MapR-FS location.

sqoop:000> create link -c 3
Creating link for connector with id 3
Please fill following values to create new link object
Name: maprfs
Link configuration
HDFS URI: maprfs://<CLDB HostName>:7222
Hadoop conf directory: /opt/mapr/hadoop/hadoop-0.20.2/conf
New link was successfully created with validation status OK and persistent id 4

sqoop:000> show link
+----+--------+--------------+------------------------+---------+
| Id | Name   | Connector Id | Connector Name         | Enabled |
+----+--------+--------------+------------------------+---------+
| 2  | mysql  | 4            | generic-jdbc-connector | true    |
| 4  | maprfs | 3            | hdfs-connector         | true    |
+----+--------+--------------+------------------------+---------+


Step 4:
Creating a job

sqoop:000> create job --from 2 --to 4
Creating job for links with from id 2 and to id 4
Please fill following values to create new job object
Name: tetsjob
From database configuration
Schema name: mysql
Table name: <TableName>
Table SQL statement:<Optional>
Table column names:<Optional>
Partition column name: <Provide a ColumnNamefor Partitioning>
Null value allowed for the partition column: true
Boundary query:<Optional>
Incremental read
Check column:<Optional>
Last value:<Optional>
To HDFS configuration
Override null value:<Optional>
Null value:<Optional>
Output format:
0 : TEXT_FILE
1 : SEQUENCE_FILE
Choose: 0
Compression format:
0 : NONE
1 : DEFAULT
2 : DEFLATE
3 : GZIP
4 : BZIP2
5 : LZO
6 : LZ4
7 : SNAPPY
8 : CUSTOM
Choose: 0
Custom compression format:
Output directory: </MFS LOCATION NAME>
Append mode:<Optional>
Throttling resources
Extractors:<Optional>
Loaders:<Optional>
New job was successfully created with validation status OK and persistent id 12
sqoop:000> show job
sqoop:000> start job -j <Job Id>
Ex: start job -j 12
Submission details
Job ID: 12
Server URL:
Created by: mapr
Creation date:
Lastly updated by: mapr
External ID: job_<ID>
http://<Host>:8088/proxy/application_1461206632562_0005/
Source Connector schema: Schema{TABLE SCHEMA WILL BE DISPLAYED HERE}

5. Starting the job (sample output from a run with job id 11):
sqoop:000> start job -j 11
Submission details
Job ID: 11
Server URL: http://VM207:12000/sqoop/
Created by: mapr
Creation date: 2016-10-27 12:45:48 IST
Lastly updated by: mapr
External ID: job_1476676093162_0026
        http://VM203:8088/proxy/application_1476676093162_0026/
       
6. Checking the status of the running job:
sqoop:000> status job -j 12
