Thursday, May 26, 2016

Working with the Sqoop2 import command.



Note:
Please configure PostgreSQL as the Sqoop2 metastore before following the steps below.
For configuration details, please see the previous post "PostgreSQL configuration for Sqoop2".

These steps were tested on a MapR 5.0 cluster running CentOS 6.6.
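
Before running the steps, open the Sqoop2 client shell and point it at the Sqoop2 server. A minimal sketch, assuming the default Sqoop2 server port 12000 and the MapR install path used later in this post (substitute your server's hostname):

$ /opt/mapr/sqoop/sqoop-2.0.0/bin/sqoop.sh client
sqoop:000> set server --host <Sqoop2 Server HostName> --port 12000 --webapp sqoop
sqoop:000> show version --all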

 STEPS:
Step 1:
Check the available links and connectors.
sqoop:000> show link
+----+------+--------------+----------------+---------+
| Id | Name | Connector Id | Connector Name | Enabled |
+----+------+--------------+----------------+---------+
+----+------+--------------+----------------+---------+

sqoop:000> show connector
+----+------------------------+------------------+-------------------------------------------------------+----------------------+
| Id | Name                   | Version          | Class                                                 | Supported Directions |
+----+------------------------+------------------+-------------------------------------------------------+----------------------+
| 1  | kite-connector         | 1.99.6-mapr-1507 | org.apache.sqoop.connector.kite.KiteConnector         | FROM/TO              |
| 2  | kafka-connector        | 1.99.6-mapr-1507 | org.apache.sqoop.connector.kafka.KafkaConnector       | TO                   |
| 3  | hdfs-connector         | 1.99.6-mapr-1507 | org.apache.sqoop.connector.hdfs.HdfsConnector         | FROM/TO              |
| 4  | generic-jdbc-connector | 1.99.6-mapr-1507 | org.apache.sqoop.connector.jdbc.GenericJdbcConnector  | FROM/TO              |
+----+------------------------+------------------+-------------------------------------------------------+----------------------+


Step 2:
Create a link for the RDBMS (the database from which we want to import data).
NOTE: Pass the Id of the generic-jdbc-connector to the -c argument; in this example the Id is 4.

sqoop:000> create link -c 4
Creating link for connector with id 4
Please fill following values to create new link object
Name: mysql
Link configuration
JDBC Driver Class: com.mysql.jdbc.Driver
JDBC Connection String: jdbc:mysql://<DB HostName>/<Database>
Username: <sqoop>
Password: <*****>
JDBC Connection Properties: <Optional>
There are currently 0 values in the map:
entry#
New link was successfully created with validation status OK and persistent id 2

sqoop:000> show link
+----+-------+--------------+------------------------+---------+
| Id | Name | Connector Id | Connector Name | Enabled |
+----+-------+--------------+------------------------+---------+
| 2 | mysql | 4 | generic-jdbc-connector | true |
+----+-------+--------------+------------------------+---------+
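
If any value was entered incorrectly, the link can be edited in place instead of being recreated; the shell replays the same prompts pre-filled with the current values. For example (assuming the shell's -l option takes the link Id, matching the -c and -j conventions used elsewhere in this release):

sqoop:000> update link -l 2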


Step 3:
Create a link for the import destination, i.e. the MapR-FS (MFS) location.

sqoop:000> create link -c 3
Creating link for connector with id 3
Please fill following values to create new link object
Name: maprfs
Link configuration
HDFS URI: maprfs://<CLDB HostName>:7222
Hadoop conf directory: /opt/mapr/hadoop/hadoop-0.20.2/conf
New link was successfully created with validation status OK and persistent id 4

sqoop:000> show link
+----+--------+--------------+------------------------+---------+
| Id | Name | Connector Id | Connector Name | Enabled |
+----+--------+--------------+------------------------+---------+
| 2 | mysql | 4 | generic-jdbc-connector | true |
| 4 | maprfs | 3 | hdfs-connector | true |
+----+--------+--------------+------------------------+---------+


Step 4:
Create a Job

sqoop:000> create job --from 2 --to 4
Creating job for links with from id 2 and to id 4
Please fill following values to create new job object
Name: testjob
From database configuration
Schema name: mysql
Table name: <TableName>
Table SQL statement: <Optional>
Table column names: <Optional>
Partition column name: <Provide a column name for partitioning>
Null value allowed for the partition column: true
Boundary query: <Optional>
Incremental read
Check column: <Optional>
Last value: <Optional>
To HDFS configuration
Override null value: <Optional>
Null value: <Optional>
Output format:
0 : TEXT_FILE
1 : SEQUENCE_FILE
Choose: 0
Compression format:
0 : NONE
1 : DEFAULT
2 : DEFLATE
3 : GZIP
4 : BZIP2
5 : LZO
6 : LZ4
7 : SNAPPY
8 : CUSTOM
Choose: 0
Custom compression format:
Output directory: </MFS LOCATION NAME>
Append mode: <Optional>
Throttling resources
Extractors: <Optional>
Loaders: <Optional>
New job was successfully created with validation status OK and persistent id 12

sqoop:000> show job

sqoop:000> start job -j <Job Id>
Ex: start job -j 12
Submission details
Job ID: 12
Server URL:
Created by: mapr
Creation date:
Last updated by: mapr
External ID: job_<ID>
http://<Host>:8088/proxy/application_1461206632562_0005/
Source Connector schema: Schema{TABLE SCHEMA WILL BE DISPLAYED HERE}

sqoop:000> status job -j <JobID>
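
A running job can also be aborted from the same shell; like the status command, it takes the job Id:

sqoop:000> stop job -j 12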

PostgreSQL configuration for Sqoop2




Please follow the steps below to configure PostgreSQL for Sqoop2.
(Sqoop2 will store its metastore in PostgreSQL.)
These steps are specific to a MapR cluster environment.

Required Steps:
Step 1:
Install PostgreSQL using the command below.
$ yum install postgresql-server

Step 2:
Initialize the PostgreSQL database and start the service using the commands below.
$ service postgresql initdb
$ service postgresql start

Step 3: Change the parameter below in the specified file.
$ vim /var/lib/pgsql/data/postgresql.conf
listen_addresses = '<PostgreSQL host IP>'
# Note: set this to the IP address of the host where PostgreSQL is installed, e.g. '10.10.71.19'.

Step 4:
Add the entries below to the specified file.
$ vim /var/lib/pgsql/data/pg_hba.conf

# "local" is for Unix domain socket connections only
#local all all ident
local all all trust
# IPv4 local connections:
#host all all 127.0.0.1/32 trust
host all all 10.10.72.78/32 trust

# IPv6 local connections:
host all all ::1/128 ident

Step 5:
Comment out the existing repository parameters in the file below and replace them with the values shown.
$ vi /opt/mapr/sqoop/sqoop-2.0.0/server/conf/sqoop.properties

org.apache.sqoop.repository.jdbc.handler=org.apache.sqoop.repository.postgresql.PostgresqlRepositoryHandler
org.apache.sqoop.repository.jdbc.transaction.isolation=READ_COMMITTED
org.apache.sqoop.repository.jdbc.maximum.connections=10
org.apache.sqoop.repository.jdbc.url=jdbc:postgresql://10.10.72.110:5432/sqoop
org.apache.sqoop.repository.jdbc.driver=org.postgresql.Driver
org.apache.sqoop.repository.jdbc.user=sqoop
org.apache.sqoop.repository.jdbc.password=sqoop
#org.apache.sqoop.repository.jdbc.properties.property=value


Step 6:
Download the PostgreSQL JDBC driver jar and place it into the path below.
Download link: <PostgreSQL JDBC driver download page>
Place the downloaded jar into this location:
/opt/mapr/sqoop/sqoop-2.0.0/lib

Step 7: Execute the command below to enable PostgreSQL at boot.
$ chkconfig postgresql on

Step 8:
Open the PostgreSQL shell using the command below.
$ psql -U postgres

Step 9:
Create the sqoop role and database using the commands below (run inside the psql shell).

postgres=# CREATE ROLE sqoop LOGIN ENCRYPTED PASSWORD 'sqoop'
NOSUPERUSER INHERIT CREATEDB NOCREATEROLE;

postgres=# CREATE DATABASE "sqoop" WITH OWNER = sqoop TABLESPACE = pg_default;

Step 10:
If the server is not already running, start PostgreSQL manually (as the postgres user) using the command below.
$ /usr/bin/pg_ctl -D /var/lib/pgsql/data -l logfile start
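
To confirm the setup end to end, connect to the new database as the sqoop role, then restart the Sqoop2 server so it picks up the repository settings from Step 5. A minimal sketch, assuming the stock Sqoop2 server script under /opt/mapr/sqoop/sqoop-2.0.0 (verify the script name in your installation):

# Verify the sqoop role can reach its database (password: sqoop).
$ psql -U sqoop -d sqoop -h <PostgreSQL host IP>

# Restart the Sqoop2 server so it starts using the PostgreSQL repository.
$ /opt/mapr/sqoop/sqoop-2.0.0/bin/sqoop.sh server stop
$ /opt/mapr/sqoop/sqoop-2.0.0/bin/sqoop.sh server start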



Oozie installation in MapR platform for unsecured cluster



Steps:
Log in as the root user and follow the steps below.
Step 1:
$ cd /opt/mapr
$ yum install mapr-oozie

Step 2: Add the properties below into the "core-site.xml" file. These proxyuser entries allow the Oozie server (which runs as the mapr user) to submit jobs on behalf of other users.

$ vi /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/core-site.xml
 <property>
  <name>hadoop.proxyuser.mapr.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.mapr.groups</name>
  <value>*</value>
</property>

Step 3: Reconfigure the node using the command below.
$ /opt/mapr/server/configure.sh -R

Step 4: Export the OOZIE_URL environment variable in the CLI.

$ export OOZIE_URL=http://10.10.80.242:11000/oozie

Step 5:
Start the Oozie service from the CLI using the command below.
$ maprcli node services -name oozie -action restart -nodes `hostname`

Step 6: Check the list of running services using the command below.
$ maprcli node list -columns svc
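
Once Oozie shows as running, a quick health check is the admin status command, which reads the server address from the OOZIE_URL exported in Step 4. A healthy server reports NORMAL:

$ oozie admin -status
System mode: NORMAL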

Note:
These steps are for a non-secure cluster only.


HP Vertica cluster to HDFS using Sqoop




Follow the steps below to import data from an HP Vertica cluster to HDFS.

Step 1:
Please download recent versions of the jars below and add them to the Sqoop library directory (typically $SQOOP_HOME/lib):
vertica-jdbc-7.1.2-0.jar
vertica-jdk5-6.1.3-0.jar
hadoop-vertica.jar

Step 2:
Please use the command below to run the import using Sqoop.

$ sqoop import \
  --driver com.vertica.jdbc.Driver \
  --connect jdbc:vertica://<HOSTNAME>:5433/<DATABASE-NAME> \
  --username <UNAME> \
  -P \
  --table <TABLE-NAME> \
  --target-dir <TARGET-DIRECTORY-NAME> \
  --as-textfile \
  -m <No-Mappers>
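
For illustration, here is the same command with hypothetical values filled in (the hostname vertica01, database testdb, user dbadmin, table customers, and target directory are placeholders, not values from a real cluster):

$ sqoop import \
  --driver com.vertica.jdbc.Driver \
  --connect jdbc:vertica://vertica01:5433/testdb \
  --username dbadmin \
  -P \
  --table customers \
  --target-dir /user/mapr/vertica/customers \
  --as-textfile \
  -m 4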