
Thursday, May 26, 2016

Working with the Sqoop2 import command



Note:
Please configure PostgreSQL as the Sqoop2 metastore before following the steps below.
For configuration details, see the previous post "PostgreSQL configuration for Sqoop2".

These steps were tested on a MapR 5.0 cluster running CentOS 6.6.
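If the client shell is not yet connected to the Sqoop2 server, point it at the server first. A sketch, assuming the server listens on the default port 12000:

```
sqoop:000> set server --host <Sqoop2 Server Host> --port 12000 --webapp sqoop
sqoop:000> show version --all
```

`show version --all` confirms that the client can actually reach the server before you create links and jobs.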

STEPS:
Step 1:
Check the available links and connectors.
sqoop:000> show link
+----+------+--------------+----------------+---------+
| Id | Name | Connector Id | Connector Name | Enabled |
+----+------+--------------+----------------+---------+
+----+------+--------------+----------------+---------+

sqoop:000> show connector
+----+------------------------+------------------+------------------------------------------------------+----------------------+
| Id | Name                   | Version          | Class                                                | Supported Directions |
+----+------------------------+------------------+------------------------------------------------------+----------------------+
| 1  | kite-connector         | 1.99.6-mapr-1507 | org.apache.sqoop.connector.kite.KiteConnector        | FROM/TO              |
| 2  | kafka-connector        | 1.99.6-mapr-1507 | org.apache.sqoop.connector.kafka.KafkaConnector      | TO                   |
| 3  | hdfs-connector         | 1.99.6-mapr-1507 | org.apache.sqoop.connector.hdfs.HdfsConnector        | FROM/TO              |
| 4  | generic-jdbc-connector | 1.99.6-mapr-1507 | org.apache.sqoop.connector.jdbc.GenericJdbcConnector | FROM/TO              |
+----+------------------------+------------------+------------------------------------------------------+----------------------+


Step 2:
Create a link for the RDBMS (the database from which we want to import data).
NOTE: Pass the connector Id of generic-jdbc-connector to the -c argument. In this example the Id is 4.

sqoop:000> create link -c 4
Creating link for connector with id 4
Please fill following values to create new link object
Name: <mysql>
Link configuration
JDBC Driver Class: com.mysql.jdbc.Driver
JDBC Connection String: jdbc:mysql://<DB HostName>/<Database>
Username: <sqoop>
Password: <*****>
JDBC Connection Properties:<Optional>
There are currently 0 values in the map:
entry#
New link was successfully created with validation status OK and persistent id 2

sqoop:000> show link
+----+-------+--------------+------------------------+---------+
| Id | Name | Connector Id | Connector Name | Enabled |
+----+-------+--------------+------------------------+---------+
| 2 | mysql | 4 | generic-jdbc-connector | true |
+----+-------+--------------+------------------------+---------+
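The JDBC connection string entered above follows a fixed pattern. A minimal sketch that assembles it from placeholder values (dbhost.example.com and sales are hypothetical, not from this post):

```shell
# Hypothetical placeholders -- substitute your own DB host and database name.
DB_HOST="dbhost.example.com"
DB_NAME="sales"

# Same shape as the value entered at the "JDBC Connection String:" prompt.
JDBC_URL="jdbc:mysql://${DB_HOST}/${DB_NAME}"
echo "${JDBC_URL}"
```

The host may also carry an explicit port (e.g. `dbhost.example.com:3306`) if MySQL is not on its default port.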


Step 3:
Create a link for the import destination, i.e. the MapR-FS location.

sqoop:000> create link -c 3
Creating link for connector with id 3
Please fill following values to create new link object
Name: maprfs
Link configuration
HDFS URI: maprfs://<CLDB HostName>:7222
Hadoop conf directory: /opt/mapr/hadoop/hadoop-0.20.2/conf
New link was successfully created with validation status OK and persistent id 4

sqoop:000> show link
+----+--------+--------------+------------------------+---------+
| Id | Name | Connector Id | Connector Name | Enabled |
+----+--------+--------------+------------------------+---------+
| 2 | mysql | 4 | generic-jdbc-connector | true |
| 4 | maprfs | 3 | hdfs-connector | true |
+----+--------+--------------+------------------------+---------+
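To double-check the values entered for either link, the shell can display the full link details:

```
sqoop:000> show link --all
```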


Step 4:
Create a Job

sqoop:000> create job --from 2 --to 4
Creating job for links with from id 2 and to id 4
Please fill following values to create new job object
Name: testjob
From database configuration
Schema name: mysql
Table name: <TableName>
Table SQL statement:<Optional>
Table column names:<Optional>
Partition column name: <Provide a Column Name for Partitioning>
Null value allowed for the partition column: true
Boundary query:<Optional>
Incremental read
Check column:<Optional>
Last value:<Optional>
To HDFS configuration
Override null value:<Optional>
Null value:<Optional>
Output format:
0 : TEXT_FILE
1 : SEQUENCE_FILE
Choose: 0
Compression format:
0 : NONE
1 : DEFAULT
2 : DEFLATE
3 : GZIP
4 : BZIP2
5 : LZO
6 : LZ4
7 : SNAPPY
8 : CUSTOM
Choose: 0
Custom compression format:
Output directory: </MFS LOCATION NAME>
Append mode:<Optional>
Throttling resources
Extractors:<Optional>
Loaders:<Optional>
New job was successfully created with validation status OK and persistent id 12

sqoop:000> show job

sqoop:000> start job -j <Job Id>
Ex: start job -j 12
Submission details
Job ID: 12
Server URL:
Created by: mapr
Creation date:
Lastly updated by: mapr
External ID: job_<ID>
http://<Host>:8088/proxy/application_1461206632562_0005/
Source Connector schema: Schema{TABLE SCHEMA WILL BE DISPLAYED HERE}

sqoop:000> status job -j <JobID>
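A running job can also be stopped from the same shell if needed:

```
sqoop:000> stop job -j <Job Id>
```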

PostgreSQL configuration for Sqoop2




Please follow the steps below to configure PostgreSQL for Sqoop2.
(Sqoop2 will store its metastore in PostgreSQL.)
These instructions are specific to a MapR cluster environment.

Required Steps:
Step 1:
Install PostgreSQL using the command below.
$ yum install postgresql-server

Step 2:
Initialize the database cluster using the command below.
$ service postgresql initdb

Step 3: Change the parameter in the file specified below.
$ vim /var/lib/pgsql/data/postgresql.conf
listen_addresses = '10.10.71.19'
# Note: use the IP of the host where PostgreSQL is installed.

Step 4:
Add the following entries to the file specified below.
$ vim /var/lib/pgsql/data/pg_hba.conf

# "local" is for Unix domain socket connections only
#local all all ident
local all all trust
# IPv4 local connections:
#host all all 127.0.0.1/32 trust
host all all 10.10.72.78/32 trust

# IPv6 local connections:
host all all ::1/128 ident
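After editing pg_hba.conf, reload PostgreSQL so the new rules take effect (if the service is already running):

```
$ service postgresql reload
```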

Step 5:
Comment out the existing repository parameters in the file specified below and add the new values shown here.
$ vi /opt/mapr/sqoop/sqoop-2.0.0/server/conf/sqoop.properties

org.apache.sqoop.repository.jdbc.handler=org.apache.sqoop.repository.postgresql.PostgresqlRepositoryHandler
org.apache.sqoop.repository.jdbc.transaction.isolation=READ_COMMITTED
org.apache.sqoop.repository.jdbc.maximum.connections=10
org.apache.sqoop.repository.jdbc.url=jdbc:postgresql://10.10.72.110:5432/sqoop
org.apache.sqoop.repository.jdbc.driver=org.postgresql.Driver
org.apache.sqoop.repository.jdbc.user=sqoop
org.apache.sqoop.repository.jdbc.password=sqoop
#org.apache.sqoop.repository.jdbc.properties.property=value
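For the new repository settings to take effect, restart the Sqoop2 server. A sketch, assuming the stock sqoop.sh launcher shipped at this MapR path:

```
$ /opt/mapr/sqoop/sqoop-2.0.0/bin/sqoop.sh server stop
$ /opt/mapr/sqoop/sqoop-2.0.0/bin/sqoop.sh server start
```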


Step 6:
Download the PostgreSQL JDBC driver jar (for org.postgresql.Driver) and place it into the path below.
Download link:
/opt/mapr/sqoop/sqoop-2.0.0/lib

Step 7: Execute the command below to enable PostgreSQL at boot.
$ chkconfig postgresql on

Step 8:
Open the PostgreSQL shell using the command below.
$ psql -U postgres

Step 9:
Create the sqoop role and database using the commands below.

postgres=# CREATE ROLE sqoop LOGIN ENCRYPTED PASSWORD 'sqoop'
NOSUPERUSER INHERIT CREATEDB NOCREATEROLE;

postgres=# CREATE DATABASE "sqoop" WITH OWNER = sqoop TABLESPACE = pg_default;
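The new role and database can be verified by connecting as the sqoop user (this requires the trust/host rules from Step 4 to match the client host):

```
$ psql -U sqoop -d sqoop -c '\conninfo'
```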

Step 10:
Start the PostgreSQL server (run as the postgres user).
$ /usr/bin/pg_ctl -D /var/lib/pgsql/data -l logfile start
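To confirm the server came up, check its status (again as the postgres user):

```
$ /usr/bin/pg_ctl -D /var/lib/pgsql/data status
```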