Thursday, May 26, 2016

Working with the Sqoop2 import command.



Note:
Please configure PostgreSQL as the Sqoop2 metastore before following the steps below.
For configuration details, please see the previous post "PostgreSQL configuration for Sqoop2".

These steps were tested on a MapR 5.0 cluster running CentOS 6.6.
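
Before running the steps, open the Sqoop2 client shell and point it at the Sqoop2 server. A minimal sketch, assuming the default Sqoop2 server port 12000 and the MapR install path used later in this post (substitute your server's hostname):

$ /opt/mapr/sqoop/sqoop-2.0.0/bin/sqoop.sh client
sqoop:000> set server --host <Sqoop2 Server HostName> --port 12000 --webapp sqoop
sqoop:000> show version --all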

 STEPS:
Step 1:
Check the available links and connectors.
sqoop:000> show link
+----+------+--------------+----------------+---------+
| Id | Name | Connector Id | Connector Name | Enabled |
+----+------+--------------+----------------+---------+
+----+------+--------------+----------------+---------+

sqoop:000> show connector
+----+------------------------+------------------+-------------------------------------------------------+----------------------+
| Id | Name                   | Version          | Class                                                 | Supported Directions |
+----+------------------------+------------------+-------------------------------------------------------+----------------------+
| 1  | kite-connector         | 1.99.6-mapr-1507 | org.apache.sqoop.connector.kite.KiteConnector         | FROM/TO              |
| 2  | kafka-connector        | 1.99.6-mapr-1507 | org.apache.sqoop.connector.kafka.KafkaConnector       | TO                   |
| 3  | hdfs-connector         | 1.99.6-mapr-1507 | org.apache.sqoop.connector.hdfs.HdfsConnector         | FROM/TO              |
| 4  | generic-jdbc-connector | 1.99.6-mapr-1507 | org.apache.sqoop.connector.jdbc.GenericJdbcConnector  | FROM/TO              |
+----+------------------------+------------------+-------------------------------------------------------+----------------------+


Step 2:
Create a link for the RDBMS (the database from which we want to import data).
NOTE: Pass the Id of the generic-jdbc-connector to the -c argument; in this example the Id is 4.

sqoop:000> create link -c 4
Creating link for connector with id 4
Please fill following values to create new link object
Name: mysql
Link configuration
JDBC Driver Class: com.mysql.jdbc.Driver
JDBC Connection String: jdbc:mysql://<DB HostName>/<Database>
Username: <sqoop>
Password: <*****>
JDBC Connection Properties: <Optional>
There are currently 0 values in the map:
entry#
New link was successfully created with validation status OK and persistent id 2

sqoop:000> show link
+----+-------+--------------+------------------------+---------+
| Id | Name | Connector Id | Connector Name | Enabled |
+----+-------+--------------+------------------------+---------+
| 2 | mysql | 4 | generic-jdbc-connector | true |
+----+-------+--------------+------------------------+---------+
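
If any value was entered incorrectly, the link can be edited in place instead of being recreated; the shell replays the same prompts pre-filled with the current values. For example (assuming the shell's -l option takes the link Id, matching the -c and -j conventions used elsewhere in this release):

sqoop:000> update link -l 2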


Step 3:
Create a link for the import destination, i.e. the MapR-FS (MFS) location.

sqoop:000> create link -c 3
Creating link for connector with id 3
Please fill following values to create new link object
Name: maprfs
Link configuration
HDFS URI: maprfs://<CLDB HostName>:7222
Hadoop conf directory: /opt/mapr/hadoop/hadoop-0.20.2/conf
New link was successfully created with validation status OK and persistent id 4

sqoop:000> show link
+----+--------+--------------+------------------------+---------+
| Id | Name | Connector Id | Connector Name | Enabled |
+----+--------+--------------+------------------------+---------+
| 2 | mysql | 4 | generic-jdbc-connector | true |
| 4 | maprfs | 3 | hdfs-connector | true |
+----+--------+--------------+------------------------+---------+


Step 4:
Create a Job

sqoop:000> create job --from 2 --to 4
Creating job for links with from id 2 and to id 4
Please fill following values to create new job object
Name: testjob
From database configuration
Schema name: mysql
Table name: <TableName>
Table SQL statement: <Optional>
Table column names: <Optional>
Partition column name: <Provide a column name for partitioning>
Null value allowed for the partition column: true
Boundary query: <Optional>
Incremental read
Check column: <Optional>
Last value: <Optional>
To HDFS configuration
Override null value: <Optional>
Null value: <Optional>
Output format:
0 : TEXT_FILE
1 : SEQUENCE_FILE
Choose: 0
Compression format:
0 : NONE
1 : DEFAULT
2 : DEFLATE
3 : GZIP
4 : BZIP2
5 : LZO
6 : LZ4
7 : SNAPPY
8 : CUSTOM
Choose: 0
Custom compression format:
Output directory: </MFS LOCATION NAME>
Append mode: <Optional>
Throttling resources
Extractors: <Optional>
Loaders: <Optional>
New job was successfully created with validation status OK and persistent id 12

sqoop:000> show job

sqoop:000> start job -j <Job Id>
Ex: start job -j 12
Submission details
Job ID: 12
Server URL:
Created by: mapr
Creation date:
Last updated by: mapr
External ID: job_<ID>
http://<Host>:8088/proxy/application_1461206632562_0005/
Source Connector schema: Schema{TABLE SCHEMA WILL BE DISPLAYED HERE}

sqoop:000> status job -j <JobID>
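
A running job can also be aborted from the same shell; like the status command, it takes the job Id:

sqoop:000> stop job -j 12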

PostgreSQL configuration for Sqoop2




Please follow the steps below to configure PostgreSQL for Sqoop2.
(Sqoop2 will store its metastore in PostgreSQL.)
These steps are specific to a MapR cluster environment.

Required Steps:
Step 1:
Install PostgreSQL using the command below.
$ yum install postgresql-server

Step 2:
Initialize the PostgreSQL database and start the service using the commands below.
$ service postgresql initdb
$ service postgresql start

Step 3: Change the parameter below in the specified file.
$ vim /var/lib/pgsql/data/postgresql.conf
listen_addresses = '<PostgreSQL host IP>'
# Note: set this to the IP address of the host where PostgreSQL is installed, e.g. '10.10.71.19'.

Step 4:
Add the entries below to the specified file.
$ vim /var/lib/pgsql/data/pg_hba.conf

# "local" is for Unix domain socket connections only
#local all all ident
local all all trust
# IPv4 local connections:
#host all all 127.0.0.1/32 trust
host all all 10.10.72.78/32 trust

# IPv6 local connections:
host all all ::1/128 ident

Step 5:
Comment out the existing repository parameters in the file below and replace them with the values shown.
$ vi /opt/mapr/sqoop/sqoop-2.0.0/server/conf/sqoop.properties

org.apache.sqoop.repository.jdbc.handler=org.apache.sqoop.repository.postgresql.PostgresqlRepositoryHandler
org.apache.sqoop.repository.jdbc.transaction.isolation=READ_COMMITTED
org.apache.sqoop.repository.jdbc.maximum.connections=10
org.apache.sqoop.repository.jdbc.url=jdbc:postgresql://10.10.72.110:5432/sqoop
org.apache.sqoop.repository.jdbc.driver=org.postgresql.Driver
org.apache.sqoop.repository.jdbc.user=sqoop
org.apache.sqoop.repository.jdbc.password=sqoop
#org.apache.sqoop.repository.jdbc.properties.property=value


Step 6:
Download the PostgreSQL JDBC driver jar and place it into the path below.
Download link: <PostgreSQL JDBC driver download page>
Place the downloaded jar into this location:
/opt/mapr/sqoop/sqoop-2.0.0/lib

Step 7: Execute the command below to enable PostgreSQL at boot.
$ chkconfig postgresql on

Step 8:
Open the PostgreSQL shell using the command below.
$ psql -U postgres

Step 9:
Create the sqoop role and database using the commands below (run inside the psql shell).

postgres=# CREATE ROLE sqoop LOGIN ENCRYPTED PASSWORD 'sqoop'
NOSUPERUSER INHERIT CREATEDB NOCREATEROLE;

postgres=# CREATE DATABASE "sqoop" WITH OWNER = sqoop TABLESPACE = pg_default;

Step 10:
If the server is not already running, start PostgreSQL manually (as the postgres user) using the command below.
$ /usr/bin/pg_ctl -D /var/lib/pgsql/data -l logfile start
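
To confirm the setup end to end, connect to the new database as the sqoop role, then restart the Sqoop2 server so it picks up the repository settings from Step 5. A minimal sketch, assuming the stock Sqoop2 server script under /opt/mapr/sqoop/sqoop-2.0.0 (verify the script name in your installation):

# Verify the sqoop role can reach its database (password: sqoop).
$ psql -U sqoop -d sqoop -h <PostgreSQL host IP>

# Restart the Sqoop2 server so it starts using the PostgreSQL repository.
$ /opt/mapr/sqoop/sqoop-2.0.0/bin/sqoop.sh server stop
$ /opt/mapr/sqoop/sqoop-2.0.0/bin/sqoop.sh server start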



Oozie installation in MapR platform for unsecured cluster



Steps:
Log in as the root user and follow the steps below.
Step 1:
$ cd /opt/mapr
$ yum install mapr-oozie

Step 2: Add the properties below into the "core-site.xml" file. These proxyuser entries allow the Oozie server (which runs as the mapr user) to submit jobs on behalf of other users.

$ vi /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/core-site.xml
 <property>
  <name>hadoop.proxyuser.mapr.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.mapr.groups</name>
  <value>*</value>
</property>

Step 3: Reconfigure the node using the command below.
$ /opt/mapr/server/configure.sh -R

Step 4: Export the OOZIE_URL environment variable in the CLI.

$ export OOZIE_URL=http://10.10.80.242:11000/oozie

Step 5:
Start the Oozie service from the CLI using the command below.
$ maprcli node services -name oozie -action restart -nodes `hostname`

Step 6: Check the list of running services using the command below.
$ maprcli node list -columns svc
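
Once Oozie shows as running, a quick health check is the admin status command, which reads the server address from the OOZIE_URL exported in Step 4. A healthy server reports NORMAL:

$ oozie admin -status
System mode: NORMAL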

Note:
These steps are for a non-secure cluster only.


HP Vertica cluster to HDFS using Sqoop




Follow the steps below to import data from an HP Vertica cluster to HDFS.

Step 1:
Please download recent versions of the jars below and add them to the Sqoop library directory (typically $SQOOP_HOME/lib):
vertica-jdbc-7.1.2-0.jar
vertica-jdk5-6.1.3-0.jar
hadoop-vertica.jar

Step 2:
Please use the command below to run the import using Sqoop.

$ sqoop import \
  --driver com.vertica.jdbc.Driver \
  --connect jdbc:vertica://<HOSTNAME>:5433/<DATABASE-NAME> \
  --username <UNAME> \
  -P \
  --table <TABLE-NAME> \
  --target-dir <TARGET-DIRECTORY-NAME> \
  --as-textfile \
  -m <No-Mappers>
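
For illustration, here is the same command with hypothetical values filled in (the hostname vertica01, database testdb, user dbadmin, table customers, and target directory are placeholders, not values from a real cluster):

$ sqoop import \
  --driver com.vertica.jdbc.Driver \
  --connect jdbc:vertica://vertica01:5433/testdb \
  --username dbadmin \
  -P \
  --table customers \
  --target-dir /user/mapr/vertica/customers \
  --as-textfile \
  -m 4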