Friday, June 10, 2016

Configuring the network for a VM




Supported OS: Redhat/CentOS/Ubuntu (note: the network-scripts paths below are RHEL-family specific; Ubuntu keeps its interface settings in /etc/network/interfaces instead).
Step 1:
>vi /etc/resolv.conf
nameserver <DNS server IP>

Step 2:
>vi /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
HWADDR=<keep the default>
TYPE=Ethernet
UUID=<keep the default>
ONBOOT=yes
NETMASK=255.255.255.0
NM_CONTROLLED=no
BOOTPROTO=static
IPADDR=172.20.10.42

Step 3:
>vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=host-vm1
GATEWAY=172.20.10.1



Step 4:

>service NetworkManager stop
>service network restart
> ping google.com
PING google.com (216.58.194.206) 56(84) bytes of data.
64 bytes from sfo03s01-in-f14.1e100.net (216.58.194.206): icmp_seq=1 ttl=54 time=6.46 ms
(# You'll see a response like this if everything went well.)
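
As an optional sanity check (assuming the eth0 device and the addresses configured above), verify that the settings took effect:
>ip addr show eth0        # the IPADDR from ifcfg-eth0 should appear here
>route -n                 # the GATEWAY should show up as the default route
>cat /etc/resolv.conf     # the nameserver entry should still be present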






Thursday, June 9, 2016

MapR-Hue and MySQL Integration


Prerequisites:
OS: CentOS-6.6/Redhat-6.7
MapR-5.1
Hue-3.9
MySQL

Steps:
Here Hue and MySQL are installed on different nodes:
MySQL host: Host1
MapR-Hue host: Host2

On Host1, do the steps below. Before installing each package, first confirm whether it is already present using the rpm -qa command.
Step 1:
>rpm -qa | grep mysql-devel
If it is not installed, install it:
>yum install mysql-devel

Step 2:
>rpm -qa | grep mysql-connector-java
If it is not installed, install it:
>yum install mysql-connector-java

Step 3:
> rpm -qa | grep mysql-server
If it is not installed, install it:
>yum install mysql-server

Step 4: Change the /etc/my.cnf file as follows:
>vi /etc/my.cnf
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
bind-address=Host1
#The hostname/IP of the host where MySQL is installed.
default-storage-engine=InnoDB
sql_mode=STRICT_ALL_TABLES

Step 5:
Start the MySQL daemon.
$ sudo service mysqld start

Step 6:
Set the MySQL root password; on a fresh install no password is set, so set one now.
>/usr/bin/mysql_secure_installation
Enter current password for root (enter for none):
OK, successfully used password, moving on...
[...]
Set root password? [Y/n] y
New password:x
Re-enter new password:x
Remove anonymous users? [Y/n] Y
[...]
Disallow root login remotely? [Y/n] N
[...]
Remove test database and access to it [Y/n] Y
[...]
Reload privilege tables now? [Y/n] Y
All done!
(# Here the password has been set to "x".)
Step 7:
Drop any existing "hue" database (skip this if it does not exist), then create a database for Hue named "hue" in MySQL using the commands below:
mysql> drop database hue;
mysql> create database hue;
Query OK, 1 row affected (0.00 sec)

Here Host2 is the hostname of the Hue installation host.
mysql> grant all on hue.* to 'hue'@Host2 identified by 'x';
Query OK, 0 rows affected (0.00 sec)
# Here 'x' is the MySQL password for the hue user.

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
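
To confirm the grant works across the network, you can log in from Host2 (assuming the mysql client package is installed there):
>mysql -h Host1 -u hue -p hue
# Enter the password ("x" above); landing at the mysql> prompt confirms the grant.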







Step 8:
On the Hue host (Host2), edit the hue.ini file with this information (the [[database]] subsection lives under the [desktop] section):
> vi /opt/mapr/hue/hue-3.9.0/desktop/conf/hue.ini

[[database]]
host=Host1
port=3306
engine=mysql
user=hue
password=x
name=hue

Add the details below under "# Settings for the RDBMS application":
> vi /opt/mapr/hue/hue-3.9.0/desktop/conf/hue.ini

[librdbms]

[[databases]]
# MySQL settings area: fill these in under the [[[mysql]]] subsection.
[[[mysql]]]
host=Host1
port=3306
engine=mysql
user=hue
password=x
name=hue

Step 9:
Do these steps on the Hue installation host, i.e. Host2.
>/opt/mapr/hue/hue-3.9.0/build/env/bin/hue dumpdata > bkpHueData.json
In the dump file, delete every record whose "model" field is "useradmin.userprofile", then save the file.
Verify that no "useradmin.userprofile" strings remain in the backup file:
>cat bkpHueData.json | grep useradmin.userprofile
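
If you prefer to script that cleanup, here is a minimal sketch (assuming the dump is a single JSON array of records, which is what the Django dumpdata behind this command emits):
>python - <<'EOF'
import json
# Load the dump, drop the useradmin.userprofile records, and write it back.
with open('bkpHueData.json') as f:
    records = json.load(f)
records = [r for r in records if r.get('model') != 'useradmin.userprofile']
with open('bkpHueData.json', 'w') as f:
    json.dump(records, f)
EOF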

Step 10:
Restart the Hue service:
> maprcli node services -name hue -action restart -nodes `hostname`

Run the syncdb operation:
/opt/mapr/hue/hue-3.9.0/build/env/bin/hue syncdb --noinput

Then run the migrations:
/opt/mapr/hue/hue-3.9.0/build/env/bin/hue migrate

Step 11:
On the MySQL host (Host1), clear the content-type table:
mysql> DELETE FROM hue.django_content_type;

Step 12:
On Host2 (the Hue installation host), run this command:
> /opt/mapr/hue/hue-3.9.0/build/env/bin/hue loaddata bkpHueData.json

Open the Hue UI in a browser at "http://Host2:8888".

Try adding a user through the "Add User" area as the mapr user, then check the updated table info in MySQL using the command below:
mysql> select * from auth_user;
(It will display the recently added user's info in this table.)


Thursday, June 2, 2016

Configuring MapR-5.0.0 cluster on RHEL-6.7 OS.

Here I am adding the very basic steps to follow for configuring a MapR-5.0.0 cluster on the Redhat-6.7 operating system.
Please follow the steps below for configuring.
Note:
Here we have 2 disks in each node.
Cluster Information:
Cluster Version : MapR-5.0.0
OS version : RHEL-6.7
Number of nodes : 3 nodes
Login as : mapr/mapr
(Note: this assumes each node meets the basic hardware requirements for the cluster.)

Steps:
Step 1:
Add the hostnames and IP addresses of all the cluster nodes to the /etc/hosts file on each node.
$sudo vi /etc/hosts
172.20.10.10 host10
172.20.10.11 host11
172.20.10.12 host12
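
A quick reachability check (assuming the hostnames above) can save debugging later:
$for h in host10 host11 host12; do ping -c 1 $h; done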

Step 2:
On each node, list the raw disks to be used by MapR in a file called disks.txt:
$sudo vi /tmp/disks.txt
xvdb
xvdc

Step 3:
Add the mapr group and user on each node:
$sudo groupadd -g 5000 mapr
$sudo useradd -g 5000 -u 5000 mapr

Step 4:
Set the SELINUX parameter to permissive on each node:
$sudo vi /etc/selinux/config
SELINUX=permissive

Step 5:
Flush the iptables rules (this clears the firewall only until reboot) using the command below on each node:
$sudo iptables -F
Or use the two commands below to disable the firewall permanently on each node:
$chkconfig iptables off
$service iptables stop
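
To confirm, check the status (on RHEL 6 it should report that the firewall is not running):
$service iptables status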

Step 6:
Add the MapR core and ecosystem repos to a maprtech.repo file on each node:
$sudo vi /etc/yum.repos.d/maprtech.repo
[maprtech]
name=MapR Technologies
baseurl=http://package.mapr.com/releases/v5.0.0/redhat/
enabled=1
gpgcheck=0
protect=1
[maprecosystem]
name=MapR Technologies
baseurl=http://package.mapr.com/releases/ecosystem-5.x/redhat
enabled=1
gpgcheck=0
protect=1


Step 7:
Download the EPEL release rpm on each node:
$sudo wget http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm

Step 8:
Install the above downloaded rpm.
$sudo rpm -Uvh epel-release-6-8.noarch.rpm

Step 9:
Import the MapR GPG key:
$sudo rpm --import http://package.mapr.com/releases/pub/maprgpg.key

Step 10:
Install java-1.7.0-openjdk-devel, if it is not already installed:
$sudo yum install java-1.7.0-openjdk-devel

Step 11:
Install the MapR core packages, i.e. the services each node requires (e.g. mapr-fileserver on every node; mapr-cldb and mapr-zookeeper on the designated nodes):
$sudo yum install mapr-fileserver
$sudo yum install mapr-cldb mapr-zookeeper

Step 12:
List out the CLDB and ZooKeeper nodes, then run configure.sh, which is present in the "/opt/mapr/server/" folder, on each node.
List of CLDB installed nodes : 172.20.10.10,172.20.10.11
List of Zookeeper installed nodes : 172.20.10.10,172.20.10.11,172.20.10.12
$sudo /opt/mapr/server/configure.sh -C 172.20.10.10,172.20.10.11 -Z 172.20.10.10,172.20.10.11,172.20.10.12 -N 3Node-MapR-5.0-Cluster

Step 13:
Format the disks listed in disks.txt on each node:
$sudo /opt/mapr/server/disksetup -F /tmp/disks.txt
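
With configure.sh and disksetup done, a typical next step (assuming the init scripts installed by the MapR packages) is to start ZooKeeper on the ZooKeeper nodes and then Warden on every node:
$sudo service mapr-zookeeper start
$sudo service mapr-warden start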

Step 14:
Use the command below to check which services have started on each node:
$ sudo maprcli node list -columns svc
Note: 
Please refer to the link below for more detailed info:
http://doc.mapr.com/display/MapR/Advanced+Installation+Topics

Thursday, May 26, 2016

Working with the Sqoop2 import command



Note:
Please configure PostgreSQL as the Sqoop2 metastore, then follow the steps below.
For configuring it, please follow the previous post "PostgreSQL configuration for Sqoop2".

These steps were tested on MapR-cluster-5.0, CentOS-6.6.

STEPS:
Step 1:
Check the available links and connectors:
sqoop:000> show link
+----+------+--------------+----------------+---------+
| Id | Name | Connector Id | Connector Name | Enabled |
+----+------+--------------+----------------+---------+
+----+------+--------------+----------------+---------+

sqoop:000> show connector
+----+------------------------+------------------+-------------------------------------------------------+----------------------+
| Id | Name                   | Version          | Class                                                 | Supported Directions |
+----+------------------------+------------------+-------------------------------------------------------+----------------------+
| 1  | kite-connector         | 1.99.6-mapr-1507 | org.apache.sqoop.connector.kite.KiteConnector         | FROM/TO              |
| 2  | kafka-connector        | 1.99.6-mapr-1507 | org.apache.sqoop.connector.kafka.KafkaConnector       | TO                   |
| 3  | hdfs-connector         | 1.99.6-mapr-1507 | org.apache.sqoop.connector.hdfs.HdfsConnector         | FROM/TO              |
| 4  | generic-jdbc-connector | 1.99.6-mapr-1507 | org.apache.sqoop.connector.jdbc.GenericJdbcConnector  | FROM/TO              |
+----+------------------------+------------------+-------------------------------------------------------+----------------------+


Step 2:
Create a link for the RDBMS (the database from which we would like to import data).
(NOTE: Provide the connector Id of the generic-jdbc-connector for the -c argument. In this example the Id is 4.)

sqoop:000> create link -c 4
Creating link for connector with id 4
Please fill following values to create new link object
Name: <mysql>
Link configuration
JDBC Driver Class: com.mysql.jdbc.Driver
JDBC Connection String: jdbc:mysql://<DB HostName>/<Database>
Username: <sqoop>
Password: <*****>
JDBC Connection Properties:<Optional>
There are currently 0 values in the map:
entry#
New link was successfully created with validation status OK and persistent id 2

sqoop:000> show link
+----+-------+--------------+------------------------+---------+
| Id | Name | Connector Id | Connector Name | Enabled |
+----+-------+--------------+------------------------+---------+
| 2 | mysql | 4 | generic-jdbc-connector | true |
+----+-------+--------------+------------------------+---------+


Step 3:
Create a link for the import destination, i.e. the MapR-FS (MFS) location.

sqoop:000> create link -c 3
Creating link for connector with id 3
Please fill following values to create new link object
Name: maprfs
Link configuration
HDFS URI:maprfs://<CLDB HostName>:7222
Hadoop conf directory: /opt/mapr/hadoop/hadoop-0.20.2/conf
New link was successfully created with validation status OK and persistent id 4

sqoop:000> show link
+----+--------+--------------+------------------------+---------+
| Id | Name | Connector Id | Connector Name | Enabled |
+----+--------+--------------+------------------------+---------+
| 2 | mysql | 4 | generic-jdbc-connector | true |
| 4 | maprfs | 3 | hdfs-connector | true |
+----+--------+--------------+------------------------+---------+


Step 4:
Create a Job

sqoop:000> create job --from 2 --to 4
Creating job for links with from id 2 and to id 4
Please fill following values to create new job object
Name: testjob
From database configuration
Schema name: mysql
Table name: <TableName>
Table SQL statement:<Optional>
Table column names:<Optional>
Partition column name: <Provide a ColumnNamefor Partitioning>
Null value allowed for the partition column: true
Boundary query:<Optional>
Incremental read
Check column:<Optional>
Last value:<Optional>
To HDFS configuration
Override null value:<Optional>
Null value:<Optional>
Output format:
0 : TEXT_FILE
1 : SEQUENCE_FILE
Choose: 0
Compression format:
0 : NONE
1 : DEFAULT
2 : DEFLATE
3 : GZIP
4 : BZIP2
5 : LZO
6 : LZ4
7 : SNAPPY
8 : CUSTOM
Choose: 0
Custom compression format:
Output directory: </MFS LOCATION NAME>
Append mode:<Optional>
Throttling resources
Extractors:<Optional>
Loaders:<Optional>
New job was successfully created with validation status OK and persistent id 12

sqoop:000> show job

sqoop:000> start job -j <Job Id>
Ex: start job -j 12
Submission details
Job ID: 12
Server URL:
Created by: mapr
Creation date:
Lastly updated by: mapr
External ID: job_<ID>
http://<Host>:8088/proxy/application_1461206632562_0005/
Source Connector schema: Schema{TABLE SCHEMA WILL BE DISPLAYED HERE}

sqoop:000> status job -j <JobID>
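
Once the job reports success, you can list the imported files in the target directory from a cluster node (using the MFS location chosen in the job definition):
> hadoop fs -ls <MFS LOCATION NAME>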

PostgreSQL configuration for Sqoop2




Please follow the steps below to configure PostgreSQL for Sqoop2.
(Sqoop2 will store its metastore in PostgreSQL.)
These steps are specific to a MapR-cluster environment.

Required Steps:
Step 1:
Install the PostgreSQL server using the command below:
$ yum install postgresql-server

Step 2:
Initialize the PostgreSQL data directory using the command below:
$ service postgresql initdb

Step 3: Change the parameter in the file specified below:
$ vim /var/lib/pgsql/data/postgresql.conf
listen_addresses = '10.10.71.19'
#Note : use the IP of the host where PostgreSQL is installed.

Step 4:
Add the parameters below to the specified file:
$ vim /var/lib/pgsql/data/pg_hba.conf

# "local" is for Unix domain socket connections only
#local all all ident
local all all trust
# IPv4 local connections:
#host all all 127.0.0.1/32 trust
host all all 10.10.72.78/32 trust

# IPv6 local connections:
host all all ::1/128 ident

Step 5:
Comment out the existing repository parameters in the file below and add the new PostgreSQL values:
$ vi /opt/mapr/sqoop/sqoop-2.0.0/server/conf/sqoop.properties

org.apache.sqoop.repository.jdbc.handler=org.apache.sqoop.repository.postgresql.PostgresqlRepositoryHandler
org.apache.sqoop.repository.jdbc.transaction.isolation=READ_COMMITTED
org.apache.sqoop.repository.jdbc.maximum.connections=10
org.apache.sqoop.repository.jdbc.url=jdbc:postgresql://10.10.72.110:5432/sqoop
org.apache.sqoop.repository.jdbc.driver=org.postgresql.Driver
org.apache.sqoop.repository.jdbc.user=sqoop
org.apache.sqoop.repository.jdbc.password=sqoop
#org.apache.sqoop.repository.jdbc.properties.property=value


Step 6:
Download the PostgreSQL JDBC driver jar (the org.postgresql.Driver configured in Step 5).
Download link:
Place the downloaded jar into this location:
/opt/mapr/sqoop/sqoop-2.0.0/lib

Step 7: Execute the commands below to enable and start the service:
$ chkconfig postgresql on
$ service postgresql start

Step 8:
Open the PostgreSQL shell using the command below:
$ psql -U postgres

Step 9:
Create the sqoop role and database using the commands below:

$ CREATE ROLE sqoop LOGIN ENCRYPTED PASSWORD 'sqoop'
NOSUPERUSER INHERIT CREATEDB NOCREATEROLE;

$ CREATE DATABASE "sqoop" WITH OWNER = sqoop TABLESPACE = pg_default;

Step 10:
Log in as the postgres user and start the server (if it is not already running):
$/usr/bin/pg_ctl -D /var/lib/pgsql/data -l logfile start
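
As a quick connectivity check against the metastore database (assuming the host and port from the sqoop.properties above):
$ psql -h 10.10.72.110 -p 5432 -U sqoop -d sqoop -c '\conninfo'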



Oozie installation on the MapR platform for an unsecured cluster



Steps:
Log in as the root user and follow the steps below.
Step 1:
$ cd /opt/mapr
$ yum install mapr-oozie

Step 2: Add the properties below to the "core-site.xml" file. (These proxyuser settings let Oozie, which runs as the mapr user, impersonate other users.)

$vi /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/core-site.xml
 <property>
  <name>hadoop.proxyuser.mapr.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.mapr.groups</name>
  <value>*</value>
</property>

Step 3: Re-configure the cluster using the command below:
$/opt/mapr/server/configure.sh   -R

Step 4: Export the OOZIE_URL in the CLI (pointing at the Oozie host on port 11000).

$export OOZIE_URL=http://10.10.80.242:11000/oozie

Step 5:
Start (or restart) the Oozie service from the CLI using the command below:
$maprcli node services -name oozie -action restart -nodes `hostname`

Step 6: Check the list of running services using the command below:
$maprcli node list -columns svc
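
With OOZIE_URL exported (Step 4), you can also check the Oozie server itself; a quick sketch, assuming the oozie client script shipped with the package is on your PATH:
$oozie admin -status
System mode: NORMAL
(# A healthy server reports NORMAL.)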

Note:
These steps are for a non-secure cluster only.


HP Vertica cluster to the HDFS platform using Sqoop




Follow the steps below to import data from an HP Vertica cluster to HDFS.

Step 1:
Please download recent versions of the jars below and add them to the Sqoop library directory:
vertica-jdbc-7.1.2-0.jar
vertica-jdk5-6.1.3-0.jar
hadoop-vertica.jar

Step 2:
Please use the command below to run the import using Sqoop:

> sqoop import \
    --driver com.vertica.jdbc.Driver \
    --connect jdbc:vertica://<HOSTNAME>:5433/<DATABASE-NAME> \
    --username <UNAME> \
    -P \
    --table <TABLE-NAME> \
    --target-dir <TARGET-DIRECTORY-NAME> \
    --as-textfile \
    -m <No-Mappers>
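
After the import completes, the map tasks write part files under the target directory; you can spot-check the first one:
> hadoop fs -ls <TARGET-DIRECTORY-NAME>
> hadoop fs -cat <TARGET-DIRECTORY-NAME>/part-m-00000 | head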