Friday, June 9, 2017

Multi-mfs Enablement



For DB workloads on high-end servers, MapR has made several performance enhancements. For MapR-DB deployments on clusters with SSDs, two fileserver instances are configured on nodes with at least two storage pools (SPs).
On servers with SSDs, this feature is enabled automatically on a fresh install or upgrade.

List of SPs

[root@VM202 ~]# /opt/mapr/server/mrconfig  sp list
ListSPs resp: status 0:2
No. of SPs (2), totalsize 47539 MB, totalfree 46644 MB

SP 0: name SP1, Online, size 24179 MB, free 23686 MB, path /dev/sdb
SP 1: name SP2, Online, size 23360 MB, free 22958 MB, path /dev/sdd

Currently running MFS instances


[root@VM202 ~]# /opt/mapr/server/mrconfig  info instances
1
5660

Right now I have two SPs, and I am enabling one MFS instance per SP: one dedicated MFS process (PID 5660) will be assigned to the first SP, and another dedicated MFS process (PID 5661) to the second.

Here is how to enable multi-MFS:
[root@VM202 ~]#   maprcli config save -values {multimfs.numinstances.pernode:2}
[root@VM202 ~]#   maprcli config save -values {multimfs.numsps.perinstance:1}
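The two settings above are related: the number of instances per node times the SPs per instance should cover the SPs on the node. A quick shell sanity check, with values taken from the `mrconfig sp list` output above:

```shell
# Sanity-check the multi-MFS math before saving the config:
# instances per node = (number of SPs) / (SPs per instance).
num_sps=2            # from `mrconfig sp list`: "No. of SPs (2)"
sps_per_instance=1   # intended value of multimfs.numsps.perinstance
instances=$((num_sps / sps_per_instance))
echo "multimfs.numinstances.pernode should be: $instances"
```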

Restart Warden after changing these settings.
[root@VM202 ~]# service mapr-warden restart

After restarting Warden, check the MFS instances again.
[root@VM202 ~]# /opt/mapr/server/mrconfig  info instances
2
5660 5661
Here we can see two MFS processes running, with PIDs 5660 and 5661.
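For scripting, the two-line output of `info instances` (count, then PIDs) is easy to parse; a minimal sketch using the sample output above, hardcoded so it runs anywhere:

```shell
# Parse `mrconfig info instances`-style output: line 1 is the instance
# count, line 2 the MFS PIDs (sample output hardcoded here).
out="2
5660 5661"
count=$(printf '%s\n' "$out" | sed -n 1p)
pids=$(printf '%s\n' "$out" | sed -n 2p)
echo "running $count MFS instances, PIDs: $pids"
```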

In the logs directory, we can see that each MFS log now has two copies, one per instance:
[root@VM202 ~]# ll /opt/mapr/logs/
mfs.log-0
mfs.log.1-0

mfs.log-1
mfs.log.1-1

mfs.log.1-2
mfs.log-2

mfs.log-3
mfs.log.1-3

mfs.log-4
mfs.log.1-4

Thursday, June 8, 2017

MySQL Installation, Configuration, and Creating a Database and User


In this post, I would like to cover the following MySQL steps:
  • Installation
  • Configuration
  • Creating a database
  • Creating a user
  • Logging in to MySQL using the newly created user
Installation

1. Install the following MySQL packages on the node.

[root@VM200 ~]# rpm -qa| grep mysql
mysql-5.1.73-8.el6_8.x86_64
mysql-connector-java-5.1.17-6.el6.noarch
mysql-libs-5.1.73-8.el6_8.x86_64
mysql-devel-5.1.73-8.el6_8.x86_64
mysql-server-5.1.73-8.el6_8.x86_64

#yum install mysql mysql-connector-java mysql-devel mysql-server  -y

Configuration
2. Check whether the mysqld service is running:

[root@VM200 ~]# service mysqld status
mysqld is stopped

Start the mysql service.
[root@VM200 ~]# service mysqld start

Alternatively, install MySQL from the community packages. All packages are available at the location below:
https://dev.mysql.com/downloads/mysql/5.6.html#downloads

[root@psnode142 mapr]# rpm -Uvh mysql-community-release-el7-5.noarch.rpm
[root@psnode142 mapr]#  yum install mysql-server

[root@psnode142 mapr]# service  mysqld status
Redirecting to /bin/systemctl status mysqld.service
● mysqld.service - MySQL Community Server
   Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2018-02-27 09:12:20 PST; 2s ago

[root@psnode142 mapr]# netstat -plant| grep 3306
tcp6       0      0 :::3306                 :::*                    LISTEN      19006/mysqld
[root@psnode142 mapr]#


3. Run the secure installation to set a new MySQL root password and harden the defaults:
[root@VM200 ~]# /usr/bin/mysql_secure_installation


Enter current password for root (enter for none):  (linux VM PWD: yy)

Change the root password? [Y/n] y
New password: (MySQL root PWD: x)
Re-enter new password: (x)

Remove anonymous users? [Y/n] y

Disallow root login remotely? [Y/n] n

Remove test database and access to it? [Y/n] y
Reload privilege tables now? [Y/n] y


Creating Database & User
4. Log in to MySQL:


[root@tVM200 mapr]# mysql -u root -p
Enter password: <mapr>
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 10
Server version: 5.6.39 MySQL Community Server (GPL)

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
+--------------------+
3 rows in set (0.00 sec)

mysql> create database sdb;
Query OK, 1 row affected (0.00 sec)

mysql> grant all on sdb.* to 'suser'@'%' identified by 'mapr';
Query OK, 0 rows affected (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

mysql> exit
Bye
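The interactive session above can also be scripted. A sketch that writes the same statements to a file, which could then be fed to `mysql -u root -p` (database, user, and password values are the ones used in this post; the /tmp path is just for illustration):

```shell
# Generate the database/user setup as a SQL script instead of typing it
# interactively; run it later with: mysql -u root -p < /tmp/setup_sdb.sql
cat > /tmp/setup_sdb.sql <<'SQL'
CREATE DATABASE IF NOT EXISTS sdb;
GRANT ALL ON sdb.* TO 'suser'@'%' IDENTIFIED BY 'mapr';
FLUSH PRIVILEGES;
SQL
wc -l < /tmp/setup_sdb.sql
```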
[root@VM200 mapr]# mysql -u suser -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 11
Server version: 5.6.39 MySQL Community Server (GPL)

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| sdb                |
+--------------------+
2 rows in set (0.00 sec)

mysql> use sdb;
Database changed
mysql> show tables;
Empty set (0.00 sec)

Wednesday, June 7, 2017

Install , Configure and Use Clush



Clush is an open-source tool that lets you execute commands in parallel across the nodes of a cluster. This post describes how to install and configure clush, and how to use it to run commands on multiple nodes in parallel.

The clush utility needs to be installed on only one node, usually the primary node (10.10.72.200), from which we run commands in parallel and gather stats.


Step 1: Install clustershell package
[root@VM200 ~]# yum --enablerepo=epel install clustershell

To list all cluster nodes and their services on the MapR cluster:

[root@VM200 ~]# maprcli node list -columns svc
service                                                                       hostname  ip
fileserver,historyserver,webserver,nodemanager,hoststats                      VM200     10.10.72.200
fileserver,hivemeta,webserver,nodemanager,hs2,hoststats                       VM201     10.10.72.201
nodemanager,spark-historyserver,cldb,fileserver,hoststats,hue                 VM202     10.10.72.202
tasktracker,nodemanager,cldb,fileserver,resourcemanager,hoststats,jobtracker  VM203     10.10.72.203


Step 2: Create a "groups" file under "/etc/clustershell/" and add all cluster nodes.
[root@VM200 ~]# vi /etc/clustershell/groups
all:10.10.72.200,10.10.72.201,10.10.72.202,10.10.72.203
[root@VM200 ~]#
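To verify the group definition, you can expand the comma-separated host list yourself (clustershell also ships a `nodeset` tool for this). A plain-shell sketch against a local copy of the groups file, so it runs without the package installed:

```shell
# Expand the "all" group from a clustershell-style groups file into one
# host per line (copy written to /tmp so the sketch is self-contained).
cat > /tmp/groups <<'EOF'
all:10.10.72.200,10.10.72.201,10.10.72.202,10.10.72.203
EOF
grep '^all:' /tmp/groups | cut -d: -f2 | tr ',' '\n'
```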



Step 3: Run a command on all the nodes.
[root@VM204 ~]# clush -a date
10.10.72.200: Host key verification failed.
clush: 10.10.72.200: exited with exit code 255
10.10.72.201: Host key verification failed.
clush: 10.10.72.201: exited with exit code 255
10.10.72.202: Host key verification failed.
clush: 10.10.72.202: exited with exit code 255
10.10.72.203: Host key verification failed.
clush: 10.10.72.203: exited with exit code 255

To avoid this "Host key verification failed." error, follow the host key setup steps below.
########## Generate the key; don't enter any passphrase at the prompt

# ssh-keygen 


Example:

[root@VM204 ~]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
4a:a0:74:96:51:e2:fa:4f:9d:4d:80:20:1a:96:11:5a root@VM204.mapr.com
The key's randomart image is:
+--[ RSA 2048]----+
....
...


-----------------------------------------------------------------
#### Copy the key to all the nodes to enable passwordless SSH

# ssh-copy-id 10.10.72.200
# ssh-copy-id 10.10.72.201
# ssh-copy-id 10.10.72.202
# ssh-copy-id 10.10.72.203
# ssh-copy-id 10.10.72.204
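The repeated ssh-copy-id calls can be collapsed into a loop. In this sketch the actual `ssh-copy-id` is replaced with `echo` so it runs without a cluster:

```shell
# Loop over the node list instead of repeating ssh-copy-id by hand.
# On a real cluster, replace `echo "would run: ssh-copy-id $ip"`
# with `ssh-copy-id $ip`.
nodes="10.10.72.200 10.10.72.201 10.10.72.202 10.10.72.203 10.10.72.204"
for ip in $nodes; do
  echo "would run: ssh-copy-id $ip"
done
```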

Example:

[root@VM200 ~]# ssh-copy-id 10.10.72.200
The authenticity of host '10.10.72.200 (10.10.72.200)' can't be established.
RSA key fingerprint is 20:d4:f6:6e:d5:4b:af:80:bc:21:2b:f6:21:51:19:65.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.10.72.200' (RSA) to the list of known hosts.
root@10.10.72.200's password:
Now try logging into the machine, with "ssh '10.10.72.200'", and check in:

  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

[root@VM200 ~]# ssh-copy-id 10.10.72.201
[root@VM200 ~]# ssh-copy-id 10.10.72.202
[root@VM200 ~]# ssh-copy-id 10.10.72.203



[root@VM200 ~]# clush -a date
10.10.72.200: Wed Jun  7 17:16:58 IST 2017
10.10.72.201: Wed Jun  7 17:16:43 IST 2017
10.10.72.203: Wed Jun  7 17:18:07 IST 2017
10.10.72.202: Wed Jun  7 17:16:59 IST 2017
[root@VM200 ~]#

Tuesday, February 7, 2017

Hue-MariaDB integration on CentOS 7.x



Issue:
If you see the error below while launching the Hue web UI on CentOS 7/RHEL 7 on the MapR 5.2 platform, follow these steps.
raise errorclass, errorvalue
ProgrammingError: (1146, "Table 'hue.auth_user' doesn't exist")

To overcome this issue, use the steps below:

Environment:
[root@VM206 mapr]# rpm -qa| grep mapr
mapr-core-internal-5.2
mapr-core-5.2
mapr-hue-base-3.9.0

Steps:
Step 1:
Run the following commands to install MariaDB and the Redhat 6 compatibility library:
#yum install mariadb
#ver=$(rpm -qa mariadb|cut -d- -f2)
#rpm -ivh --nodeps http://yum.mariadb.org/$ver/rhel7-amd64/rpms/MariaDB-$ver-centos7-x86_64-compat.rpm
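The `$ver` extraction above relies on the rpm name-version-release naming layout; a small illustration with a sample package string (the version shown is made up for the example):

```shell
# rpm -qa prints name-version-release.arch; `cut -d- -f2` picks the
# version field. Sample string assumed for illustration.
pkg="mariadb-5.5.56-2.el7.x86_64"
ver=$(echo "$pkg" | cut -d- -f2)
echo "MariaDB version: $ver"
```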

Logging into  MariaDB
[root@VM206 mapr]# mysql -u root -p
Enter password:<pwd> (= <puli> )
MariaDB [(none)]> create database hue;
MariaDB [(none)]> grant all on hue.* to 'hue'@10.10.72.206 identified by 'puli';

MariaDB [(none)]> flush privileges;
MariaDB [(none)]> use hue;
MariaDB [hue]> show tables;
Empty set (0.00 sec)

step 2:
Run the following command to create a symlink for the Cyrus SASL library:

#ln -s /lib64/libsasl2.so.3.0.0 /lib64/libsasl2.so.2

Step 3:
Run the following command to reconfigure Hue:
#bash -c "source /opt/mapr/hue/hue-3.9.0/build/env/bin/activate;
      /opt/mapr/hue/hue-3.9.0/build/env/bin/hue syncdb --noinput;
      /opt/mapr/hue/hue-3.9.0/build/env/bin/hue migrate"
    
Step 4:
Run the following command to restart Hue:
      # maprcli node services -name hue -action restart -nodes `hostname`
      
Verification:

Log in to Hue host:
http://10.10.72.206:8888/

The Hue web UI should now launch without any issues.

Check in MariaDB:

[root@VM206 mapr]# mysql -u root -p
Enter password:<pwd>

MariaDB [(none)]> use hue;

MariaDB [hue]> show tables;
+--------------------------------+
| Tables_in_hue                  |
+--------------------------------+
| auth_group                     |
| auth_group_permissions         |
| auth_permission                |
| auth_user                      |
| auth_user_groups               |
.......
......
(You will see a lot of tables here, around 74 in total.)

Tuesday, November 15, 2016

Loading Data into HBase from HDFS (MapR-FS)


Step 1: HDFS file data

[root@localhost mapr]# hadoop fs -cat /user/mapr/sample.txt
100,sachin,Mum,India,Cricket
200,Samantha,Chennai,india,Actress
300,Modi,Ahd,Ind,politician

Step 2: Creating table in HBase with rowkey

hbase(main):006:0> create 'blkimport','cf'
0 row(s) in 1.2990 seconds

=> Hbase::Table - blkimport
hbase(main):007:0> list
blkimport


hbase(main):010:0> scan 'blkimport'
ROW                                   COLUMN+CELL
0 row(s) in 0.0180 seconds

Step 3: Importing data from HDFS into HBase using ImportTsv

[mapr@localhost hbase-1.1.1]$ /opt/mapr/hbase/hbase-1.1.1/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' -Dimporttsv.columns='HBASE_ROW_KEY,cf:pname,cf:loc,cf:country,cf:prefession'   blkimport /user/mapr/sample.txt

Here:
  /user/mapr/sample.txt        --> HDFS file
  blkimport                    --> HBase table
  cf                           --> column family
  HBASE_ROW_KEY                --> row key
  pname,loc,country,prefession --> column names
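Before running ImportTsv, it is worth checking that every row of the source file has exactly as many fields as `-Dimporttsv.columns` declares (one row key plus four columns here); a sketch using a local copy of the sample data:

```shell
# Recreate the sample CSV locally and verify each row has 5 fields,
# matching HBASE_ROW_KEY,cf:pname,cf:loc,cf:country,cf:prefession.
cat > /tmp/sample.txt <<'EOF'
100,sachin,Mum,India,Cricket
200,Samantha,Chennai,india,Actress
300,Modi,Ahd,Ind,politician
EOF
awk -F, 'NF != 5 { bad++ } END { print (bad ? "bad rows: " bad : "all rows have 5 fields") }' /tmp/sample.txt
```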

Step 4: Scan table HBase

hbase(main):011:0> scan 'blkimport'
ROW                                   COLUMN+CELL
 100                                  column=cf:country, timestamp=1477509673574, value=India
 100                                  column=cf:loc, timestamp=1477509673574, value=Mum
 100                                  column=cf:pname, timestamp=1477509673574, value=sachin
 100                                  column=cf:prefession, timestamp=1477509673574, value=Cricket
 200                                  column=cf:country, timestamp=1477509673574, value=india
 200                                  column=cf:loc, timestamp=1477509673574, value=Chennai
 200                                  column=cf:pname, timestamp=1477509673574, value=Samantha
 200                                  column=cf:prefession, timestamp=1477509673574, value=Actress
 300                                  column=cf:country, timestamp=1477509673574, value=Ind
 300                                  column=cf:loc, timestamp=1477509673574, value=Ahd
 300                                  column=cf:pname, timestamp=1477509673574, value=Modi
 300                                  column=cf:prefession, timestamp=1477509673574, value=politician
3 row(s) in 0.0570 seconds



Spark-Hive integration



1.Connecting using HiveThriftServer(Port-10000)
2.Connecting using SparkThriftServer(Port-10001)

Environment:

# rpm -qa | grep mapr
mapr-hivemetastore-1.2.201609261225-1.noarch
mapr-spark-historyserver-1.6.1.201609271200-1.noarch
mapr-hiveserver2-1.2.201609261225-1.noarch
mapr-spark-1.6.1.201609271200-1.noarch
mapr-mapreduce2-2.7.0.37549.GA-1.x86_64
mapr-hive-1.2.201609261225-1.noarch

Configuration steps:

Add below properties to "spark-defaults.conf" file.
# vi /opt/mapr/spark/spark-1.6.1/conf/spark-defaults.conf
# Spark SQL and Hive integration
spark.yarn.dist.files=/opt/mapr/hive/hive-1.2/conf/hive-site.xml,/opt/mapr/hive/hive-1.2/lib/datanucleus-api-jdo-4.2.1.jar,/opt/mapr/hive/hive-1.2/lib/datanucleus-core-4.1.6.jar,/opt/mapr/hive/hive-1.2/lib/datanucleus-rdbms-4.1.7.jar
spark.sql.hive.metastore.version=1.2.0
spark.sql.hive.metastore.jars=/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/yarn/*:/opt/mapr/hive/hive-1.2/lib/accumulo-core-1.6.0.jar:/opt/mapr/hive/hive-1.2/lib/hive-contrib-1.2.0-mapr-1609.jar:/opt/mapr/hive/hive-1.2/lib/*




And add the export below to the spark-env.sh file:
#vi /opt/mapr/spark/spark-1.6.1/conf/spark-env.sh
export HIVE_SERVER2_THRIFT_PORT=10001

Start the SparkThriftServer on port 10001 using the command below:

[root@VM204 spark-1.6.1]# /opt/mapr/spark/spark-1.6.1/sbin/start-thriftserver.sh  start --hive-conf hive.server2.thrift.port=10001

(OR)

[root@VM204 spark-1.6.1]# /opt/mapr/spark/spark-1.6.1/sbin/start-thriftserver.sh  start  hive.server2.thrift.port=10001

Check whether the port is listening:

[root@VM204 spark-1.6.1]# netstat -plant | grep 10001
tcp        0      0 :::10001                    :::*                        LISTEN      13588/java
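The PID in the last column of the netstat line can be pulled out for scripting (sample line hardcoded here; on a live node, pipe `netstat -plant | grep 10001` in instead):

```shell
# Extract the listening PID from a netstat -plant line; the last field
# is PID/program, so split it on '/'.
line="tcp6       0      0 :::10001                :::*                    LISTEN      13588/java"
pid=$(echo "$line" | awk '{print $NF}' | cut -d/ -f1)
echo "thrift server PID: $pid"
```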

Checking databases, tables, and data from the Hive CLI:

[mapr@VM204 spark-1.6.1]$ hive

Logging initialized using configuration in file:/opt/mapr/hive/hive-1.2/conf/hive-log4j.properties
hive> show databases;
OK
default
dummy
Time taken: 1.57 seconds, Fetched: 2 row(s)

hive> use default;
OK
Time taken: 0.05 seconds

hive> show tables;
OK
b
b_ext
demotest1
test
Time taken: 0.049 seconds, Fetched: 4 row(s)

hive> select *from b;
OK
10      kuamr
20      nitin
30      shishir
Time taken: 1.138 seconds, Fetched: 3 row(s)
hive>

Check the same data from Spark beeline:

1.Connecting using HiveThriftServer(Port-10000)

[root@VM204 spark-1.6.1]# pwd
/opt/mapr/spark/spark-1.6.1

[mapr@VM204 spark-1.6.1]$ ./bin/beeline
Beeline version 1.6.1-mapr-1609 by Apache Hive

beeline> !connect jdbc:hive2://10.10.72.204:10000
Connecting to jdbc:hive2://10.10.72.204:10000
Enter username for jdbc:hive2://10.10.72.204:10000: mapr
Enter password for jdbc:hive2://10.10.72.204:10000: ****   <mapr>
Connected to: Apache Hive (version 1.2.0-mapr-1609)
Driver: Spark Project Core (version 1.6.1-mapr-1609)
Transaction isolation: TRANSACTION_REPEATABLE_READ

0: jdbc:hive2://10.10.72.204:10000> show databases;

+----------------+--+
| database_name  |
+----------------+--+
| default        |
| dummy          |
+----------------+--+

2.Connecting using SparkThriftServer(Port-10001)
[mapr@VM204 spark-1.6.1]$ ./bin/beeline
Beeline version 1.6.1-mapr-1609 by Apache Hive

beeline> !connect jdbc:hive2://10.10.72.204:10001
Connecting to jdbc:hive2://10.10.72.204:10001
Enter username for jdbc:hive2://10.10.72.204:10001: mapr
Enter password for jdbc:hive2://10.10.72.204:10001: ****    <mapr>
Connected to: Spark SQL (version 1.6.1)
Driver: Spark Project Core (version 1.6.1-mapr-1609)
Transaction isolation: TRANSACTION_REPEATABLE_READ

0: jdbc:hive2://10.10.72.204:10001> show databases;
+----------+--+
|  result  |
+----------+--+
| default  |
| dummy    |
+----------+--+
2 rows selected (4.289 seconds)
0: jdbc:hive2://10.10.72.204:10001>

0: jdbc:hive2://10.10.72.204:10001> use default;
+---------+--+
| result  |
+---------+--+
+---------+--+
No rows selected (0.134 seconds)

0: jdbc:hive2://10.10.72.204:10001> show tables;
+------------+--------------+--+
| tableName  | isTemporary  |
+------------+--------------+--+
| b          | false        |
| b_ext      | false        |
| test       | false        |
+------------+--------------+--+
3 rows selected (0.077 seconds)

0: jdbc:hive2://10.10.72.204:10001> select *from b;
+-----+----------+--+
|  a  |    b     |
+-----+----------+--+
| 10  | kuamr    |
| 20  | nitin    |
| 30  | shishir  |
+-----+----------+--+
3 rows selected (1.916 seconds)
0: jdbc:hive2://10.10.72.204:10001>



Loading data into MapR tables (MapR-DB) using "importtsv"


Logging into HBase

[mapr@6VM201 root]$ hbase shell

Creating a MapR-DB table named "/user/mapr/blk"
hbase(main):018:0>  create '/user/mapr/blk','cf'
0 row(s) in 0.0390 seconds

=> Hbase::Table - /user/mapr/blk

The file "/user/test.txt" needs to be loaded into the MapR-DB table:
[root@6VM201 ~]# hadoop fs -cat /user/test.txt
761377,kuamr,hyd,india
761344,nitin,mum,india

Command to load data into the MapR-DB table using "importtsv":

[mapr@6VM201 hbase-1.1.1]$ /opt/mapr/hbase/hbase-1.1.1/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:name,cf:loc,cf:country -Dimporttsv.separator=',' -Dimporttsv.bulk.output=/user/mapr/nitintab3 /user/mapr/blk /user/test.txt


Here:
  /user/mapr/blk       --> MapR-DB table
  /user/test.txt       --> source file to load into the table
  /user/mapr/nitintab3 --> output path given to -Dimporttsv.bulk.output

[mapr@6VM201 root]$ hbase shell

Check the data in MapRDB table
hbase(main):019:0> scan '/user/mapr/blk'
ROW                                               COLUMN+CELL
 761344                                           column=cf:country, timestamp=1475197667463, value=india
 761344                                           column=cf:loc, timestamp=1475197667463, value=mum
 761344                                           column=cf:name, timestamp=1475197667463, value=nitin
 761377                                           column=cf:country, timestamp=1475197667463, value=india
 761377                                           column=cf:loc, timestamp=1475197667463, value=hyd
 761377                                           column=cf:name, timestamp=1475197667463, value=kuamr
2 row(s) in 0.0510 seconds