Tuesday, November 15, 2016

Spark-Hive integration



1. Connecting using HiveThriftServer (port 10000)
2. Connecting using SparkThriftServer (port 10001)

Environment:

# rpm -qa | grep mapr
mapr-hivemetastore-1.2.201609261225-1.noarch
mapr-spark-historyserver-1.6.1.201609271200-1.noarch
mapr-hiveserver2-1.2.201609261225-1.noarch
mapr-spark-1.6.1.201609271200-1.noarch
mapr-mapreduce2-2.7.0.37549.GA-1.x86_64
mapr-hive-1.2.201609261225-1.noarch

Configuration steps:

Add the properties below to the "spark-defaults.conf" file.
# vi /opt/mapr/spark/spark-1.6.1/conf/spark-defaults.conf
# Spark SQL and Hive integration
spark.yarn.dist.files=/opt/mapr/hive/hive-1.2/conf/hive-site.xml,/opt/mapr/hive/hive-1.2/lib/datanucleus-api-jdo-4.2.1.jar,/opt/mapr/hive/hive-1.2/lib/datanucleus-core-4.1.6.jar,/opt/mapr/hive/hive-1.2/lib/datanucleus-rdbms-4.1.7.jar
spark.sql.hive.metastore.version=1.2.0
spark.sql.hive.metastore.jars=/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/yarn/*:/opt/mapr/hive/hive-1.2/lib/accumulo-core-1.6.0.jar:/opt/mapr/hive/hive-1.2/lib/hive-contrib-1.2.0-mapr-1609.jar:/opt/mapr/hive/hive-1.2/lib/*
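Before restarting anything, it can help to confirm that every file named in spark.yarn.dist.files really exists on this node, since a wrong datanucleus version in the path is easy to miss. Below is a minimal sketch, not part of Spark or MapR: the helper name check_dist_files is hypothetical, and it assumes the property is written on one line in key=value (or key value) form as above.

```shell
# Hypothetical helper (not shipped with Spark or MapR).
# Prints every path from spark.yarn.dist.files that does not exist and
# returns 1 if anything is missing, 0 otherwise.
check_dist_files() {
  conf_file="$1"

  # Take the last matching line and strip the key plus the separator.
  value=$(grep -E '^[[:space:]]*spark\.yarn\.dist\.files[[:space:]=]' "$conf_file" \
            | tail -n 1 \
            | sed -E 's/^[[:space:]]*spark\.yarn\.dist\.files[[:space:]=]+//')

  # The property is a comma-separated list; test one path per line.
  missing=$(printf '%s\n' "$value" | tr ',' '\n' | while IFS= read -r path; do
      if [ -n "$path" ] && [ ! -e "$path" ]; then
        echo "missing: $path"
      fi
    done)

  if [ -n "$missing" ]; then
    printf '%s\n' "$missing"
    return 1
  fi
  return 0
}
```

It could be called as: check_dist_files /opt/mapr/spark/spark-1.6.1/conf/spark-defaults.conf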




Then add the export below to the spark-env.sh file.
# vi /opt/mapr/spark/spark-1.6.1/conf/spark-env.sh
export HIVE_SERVER2_THRIFT_PORT=10001

Start the SparkThriftServer on port 10001 using the command below:

[root@VM204 spark-1.6.1]# /opt/mapr/spark/spark-1.6.1/sbin/start-thriftserver.sh --hiveconf hive.server2.thrift.port=10001

(OR, relying on the HIVE_SERVER2_THRIFT_PORT value exported in spark-env.sh)

[root@VM204 spark-1.6.1]# /opt/mapr/spark/spark-1.6.1/sbin/start-thriftserver.sh

Check that the port is listening:

[root@VM204 spark-1.6.1]# netstat -plant | grep 10001
tcp        0      0 :::10001                    :::*                        LISTEN      13588/java
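A plain grep for 10001 would also match a PID or a longer port number that happens to contain those digits. The sketch below is a slightly stricter check under the assumption that the input has the column layout shown above (local address in column 4, state in column 6). The function name port_is_listening is hypothetical.

```shell
# Hypothetical helper: reads `netstat -plant` style output on stdin and
# succeeds only if a socket in LISTEN state is bound to the given port.
port_is_listening() {
  awk -v port="$1" '
    $6 == "LISTEN" {
      # Local address looks like 0.0.0.0:10001 or :::10001;
      # the port is the last ":"-separated piece.
      n = split($4, parts, ":")
      if (parts[n] == port) found = 1
    }
    END { exit (found ? 0 : 1) }
  '
}
```

It could be used as: netstat -plant | port_is_listening 10001 && echo "SparkThriftServer is listening"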

Check the databases, tables, and data from the Hive CLI:

[mapr@VM204 spark-1.6.1]$ hive

Logging initialized using configuration in file:/opt/mapr/hive/hive-1.2/conf/hive-log4j.properties
hive> show databases;
OK
default
dummy
Time taken: 1.57 seconds, Fetched: 2 row(s)

hive> use default;
OK
Time taken: 0.05 seconds

hive> show tables;
OK
b
b_ext
demotest1
test
Time taken: 0.049 seconds, Fetched: 4 row(s)

hive> select *from b;
OK
10      kuamr
20      nitin
30      shishir
Time taken: 1.138 seconds, Fetched: 3 row(s)
hive>

Check the same data from Spark beeline:

1. Connecting using HiveThriftServer (port 10000)

[root@VM204 spark-1.6.1]# pwd
/opt/mapr/spark/spark-1.6.1

[mapr@VM204 spark-1.6.1]$ ./bin/beeline
Beeline version 1.6.1-mapr-1609 by Apache Hive

beeline> !connect jdbc:hive2://10.10.72.204:10000
Connecting to jdbc:hive2://10.10.72.204:10000
Enter username for jdbc:hive2://10.10.72.204:10000: mapr
Enter password for jdbc:hive2://10.10.72.204:10000: ****   <mapr>
Connected to: Apache Hive (version 1.2.0-mapr-1609)
Driver: Spark Project Core (version 1.6.1-mapr-1609)
Transaction isolation: TRANSACTION_REPEATABLE_READ

0: jdbc:hive2://10.10.72.204:10000> show databases;

+----------------+--+
| database_name  |
+----------------+--+
| default        |
| dummy          |
+----------------+--+

2. Connecting using SparkThriftServer (port 10001)
[mapr@VM204 spark-1.6.1]$ ./bin/beeline
Beeline version 1.6.1-mapr-1609 by Apache Hive

beeline> !connect jdbc:hive2://10.10.72.204:10001
Connecting to jdbc:hive2://10.10.72.204:10001
Enter username for jdbc:hive2://10.10.72.204:10001: mapr
Enter password for jdbc:hive2://10.10.72.204:10001: ****    <mapr>
Connected to: Spark SQL (version 1.6.1)
Driver: Spark Project Core (version 1.6.1-mapr-1609)
Transaction isolation: TRANSACTION_REPEATABLE_READ

0: jdbc:hive2://10.10.72.204:10001> show databases;
+----------+--+
|  result  |
+----------+--+
| default  |
| dummy    |
+----------+--+
2 rows selected (4.289 seconds)
0: jdbc:hive2://10.10.72.204:10001>

0: jdbc:hive2://10.10.72.204:10001> use default;
+---------+--+
| result  |
+---------+--+
+---------+--+
No rows selected (0.134 seconds)

0: jdbc:hive2://10.10.72.204:10001> show tables;
+------------+--------------+--+
| tableName  | isTemporary  |
+------------+--------------+--+
| b          | false        |
| b_ext      | false        |
| test       | false        |
+------------+--------------+--+
3 rows selected (0.077 seconds)

0: jdbc:hive2://10.10.72.204:10001> select *from b;
+-----+----------+--+
|  a  |    b     |
+-----+----------+--+
| 10  | kuamr    |
| 20  | nitin    |
| 30  | shishir  |
+-----+----------+--+
3 rows selected (1.916 seconds)
0: jdbc:hive2://10.10.72.204:10001>
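
The same checks can be run without an interactive session by passing the URL, user, and query on the Beeline command line. This is only a sketch: the run_query function and the BEELINE variable are hypothetical names, and the host, user, and password are assumed to be the ones from the sessions above.

```shell
# Assumption: Beeline lives under the Spark home used in this post.
# BEELINE is a hypothetical override for pointing at another binary.
BEELINE="${BEELINE:-/opt/mapr/spark/spark-1.6.1/bin/beeline}"

# Hypothetical helper: run one query against the Thrift server on the given
# port (10000 = HiveThriftServer, 10001 = SparkThriftServer) and print the
# rows as tab-separated text without headers or banners.
run_query() {
  port="$1"
  query="$2"
  # User and password "mapr" are taken from the sessions above (assumption).
  "$BEELINE" -u "jdbc:hive2://10.10.72.204:${port}" -n mapr -p mapr \
    --silent=true --showHeader=false --outputformat=tsv2 \
    -e "$query"
}
```

For example, run_query 10000 "show databases;" and run_query 10001 "show databases;" should list the same databases if both servers point at the same metastore.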


