Feb 07, 2019, 04:40 PM
Prerequisites:

>> Kerberized cluster

>> Hive Interactive Server (LLAP) enabled in Hive

>> Get the following details from Hive for Spark:

spark.hadoop.hive.llap.daemon.service.hosts @llap0
spark.sql.hive.hiveserver2.jdbc.url jdbc:hive2://c420-node2.squadron-labs.com:2181,c420-node3.squadron-labs.com:2181,c420-node4.squadron-labs.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive
spark.datasource.hive.warehouse.metastoreUri thrift://c420-node3.squadron-labs.com:9083


Basic testing:

1) Create a table employee in Hive and load some data.

Create table
----------------

CREATE TABLE IF NOT EXISTS employee (eid INT, name STRING, salary STRING, destination STRING)
COMMENT 'Employee details'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

Load the data.txt file into HDFS
---------------
1201,Gopal,45000,Technical manager
1202,Manisha,45000,Proof reader
1203,Masthanvali,40000,Technical writer
1204,Kiran,40000,Hr Admin
1205,Kranthi,30000,Op Admin


LOAD DATA INPATH '/tmp/data.txt' OVERWRITE INTO TABLE employee;
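If data.txt does not exist yet, it can be created locally and copied into HDFS first. A minimal sketch (the local and HDFS paths are assumptions chosen to match the LOAD DATA statement above):

```shell
# Create the sample file locally with the five rows shown above.
cat > /tmp/data.txt <<'EOF'
1201,Gopal,45000,Technical manager
1202,Manisha,45000,Proof reader
1203,Masthanvali,40000,Technical writer
1204,Kiran,40000,Hr Admin
1205,Kranthi,30000,Op Admin
EOF

# On the cluster (with a valid Kerberos ticket), copy it into HDFS so the
# LOAD DATA INPATH statement can find it:
#   hdfs dfs -put -f /tmp/data.txt /tmp/data.txt
```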

2) kinit as the spark user and run:

spark-shell --master yarn --conf "spark.security.credentials.hiveserver2.enabled=false" --conf "spark.sql.hive.hiveserver2.jdbc.url=jdbc:hive2://c420-node2.squadron-labs.com:2181,c420-node3.squadron-labs.com:2181,c420-node4.squadron-labs.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive;principal=hive/_HOST@HWX.COM" --conf "spark.datasource.hive.warehouse.metastoreUri=thrift://c420-node3.squadron-labs.com:9083" --conf "spark.datasource.hive.warehouse.load.staging.dir=/tmp/" --conf "spark.hadoop.hive.llap.daemon.service.hosts=@llap0" --conf "spark.hadoop.hive.zookeeper.quorum=c420-node2.squadron-labs.com:2181,c420-node3.squadron-labs.com:2181,c420-node4.squadron-labs.com:2181" --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.0.1.0-187.jar

Note: spark.security.credentials.hiveserver2.enabled should be set to false for YARN client deploy mode and to true (its default) for YARN cluster deploy mode. This configuration is required on a Kerberized cluster.

3) Run the following code in the Scala shell to view the table data:

import com.hortonworks.hwc.HiveWarehouseSession
val hive = HiveWarehouseSession.session(spark).build()
hive.execute("show tables").show
hive.executeQuery("select * from employee").show
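For repeatable smoke tests, the same snippet can be saved to a file and replayed non-interactively via spark-shell's -i option (the script path here is an assumption, not from the original article):

```shell
# Save the HWC check to a script file so it can be replayed non-interactively.
cat > /tmp/hwc_smoke_test.scala <<'EOF'
import com.hortonworks.hwc.HiveWarehouseSession
val hive = HiveWarehouseSession.session(spark).build()
hive.execute("show tables").show
hive.executeQuery("select * from employee").show
EOF

# On the cluster, replay it with the same spark-shell options as in step 2:
#   spark-shell --master yarn <same --conf and --jars options as above> \
#     -i /tmp/hwc_smoke_test.scala
```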



4) To apply these common properties by default, add the following settings under Ambari's Custom spark2-defaults:

spark.sql.hive.hiveserver2.jdbc.url=jdbc:hive2://c420-node2.squadron-labs.com:2181,c420-node3.squadron-labs.com:2181,c420-node4.squadron-labs.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive;principal=hive/_HOST@HWX.COM
spark.datasource.hive.warehouse.metastoreUri=thrift://c420-node3.squadron-labs.com:9083
spark.datasource.hive.warehouse.load.staging.dir=/tmp/
spark.hadoop.hive.llap.daemon.service.hosts=@llap0
spark.hadoop.hive.zookeeper.quorum=c420-node2.squadron-labs.com:2181,c420-node3.squadron-labs.com:2181,c420-node4.squadron-labs.com:2181
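Under the hood, Ambari renders these settings as plain key=value lines in spark-defaults.conf on the client nodes. A sketch of the resulting file format (written to a temporary path here purely for illustration; Ambari manages the real file under /usr/hdp/current/spark2-client/conf):

```shell
# Illustration only: shows the key=value format the Ambari settings end up in.
cat > /tmp/spark-defaults.conf.sample <<'EOF'
spark.datasource.hive.warehouse.metastoreUri=thrift://c420-node3.squadron-labs.com:9083
spark.datasource.hive.warehouse.load.staging.dir=/tmp/
spark.hadoop.hive.llap.daemon.service.hosts=@llap0
EOF
# Each line is one Spark property in key=value form.
```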


5) spark-shell --master yarn --conf "spark.security.credentials.hiveserver2.enabled=false" --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.0.1.0-187.jar

Note: the common properties are now read from the Spark default properties.

6) Run the following code in the Scala shell to view the Hive table data:

import com.hortonworks.hwc.HiveWarehouseSession
val hive = HiveWarehouseSession.session(spark).build()
hive.execute("show tables").show
hive.executeQuery("select * from employee").show


7) To integrate HWC with Livy2:

 a) Add the following property in Custom livy2-conf:
 livy.file.local-dir-whitelist=/usr/hdp/current/hive_warehouse_connector/

 b) Add hive-site.xml to /usr/hdp/current/spark2-client/conf on all cluster nodes.

 c) Log in to Zeppelin and add the following to the livy2 interpreter settings:

livy.spark.hadoop.hive.llap.daemon.service.hosts @llap0
livy.spark.security.credentials.hiveserver2.enabled true
livy.spark.sql.hive.hiveserver2.jdbc.url jdbc:hive2://c420-node2.squadron-labs.com:2181,c420-node3.squadron-labs.com:2181,c420-node4.squadron-labs.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive
livy.spark.sql.hive.hiveserver2.jdbc.url.principal hive/_HOST@HWX.COM
livy.spark.yarn.security.credentials.hiveserver2.enabled true
livy.spark.jars file:///usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.0.1.0-187.jar

 d) Restart the livy2 interpreter.

 e) In the first paragraph, add:
 %livy2
import com.hortonworks.hwc.HiveWarehouseSession
val hive = HiveWarehouseSession.session(spark).build()

 f) In the second paragraph, add:
 %livy2
hive.executeQuery("select * from employee").show
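Zeppelin aside, equivalent HWC settings can also be passed to Livy2 directly over its REST API when creating a session. A hedged sketch (the Livy host placeholder, default port 8999, and the curl invocation are assumptions for a typical Kerberized HDP setup, not from this article):

```shell
# Session request mirroring the livy2 interpreter settings above.
cat > /tmp/hwc-livy-session.json <<'EOF'
{
  "kind": "spark",
  "jars": ["file:///usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.0.1.0-187.jar"],
  "conf": {
    "spark.hadoop.hive.llap.daemon.service.hosts": "@llap0",
    "spark.security.credentials.hiveserver2.enabled": "true",
    "spark.sql.hive.hiveserver2.jdbc.url": "jdbc:hive2://c420-node2.squadron-labs.com:2181,c420-node3.squadron-labs.com:2181,c420-node4.squadron-labs.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive",
    "spark.sql.hive.hiveserver2.jdbc.url.principal": "hive/_HOST@HWX.COM"
  }
}
EOF

# On the cluster (with a valid Kerberos ticket), submit the session request:
#   curl --negotiate -u : -H 'Content-Type: application/json' \
#        -d @/tmp/hwc-livy-session.json http://<livy-host>:8999/sessions
```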