For a Hadoop cluster, as I understand it, Hive can be set up on any node — Hive itself is just a client tool plus a metastore (the metastore is the part that lives in a relational database), so any machine should work.
Anyway, I set it up on the namenode machine.
Ugh... I wrote a lot yesterday, then assumed the same submission page could be reused, and submitted my Flume notes over the Hive page... it got overwritten. I regret it so much!
So here I'll just roughly record the Hive setup process, and note which pitfalls I hit.
hadoop 2.7.7
Introduction
Unlike HBase, Hive is a data-warehouse tool built on top of Hadoop. It maps structured data files to database tables, provides full SQL query capability, and translates SQL statements into MapReduce jobs for execution.
Advantages
You can run MapReduce-style statistics directly with SQL-like statements, without developing a dedicated MapReduce application.
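For example, a word count that would otherwise need a full MapReduce program is a single HiveQL statement (the table and column names here are made up for illustration):

```sql
-- Hypothetical table words(word STRING); Hive compiles this GROUP BY
-- into a MapReduce job behind the scenes.
SELECT word, COUNT(*) AS cnt
FROM words
GROUP BY word;
```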
Installation
Install MySQL
sudo apt install mysql-server
sudo mysql_secure_installation
sudo mysql -uroot -p   # log in and check that it works
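The config below uses root for simplicity; if you prefer a dedicated metastore account, something like this should work inside the mysql shell (the user name and password are placeholders — pick your own):

```sql
-- latin1 avoids index-length issues in some older metastore schemas.
CREATE DATABASE hive DEFAULT CHARACTER SET latin1;
CREATE USER 'hive'@'localhost' IDENTIFIED BY 'your_password';
GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'localhost';
FLUSH PRIVILEGES;
```

If you go this route, use the same user and password in hive-site.xml instead of root.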
Install the MySQL connector
Look up which MySQL Connector/J version matches your Hive/Hadoop setup. Better not to use the one that comes with apt — I installed it, added the symlink, and it was no use at all.
I used 5.1.47. Note: do not install Connector/J 8, it will cause errors.
Then put the connector jar into /usr/app/hive/lib (i.e. $HIVE_HOME/lib).
Install Hive
Add the Hive environment variables to your profile (optional, but convenient).
Configure conf/hive-env.sh
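If you do put them in your profile, a minimal sketch (the paths match the ones used in hive-env.sh — adjust to your own layout):

```shell
# Append to ~/.bashrc or /etc/profile, then source it.
export HIVE_HOME=/usr/app/hive
export PATH=$PATH:$HIVE_HOME/bin
```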
```
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Set Hive and Hadoop environment variables here. These variables can be used
# to control the execution of Hive. It should be used by admins to configure
# the Hive installation (so that users do not have to set environment variables
# or set command line parameters to get correct behavior).
#
# The hive service being invoked (CLI etc.) is available via the environment
# variable SERVICE

# Hive Client memory usage can be an issue if a large number of clients
# are running at the same time. The flags below have been useful in
# reducing memory usage:
#
# if [ "$SERVICE" = "cli" ]; then
#   if [ -z "$DEBUG" ]; then
#     export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParNewGC -XX:-UseGCOverheadLimit"
#   else
#     export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:-UseGCOverheadLimit"
#   fi
# fi

# The heap size of the jvm stared by hive shell script can be controlled via:
#
# export HADOOP_HEAPSIZE=1024
#
# Larger heap size may be required when running queries over large number of
# files or partitions. By default hive shell scripts use a heap size of 256 (MB).
# Larger heap size would also be appropriate for hive server.

# Set HADOOP_HOME to point to a specific hadoop install directory
# HADOOP_HOME=${bin}/../../hadoop

# Hive Configuration Directory can be controlled by:
# export HIVE_CONF_DIR=

# Folder containing extra libraries required for hive compilation/execution can be controlled by:
# export HIVE_AUX_JARS_PATH=

export JAVA_HOME=/usr/java/jdk1.8.0_221
export HADOOP_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/app/hive
```
Configure conf/hive-site.xml
```
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>sl159753</value>
    <description>password to use against metastore database</description>
  </property>
  <property>
    <name>datanucleus.autoCreateSchema</name>
    <value>true</value>
    <description>auto-create the metastore schema on first start</description>
  </property>
  <property>
    <name>hive.server2.thrift.sasl.qop</name>
    <value>auth</value>
    <description>SASL QOP level for HiveServer2 thrift connections</description>
  </property>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
    <description>disable metastore schema version verification</description>
  </property>
</configuration>
```
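One hedged note: `datanucleus.autoCreateSchema` works on older Hive releases, but on Hive 2.x and later you are expected to initialize the metastore schema explicitly before first use:

```
# Run once after MySQL is reachable; reads the JDBC settings from hive-site.xml.
$HIVE_HOME/bin/schematool -dbType mysql -initSchema
```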
Pitfalls
The problem I hit was failing to connect to MySQL. Two things to rule out:
- Wrong Connector/J version. Even after I picked the right one it still didn't work, because of the next point.
- MySQL only listens on localhost by default. So if Hive connects from the same machine, set the address in hive-site.xml to localhost; otherwise open MySQL up to external connections and you're good.
- There is no third one.
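For the second point, on Ubuntu 16.04 with MySQL 5.7 the bind address lives in /etc/mysql/mysql.conf.d/mysqld.cnf; a sketch of opening it up (the wildcard grant is fine for a lab setup, not for production):

```
# /etc/mysql/mysql.conf.d/mysqld.cnf — listen on all interfaces
bind-address = 0.0.0.0
```

Then in the mysql shell (user and password are placeholders), allow remote connections and restart MySQL with `sudo service mysql restart`:

```sql
GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%' IDENTIFIED BY 'your_password';
FLUSH PRIVILEGES;
```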
Test
$HIVE_HOME/bin/hive
create table test(id int, name string);
show tables;
Only once these two commands succeed does it mean the mysql -> connector -> hive chain is actually working.
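To go one step further than SHOW TABLES, you can push a little data through a table (the local file path below is made up; a plain `test` table with no delimiter spec won't parse comma-separated lines, so create a delimited one):

```sql
-- /tmp/test.txt is a hypothetical local file with lines like "1,foo".
CREATE TABLE test2 (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA LOCAL INPATH '/tmp/test.txt' INTO TABLE test2;
SELECT * FROM test2;
```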
Data skew
That's a big topic in its own right; I'll expand on it with a concrete example when I have time. Leaving a placeholder here for now.