I'm 腼腆鼠标, a blogger at 靠谱客. This post walks through setting up Hive as part of a fully distributed cluster on Ubuntu 16.04; I'm sharing it here in the hope that it makes a useful reference.

For a Hadoop cluster, as I understand it, Hive can be set up on any of the servers: Hive itself is just a client layer whose metadata lives in a relational database (the metastore), so any node should do.
I set it up on the namenode machine anyway.

Ugh... I wrote a long version of this yesterday, then assumed the same submission page could be reused and submitted the Flume article through the Hive page. It overwrote everything. I'm still kicking myself.

So here is a rough record of the Hive setup process and the pitfalls I hit.

Environment: Hadoop 2.7.7

Introduction

Hive is not like HBase: Hive is a data warehouse tool built on top of Hadoop. It maps structured data files onto database tables, provides a full SQL-like query interface, and translates those SQL statements into MapReduce jobs for execution.

Advantages

SQL-like statements give you MapReduce-style aggregations directly, without writing a dedicated MapReduce application; see the sketch below.
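For example, a group-by count that would otherwise need a hand-written MapReduce job is a one-liner. This minimal sketch runs it from the shell against the test table created in the Testing section later in this post; Hive compiles the statement into MapReduce jobs behind the scenes:

# count rows per name; Hive turns this SELECT into MapReduce jobs
hive -e "SELECT name, COUNT(*) AS cnt FROM test GROUP BY name;"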

Installation

Install MySQL

sudo apt install mysql-server
sudo mysql_secure_installation
sudo mysql -uroot -p    # log in once to confirm it works
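Optionally, instead of letting Hive use the root account (as this post does), you can create a dedicated metastore database and user up front. The names hiveuser/hivepass below are just placeholders:

# optional: dedicated metastore database and account (MySQL 5.7 syntax);
# latin1 avoids index-length problems with some Hive metastore schemas
sudo mysql -uroot -p -e "
  CREATE DATABASE IF NOT EXISTS hive DEFAULT CHARACTER SET latin1;
  CREATE USER IF NOT EXISTS 'hiveuser'@'localhost' IDENTIFIED BY 'hivepass';
  GRANT ALL PRIVILEGES ON hive.* TO 'hiveuser'@'localhost';
  FLUSH PRIVILEGES;"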

Install the MySQL connector (Connector/J)

Look up which MySQL Connector/J version matches your Hadoop/Hive versions. It is best not to rely on the apt-packaged connector: I installed it and symlinked it in, and it was no use at all.
I used Connector/J 5.1.47; do not grab Connector/J 8, it will fail.
Then put the connector jar into /usr/app/HIVE/lib (the Hive lib directory).
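A minimal sketch of putting the driver in place, assuming the Connector/J 5.1.47 tarball has already been downloaded from the MySQL archives and follows the standard layout:

# unpack the tarball and drop the JDBC driver into Hive's lib directory
tar -xzf mysql-connector-java-5.1.47.tar.gz
cp mysql-connector-java-5.1.47/mysql-connector-java-5.1.47.jar /usr/app/hive/lib/
# adjust the target path to wherever Hive is actually unpacked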

Install Hive

Add the Hive environment variables to your profile (optional; things work without it).
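If you do add them, something like this in /etc/profile (or ~/.bashrc) is enough; the HIVE_HOME path matches the one used in hive-env.sh below:

# append to /etc/profile or ~/.bashrc, then re-source the file
export HIVE_HOME=/usr/app/hive
export PATH=$PATH:$HIVE_HOME/bin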
Configure conf/hive-env.sh:

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Set Hive and Hadoop environment variables here. These variables can be used
# to control the execution of Hive. It should be used by admins to configure
# the Hive installation (so that users do not have to set environment variables
# or set command line parameters to get correct behavior).
#
# The hive service being invoked (CLI etc.) is available via the environment
# variable SERVICE

# Hive Client memory usage can be an issue if a large number of clients
# are running at the same time. The flags below have been useful in
# reducing memory usage:
#
# if [ "$SERVICE" = "cli" ]; then
#   if [ -z "$DEBUG" ]; then
#     export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParNewGC -XX:-UseGCOverheadLimit"
#   else
#     export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:-UseGCOverheadLimit"
#   fi
# fi

# The heap size of the jvm stared by hive shell script can be controlled via:
#
# export HADOOP_HEAPSIZE=1024
#
# Larger heap size may be required when running queries over large number of files or partitions.
# By default hive shell scripts use a heap size of 256 (MB).  Larger heap size would also be
# appropriate for hive server.

# Set HADOOP_HOME to point to a specific hadoop install directory
# HADOOP_HOME=${bin}/../../hadoop

# Hive Configuration Directory can be controlled by:
# export HIVE_CONF_DIR=

# Folder containing extra libraries required for hive compilation/execution can be controlled by:
# export HIVE_AUX_JARS_PATH=

export JAVA_HOME=/usr/java/jdk1.8.0_221
export HADOOP_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/app/hive

Configure conf/hive-site.xml:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>sl159753</value>
    <description>password to use against metastore database</description>
  </property>
  <property>
    <name>datanucleus.autoCreateSchema</name>
    <value>true</value>
    <description>auto-create the metastore schema on first use</description>
  </property>
  <property>
    <name>hive.server2.thrift.sasl.qop</name>
    <value>auth</value>
    <description>SASL quality of protection for the HiveServer2 thrift interface</description>
  </property>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
    <description>disable strict metastore schema version verification</description>
  </property>
</configuration>
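Depending on the Hive version, you may also need to initialize the metastore schema explicitly before the first start (older releases create it on the fly thanks to datanucleus.autoCreateSchema=true above); a hedged sketch:

# create the metastore tables in the MySQL database named in hive-site.xml
$HIVE_HOME/bin/schematool -dbType mysql -initSchema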

Pitfalls

The problem I hit was that Hive could not connect to MySQL. Two things to rule out:

  1. The wrong Connector/J version; even after I picked the right one it still failed, which pointed to the next item.
  2. MySQL by default only accepts connections from localhost. If Hive runs on the same machine, just set the address in hive-site.xml to localhost; otherwise open MySQL up to remote connections (see the sketch after this list) and it works.
  3. There was no third issue.
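If you go the second route and open MySQL to the rest of the cluster, a minimal sketch on Ubuntu 16.04's MySQL 5.7 looks like this; the account and password mirror the ones in hive-site.xml, and the config path is the Ubuntu default:

# 1) let mysqld listen on all interfaces instead of only 127.0.0.1
sudo sed -i 's/^bind-address.*/bind-address = 0.0.0.0/' /etc/mysql/mysql.conf.d/mysqld.cnf
sudo systemctl restart mysql

# 2) allow the metastore account to connect from other hosts (MySQL 5.7 syntax)
sudo mysql -uroot -p -e "
  CREATE USER IF NOT EXISTS 'root'@'%' IDENTIFIED BY 'sl159753';
  GRANT ALL PRIVILEGES ON hive.* TO 'root'@'%';
  FLUSH PRIVILEGES;"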

Testing

$HIVE_HOME/bin/hive    # or bin/hive from the Hive install directory

create table test(id int, name string);
show tables;

Only when both commands succeed can you say the mysql -> connector -> hive chain is actually working.
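You can also double-check from the MySQL side: after the two statements above, the hive metastore database should contain the metastore tables (DBS, TBLS, ...) and TBLS should have a row for the new table:

# list the metastore tables and confirm `test` is registered in TBLS
mysql -uroot -p -e "USE hive; SHOW TABLES; SELECT TBL_NAME FROM TBLS;"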

Data skew

That topic deserves its own post; I will expand on it with a concrete example when I get the chance. Leaving a placeholder here for now.

Finally

That wraps up my notes on setting up Hive for a fully distributed cluster on Ubuntu 16.04.
