Overview
As I understand it, Hive can be installed on any server of a Hadoop cluster: Hive is just a tool layered on top of Hadoop (it is not itself a relational database, although it stores its metadata in one), so any node should do.
I set it up on the NameNode machine.
Ugh... I wrote a long version of this yesterday, then assumed the same submission page could be reused and submitted the Flume write-up over the Hive one. It got overwritten, and I'm full of regret.
So here I'll roughly reconstruct the Hive setup process and note the pitfalls.
Environment: Hadoop 2.7.7
Introduction
Unlike HBase, Hive is a data-warehouse tool built on top of Hadoop. It maps structured data files onto database tables, provides full SQL-style querying, and translates those SQL statements into MapReduce jobs for execution.
Advantages
You can get MapReduce-style aggregation directly from SQL-like statements, with no need to develop a dedicated MapReduce application.
Installation
Install MySQL
sudo apt install mysql-server
sudo mysql_secure_installation
sudo mysql -uroot -p    (log in and poke around to confirm it works)
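The config below connects Hive to MySQL as root. A slightly safer variant, sketched here and not from the original setup, is a dedicated metastore database and user. The database name hive matches the JDBC URL used later; hiveuser and change_me are placeholders to replace with your own, and latin1 is the character set commonly used for older Hive metastores to avoid index-length problems with utf8.

```shell
# Sketch: write out the metastore bootstrap SQL, then feed it to MySQL.
# "hive" matches the JDBC URL below; "hiveuser"/"change_me" are placeholders.
cat > hive_metastore_setup.sql <<'EOF'
CREATE DATABASE IF NOT EXISTS hive DEFAULT CHARACTER SET latin1;
CREATE USER IF NOT EXISTS 'hiveuser'@'localhost' IDENTIFIED BY 'change_me';
GRANT ALL PRIVILEGES ON hive.* TO 'hiveuser'@'localhost';
FLUSH PRIVILEGES;
EOF

# Run it against the server you just installed:
#   sudo mysql -uroot -p < hive_metastore_setup.sql
```

If you go this route, swap hiveuser/change_me into the ConnectionUserName/ConnectionPassword properties of hive-site.xml instead of root.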
Install the MySQL connector
Look up which Connector/J version matches your Hadoop/Hive stack. It's best not to use the one from apt; I installed it and symlinked it in, and it did nothing useful.
I used Connector/J 5.1.47. Don't grab Connector/J 8.x, it will throw errors (the 8.x driver class name and defaults differ from the 5.1 series this older stack expects).
Then drop the connector jar into /usr/app/hive/lib.
Install Hive
Add the Hive environment variables to your profile; it works without them, but they're convenient.
Configure conf/hive-env.sh
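For reference, the profile lines I mean look roughly like this (the path follows the one used elsewhere in this post; adjust to wherever you unpacked Hive):

```shell
# Append to /etc/profile or ~/.bashrc, then `source` the file.
export HIVE_HOME=/usr/app/hive          # same path as in hive-env.sh below
export PATH="$PATH:$HIVE_HOME/bin"      # so `hive` works from any directory
```

After sourcing, `hive --version` from any directory is a quick check that the PATH change took effect.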
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Set Hive and Hadoop environment variables here. These variables can be used
# to control the execution of Hive. It should be used by admins to configure
# the Hive installation (so that users do not have to set environment variables
# or set command line parameters to get correct behavior).
#
# The hive service being invoked (CLI etc.) is available via the environment
# variable SERVICE
# Hive Client memory usage can be an issue if a large number of clients
# are running at the same time. The flags below have been useful in
# reducing memory usage:
#
# if [ "$SERVICE" = "cli" ]; then
# if [ -z "$DEBUG" ]; then
# export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParNewGC -XX:-UseGCOverheadLimit"
# else
# export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:-UseGCOverheadLimit"
# fi
# fi
# The heap size of the jvm stared by hive shell script can be controlled via:
#
# export HADOOP_HEAPSIZE=1024
#
# Larger heap size may be required when running queries over large number of files or partitions.
# By default hive shell scripts use a heap size of 256 (MB). Larger heap size would also be
# appropriate for hive server.
# Set HADOOP_HOME to point to a specific hadoop install directory
# HADOOP_HOME=${bin}/../../hadoop
# Hive Configuration Directory can be controlled by:
# export HIVE_CONF_DIR=
# Folder containing extra libraries required for hive compilation/execution can be controlled by:
# export HIVE_AUX_JARS_PATH=
export JAVA_HOME=/usr/java/jdk1.8.0_221
export HADOOP_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/app/hive
Configure conf/hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>sl159753</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>true</value>
<description>auto-create the metastore schema objects if they do not exist</description>
</property>
<property>
<name>hive.server2.thrift.sasl.qop</name>
<value>auth</value>
<description>SASL quality-of-protection for the HiveServer2 Thrift interface (auth = authentication only)</description>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
<description>disable strict verification of the metastore schema version</description>
</property>
</configuration>
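A copy-paste slip in this file is harmless, but a missing JDBC property is not. Here is a small self-contained sanity check, run against a stub file so the sketch stands alone; in real use, point HIVE_SITE at your actual conf/hive-site.xml.

```shell
HIVE_SITE=hive-site-stub.xml            # replace with your real conf/hive-site.xml
cat > "$HIVE_SITE" <<'EOF'
<configuration>
  <property><name>javax.jdo.option.ConnectionURL</name><value>jdbc:mysql://localhost:3306/hive</value></property>
  <property><name>javax.jdo.option.ConnectionDriverName</name><value>com.mysql.jdbc.Driver</value></property>
  <property><name>javax.jdo.option.ConnectionUserName</name><value>root</value></property>
  <property><name>javax.jdo.option.ConnectionPassword</name><value>***</value></property>
</configuration>
EOF

# All four JDBC settings must be present for Hive to reach the metastore.
for key in ConnectionURL ConnectionDriverName ConnectionUserName ConnectionPassword; do
  if grep -q "javax.jdo.option.$key" "$HIVE_SITE"; then
    echo "$key: present"
  else
    echo "$key: MISSING"
  fi
done
```

A plain grep is obviously cruder than real XML parsing, but it catches the common "forgot one property" mistake quickly.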
Gotchas
The problem I hit was failing to connect to MySQL. Two things to rule out:
- The wrong Connector/J version. After picking the right one it still failed, which pointed to the next issue:
- MySQL only listens on localhost by default. If Hive connects from the same machine, set the address in hive-site.xml to localhost; otherwise, open MySQL to external connections (bind-address plus the matching user grants) and it works.
- There was no third issue.
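One way to tell a wrong-connector problem apart from a MySQL-not-reachable problem is a raw TCP probe of port 3306 before blaming the jar. A bash sketch (the helper name probe and the throwaway local listener on port 13306 are my own additions, just to make the example self-contained):

```shell
# probe HOST PORT -> prints "open" or "closed", using bash's /dev/tcp pseudo-device
probe() { (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null && echo open || echo closed; }

# In real use: probe 127.0.0.1 3306   (or the address from hive-site.xml).
# Demo against a throwaway listener so this sketch runs anywhere:
python3 -m http.server 13306 --bind 127.0.0.1 >/dev/null 2>&1 &
SRV=$!
sleep 1
probe 127.0.0.1 13306    # should print "open" while the listener is up
kill $SRV 2>/dev/null
wait $SRV 2>/dev/null || true
```

If the probe says closed, fix MySQL's bind-address and grants first; no connector version will get you past a port that isn't listening.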
Testing
$HIVE_HOME/bin/hive
create table test(id int, name string);
show tables;
Only once both of these commands succeed is the mysql -> connector -> hive chain confirmed working.
Data skew
This is a big topic in its own right. I'll expand on it with a concrete example when I get the chance; consider this a placeholder for now.
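To at least mark the spot: data skew means a few hot keys hold the vast majority of rows, so the one reducer that receives a hot key runs far longer than all the others. A toy shell illustration of the usual workaround, key salting (the 97/3 split and the 4 buckets are numbers I made up):

```shell
# 100 rows, 97 of which share one hot key: one reducer would get ~97% of the work.
printf 'user_1\n%.0s' $(seq 97) > keys.txt
printf 'user_2\nuser_3\nuser_4\n' >> keys.txt
echo "-- raw key distribution --"
sort keys.txt | uniq -c

# Salting: append a small suffix so the hot key spreads over 4 buckets; each
# bucket is aggregated in parallel, then a second pass merges the partials.
echo "-- salted key distribution --"
awk '{ print $0 "_" (NR % 4) }' keys.txt | sort | uniq -c
```

For the group-by case, Hive has a built-in lever for the same two-stage trick: setting hive.groupby.skewindata=true makes it aggregate in two jobs, randomly spreading keys in the first.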
Wrap-up
That's everything for "Ubuntu 16.04 fully distributed setup: Hive". I hope it helps you past the same problems.