Overview
I. Cluster setup steps
For installing CentOS 7.5 in a VM, see https://blog.csdn.net/a111111__/article/details/117230257 (for the disk-mount configuration) and https://blog.csdn.net/weixin_45309636/article/details/108504978 (for everything else).
0. Prerequisites
CentOS 7.5 is required; choose the development edition when installing the OS.
Run the following commands:
sudo yum install -y epel-release
sudo yum install -y psmisc nc net-tools rsync vim lrzsz ntp libzstd openssl-static
- When installing packages with yum on CentOS, the RPM you need is often missing, and the official repositories are not very rich, so you frequently end up compiling things yourself, which is painful. EPEL solves both problems. EPEL stands for Extra Packages for Enterprise Linux; it is a Fedora-community project that provides high-quality packages for RHEL and derivatives such as CentOS and Scientific Linux. Installing EPEL effectively adds a third-party repository.
1. Configure the IP address
sudo vim /etc/sysconfig/network-scripts/ifcfg-ens33
DEVICE=ens33
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=static #change to static
NAME="ens33"
#add the IP address, gateway and DNS server entries below
#note: this gateway is the VM's virtual gateway and must match the one configured in the hypervisor; in a production environment, obtain the cluster's gateway information in advance
IPADDR=192.168.10.100
PREFIX=24
GATEWAY=192.168.10.2
DNS1=192.168.10.2
2. Configure the hostname
vim /etc/hostname
hadoop100
3. Configure hostname mapping
sudo vim /etc/hosts
192.168.10.100 hadoop100
192.168.10.101 hadoop101
192.168.10.102 hadoop102
192.168.10.103 hadoop103
192.168.10.104 hadoop104
192.168.10.105 hadoop105
192.168.10.106 hadoop106
192.168.10.107 hadoop107
192.168.10.108 hadoop108
-
If this is your own virtual cluster, you can also configure the hosts file on your Windows machine:
-
1. Go to C:\Windows\System32\drivers\etc
-
2. Open the hosts file and add the following
192.168.10.100 hadoop100
192.168.10.101 hadoop101
192.168.10.102 hadoop102
192.168.10.103 hadoop103
192.168.10.104 hadoop104
192.168.10.105 hadoop105
192.168.10.106 hadoop106
192.168.10.107 hadoop107
192.168.10.108 hadoop108
-
Note: editing the file in place is not allowed; save an edited copy somewhere else and then overwrite the original with it.
-
4. Turn off the firewall
sudo systemctl stop firewalld
sudo systemctl disable firewalld
5. Create a user
sudo useradd wt
sudo passwd wt
Note: this can actually be done while installing the OS. If it was not, create your own user now. Day-to-day cluster operations are performed as this user; you only switch to root when editing system files.
6. Grant the user sudo rights
-
Edit /etc/sudoers with root privileges
## Allow root to run any commands anywhere
root    ALL=(ALL)    ALL
wt      ALL=(ALL)    ALL
7. Create folders under /opt
-
1. Create the module and software folders under /opt
-
sudo mkdir /opt/module
sudo mkdir /opt/software
-
-
2. Change the owner of the module and software folders to the wt user
-
sudo chown wt:wt /opt/module /opt/software
-
8. Remove the JDK bundled with the OS
rpm -qa | grep -i java | xargs -n1 sudo rpm -e --nodeps
9. Upload your own JDK and Hadoop tarballs
hadoop-3.1.3.tar.gz
jdk-8u212-linux-x64.tar.gz
10. Extract them
tar -zxvf jdk-8u212-linux-x64.tar.gz -C /opt/module/
tar -zxvf hadoop-3.1.3.tar.gz -C /opt/module/
11. Configure the JDK and Hadoop environment variables
-
1. Create the file /etc/profile.d/my_env.sh
-
sudo vim /etc/profile.d/my_env.sh
-
-
2. Add the following
-
#JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_212
export PATH=$PATH:$JAVA_HOME/bin
#HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
-
-
3. Reload the environment
-
source /etc/profile
-
-
4. Check that the installation succeeded
-
java -version
java version "1.8.0_212"
hadoop version
Hadoop 3.1.3
-
-
5. Reboot the server
-
6. Run one of the example jars shipped with Hadoop to verify that Hadoop works
-
1. Create a wcinput folder under the hadoop-3.1.3 directory
mkdir wcinput
-
2. Create a word.txt file in the wcinput directory
hadoop yarn hadoop mapreduce spark spark
-
3. Go back to the Hadoop directory /opt/module/hadoop-3.1.3 and run the job
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount wcinput wcoutput
-
Check the result
cat wcoutput/part-r-00000
spark 2
hadoop 2
mapreduce 1
yarn 1
-
12. Clone the configured server
-
1. Take a VM snapshot, then clone the machine one copy at a time
-
2. Change the IP address on each clone
DEVICE=ens33
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=static #change to static
NAME="ens33"
#add the IP address, gateway and DNS server entries below
#note: this gateway is the VM's virtual gateway and must match the one configured in the hypervisor; in a production environment, obtain the cluster's gateway information in advance
IPADDR=192.168.10.101
PREFIX=24
GATEWAY=192.168.10.2
DNS1=192.168.10.2
-
3. Change the hostname on each clone
vim /etc/hostname
hadoop101
At this point the basic setup is done; next come the Hadoop configuration files. To make it easy to keep those files in sync across machines, first do the following:
13. Write a file-sync script
-
1. Create an xsync file in /home/wt
cd /home/wt
vim xsync
-
Contents of the script:
#!/bin/bash
#1. check the number of arguments
if [ $# -lt 1 ]
then
  echo Not enough arguments!
  exit;
fi
#2. loop over every machine in the cluster
for host in hadoop102 hadoop103 hadoop104
do
  echo ==================== $host ====================
  #3. loop over all files/directories and send them one by one
  for file in $@
  do
    #4. check that the file exists
    if [ -e $file ]
    then
      #5. get the parent directory
      pdir=$(cd -P $(dirname $file); pwd)
      #6. get the file name
      fname=$(basename $file)
      ssh $host "mkdir -p $pdir"
      rsync -av $pdir/$fname $host:$pdir
    else
      echo $file does not exist!
    fi
  done
done
-
Make the xsync script executable
chmod +x xsync
-
Move the script into /bin so it can be called from anywhere
sudo mv xsync /bin/
-
Test the script. From now on, whenever a file is changed it can be synced to the other servers in the cluster.
xsync /bin/xsync
14. Edit the cluster configuration files
-
1. Cluster deployment plan
-
Note: do not put the NameNode and the SecondaryNameNode on the same server.
-
Note: the ResourceManager is also memory-hungry; do not put it on the same machine as the NameNode or SecondaryNameNode.
-
(The deployment-plan image did not survive the export. Per the configuration below: hadoop102 runs NameNode + DataNode + NodeManager, hadoop103 runs ResourceManager + DataNode + NodeManager, hadoop104 runs SecondaryNameNode + DataNode + NodeManager.)
-
2. Edit the core configuration files
-
1. Configure core-site.xml
cd $HADOOP_HOME/etc/hadoop
vim core-site.xml
#add
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop102:8020</value>
    </property>
    <property>
        <name>hadoop.data.dir</name>
        <value>/opt/module/hadoop-3.1.3/data</value>
    </property>
    <property>
        <name>hadoop.proxyuser.wt.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.wt.groups</name>
        <value>*</value>
    </property>
</configuration>
-
2. Configure hdfs-site.xml
vim hdfs-site.xml
#add
<configuration>
    <!-- NameNode web UI address -->
    <property>
        <name>dfs.namenode.http-address</name>
        <value>hadoop102:9870</value>
    </property>
    <!-- SecondaryNameNode web UI address -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop104:9868</value>
    </property>
</configuration>
-
3. Configure yarn-site.xml
vim yarn-site.xml
#add
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop103</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>
-
4. Configure mapred-site.xml
vim mapred-site.xml
#add
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
-
-
3. Configure the workers file
hadoop102
hadoop103
hadoop104
Note: this file must not contain any extra spaces or blank lines.
-
4. Distribute the edited files
xsync /opt/module/hadoop-3.1.3/etc/hadoop/
-
You can log in to the other servers and check that the corresponding files have been updated.
15. Start the cluster
-
1. If the cluster is being started for the first time, the NameNode must be formatted
hdfs namenode -format
-
2. Per the deployment plan, start the NameNode on hadoop102, then run the DataNode command on hadoop102, hadoop103 and hadoop104 (all three)
#start the NameNode on hadoop102
hdfs --daemon start namenode
#run this on hadoop102, hadoop103 and hadoop104 (all three)
hdfs --daemon start datanode
-
This is tedious. The whole cluster can be started with a single group-start command instead, but that requires passwordless SSH login.
16. Set up passwordless SSH
-
1. Go to /home/wt/.ssh/ and generate a key pair
ssh-keygen -t rsa
-
2. Copy the public key to every server in the cluster. Note that keys are per user: setting this up for wt does not cover root; to use it as another user, switch to that user and configure it separately.
ssh-copy-id hadoop102
ssh-copy-id hadoop103
ssh-copy-id hadoop104
-
3. Repeat the same steps on the other servers
17. Start the whole cluster
-
1. If the cluster is being started for the first time, format the NameNode on hadoop102 (before formatting, be sure to stop all namenode and datanode processes from any previous start and delete the data and logs directories)
hdfs namenode -format
-
2. Start HDFS **(on hadoop102)**
sbin/start-dfs.sh
-
3. Start YARN **on the node where the ResourceManager is configured (hadoop103)**
sbin/start-yarn.sh
-
4. Check what is running
jps
18. Configure the history server
-
1. Configure mapred-site.xml
<!-- history server address -->
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop102:10020</value>
</property>
<!-- history server web UI address -->
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop102:19888</value>
</property>
-
2. Distribute the configuration file
xsync $HADOOP_HOME/etc/hadoop/mapred-site.xml
-
3. Start the history server on hadoop102
mapred --daemon start historyserver
19. Enable log aggregation
Log aggregation: after an application finishes, its run logs are uploaded to HDFS.
Benefit: the run details of a program can be inspected easily, which helps development and debugging.
Note: enabling log aggregation requires restarting the NodeManager, ResourceManager and HistoryServer.
-
The steps are as follows:
-
1. Configure yarn-site.xml
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
<property>
    <name>yarn.log.server.url</name>
    <value>http://hadoop102:19888/jobhistory/logs</value>
</property>
<property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
</property>
-
2. Distribute the configuration
xsync $HADOOP_HOME/etc/hadoop/yarn-site.xml
-
3. Stop the cluster
-
4. Restart the cluster
on hadoop103: start-yarn.sh
on hadoop102: mapred --daemon start historyserver
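With aggregation enabled and a job re-run, its logs can be pulled from any node instead of digging through individual NodeManagers. A minimal sketch (the application ID below is a placeholder; take the real one from the job output or from the ResourceManager web UI at http://hadoop102:8088):
# list finished applications and fetch the aggregated logs of one of them
yarn application -list -appStates FINISHED
yarn logs -applicationId application_1234567890123_0001
# the same logs are reachable through the JobHistory web UI at http://hadoop102:19888/jobhistory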
-
20. Cluster start/stop script
Starting the Hadoop cluster means starting HDFS and the history server on hadoop102 and YARN on hadoop103, which is tedious and error-prone. Is there a way to simplify the start-up? This is where a group start/stop script comes in.
- vim myhadoop.sh
#!/bin/bash
if [ $# -lt 1 ]
then
echo No Argument Input!
exit;
fi
case $1 in
"start")
echo "================= starting the Hadoop cluster ========================"
echo "------------------ starting hdfs -----------------------------"
ssh hadoop102 "/opt/module/hadoop-3.1.3/sbin/start-dfs.sh"
echo "------------------ starting yarn -----------------------------"
ssh hadoop103 "/opt/module/hadoop-3.1.3/sbin/start-yarn.sh"
echo "------------------ starting historyserver --------------------"
ssh hadoop102 "/opt/module/hadoop-3.1.3/bin/mapred --daemon start historyserver"
;;
"stop")
echo "================== stopping the Hadoop cluster ==================="
echo "------------------ stopping historyserver -----------------------------"
ssh hadoop102 "/opt/module/hadoop-3.1.3/bin/mapred --daemon stop historyserver"
echo "------------------ stopping yarn --------------------------------------"
ssh hadoop103 "/opt/module/hadoop-3.1.3/sbin/stop-yarn.sh"
echo "------------------ stopping hdfs -----------------------------"
ssh hadoop102 "/opt/module/hadoop-3.1.3/sbin/stop-dfs.sh"
;;
*)
echo "input Args Error..."
;;
esac
-
2. Put the script in /bin/ so it can be called from anywhere
-
3. Make the script executable
chmod 777 myhadoop.sh
-
4. Start the whole cluster
myhadoop.sh start
Now, to see how each component is doing across the cluster, we can also add a status-check script.
21. Cluster status script
-
1. vim jpsall
#!/bin/bash
for host in hadoop102 hadoop103 hadoop104
do
    echo =============== $host ===============
    ssh $host jps
done
-
2. Make the script executable
chmod 777 jpsall
-
3. Put the script in /bin/ so it can be called from anywhere
-
4. Run jpsall
jpsall
22. Cluster time synchronization
Note: the time-server configuration must be done as root.
-
1. Stop and disable the ntp service on all nodes
systemctl stop ntpd
systemctl disable ntpd
-
2. Edit the ntp configuration file (hadoop102)
vim /etc/ntp.conf
######## change 1 (allow all machines on the 192.168.10.0-192.168.10.255 network to query and sync time from this machine)
#restrict 192.168.10.0 mask 255.255.255.0 nomodify notrap
restrict 192.168.10.0 mask 255.255.255.0 nomodify notrap
######## change 2 (the cluster is on a LAN; do not use time sources on the internet)
server 0.centos.pool.ntp.org iburst
server 1.centos.pool.ntp.org iburst
server 2.centos.pool.ntp.org iburst
server 3.centos.pool.ntp.org iburst
#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst
######## addition 3 (if this node loses its network connection, it can still use its local clock to serve time to the other nodes in the cluster)
server 127.127.1.0
fudge 127.127.1.0 stratum 10
-
3. Edit /etc/sysconfig/ntpd (hadoop102)
vim /etc/sysconfig/ntpd
#add the following (sync the hardware clock together with the system clock)
SYNC_HWCLOCK=yes
-
4. Restart the ntpd service
systemctl start ntpd
-
5. Enable the ntpd service at boot
systemctl enable ntpd
-
6. Configure the other machines (as root) (hadoop103, hadoop104)
crontab -e
*/10 * * * * /usr/sbin/ntpdate hadoop102
---------------------------------- At this point the cluster is fully configured. Done, confetti! ----------------------------------
II. Hadoop notes
1. Common commands
[wt@hadoop102 hadoop-3.1.3]$ bin/hadoop fs
[-appendToFile <localsrc> ... <dst>]
[-cat [-ignoreCrc] <src> ...]
[-checksum <src> ...]
[-chgrp [-R] GROUP PATH...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] <localsrc> ... <dst>]
[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-count [-q] <path> ...]
[-cp [-f] [-p] <src> ... <dst>]
[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]
[-df [-h] [<path> ...]]
[-du [-s] [-h] <path> ...]
[-expunge]
[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-getfacl [-R] <path>]
[-getmerge [-nl] <src> <localdst>]
[-help [cmd ...]]
[-ls [-d] [-h] [-R] [<path> ...]]
[-mkdir [-p] <path> ...]
[-moveFromLocal <localsrc> ... <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> ... <dst>]
[-put [-f] [-p] <localsrc> ... <dst>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]
[-rm [-f] [-r|-R] [-skipTrash] <src> ...]
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
[-setrep [-R] [-w] <rep> <path> ...]
[-stat [format] <path> ...]
[-tail [-f] <file>]
[-test -[defsz] <path>]
[-text [-ignoreCrc] <src> ...]
[-touchz <path> ...]
[-usage [cmd ...]]
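A few everyday uses of these options (the /wt/test path is only an example; any HDFS path works):
hadoop fs -mkdir -p /wt/test
hadoop fs -put wcinput/word.txt /wt/test
hadoop fs -ls /wt/test
hadoop fs -cat /wt/test/word.txt
hadoop fs -get /wt/test/word.txt ./word_copy.txt
hadoop fs -setrep 2 /wt/test/word.txt
hadoop fs -rm -r /wt/test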
2. Hadoop directory layout
[wt@hadoop102 hadoop-3.1.3]$ ll
总用量 180
drwxr-xr-x. 2 wt wt 183 9月 12 2019 bin
drwxrwxr-x. 4 wt wt 37 5月 6 16:08 data
drwxr-xr-x. 3 wt wt 20 9月 12 2019 etc
drwxr-xr-x. 2 wt wt 106 9月 12 2019 include
drwxr-xr-x. 3 wt wt 20 9月 12 2019 lib
drwxr-xr-x. 4 wt wt 288 9月 12 2019 libexec
-rw-rw-r--. 1 wt wt 147145 9月 4 2019 LICENSE.txt
drwxrwxr-x. 3 wt wt 4096 5月 7 12:14 logs
-rw-rw-r--. 1 wt wt 21867 9月 4 2019 NOTICE.txt
-rw-rw-r--. 1 wt wt 1366 9月 4 2019 README.txt
drwxr-xr-x. 3 wt wt 4096 9月 12 2019 sbin
drwxr-xr-x. 4 wt wt 31 9月 12 2019 share
drwxrwxr-x. 2 wt wt 22 5月 6 11:15 wcinput
drwxr-xr-x. 2 wt wt 88 5月 6 11:18 wcoutput
-
bin: scripts for operating the Hadoop services (HDFS, YARN)
-
etc: Hadoop's configuration directory, holding the configuration files
-
lib: Hadoop's native libraries (used for compressing/decompressing data)
-
sbin: scripts for starting and stopping the Hadoop services
-
share: Hadoop's dependency jars, documentation and the official examples
-
data: where HDFS file data is actually stored (HDFS files live on the cluster nodes, inside this data folder)
-
For example, the files currently in the cluster live under:
[wt@hadoop102 subdir0]$ pwd
/opt/module/hadoop-3.1.3/data/dfs/data/current/BP-2030843570-192.168.10.102-1651821962452/current/finalized/subdir0/subdir0
[wt@hadoop102 subdir0]$ ll
总用量 272
-rw-rw-r--. 1 wt wt     23 5月   6 17:23 blk_1073741825
-rw-rw-r--. 1 wt wt     11 5月   6 17:23 blk_1073741825_1001.meta
-rw-rw-r--. 1 wt wt     31 5月   6 17:24 blk_1073741832
-rw-rw-r--. 1 wt wt     11 5月   6 17:24 blk_1073741832_1008.meta
-rw-rw-r--. 1 wt wt  25554 5月   6 17:24 blk_1073741834
-rw-rw-r--. 1 wt wt    207 5月   6 17:24 blk_1073741834_1010.meta
-rw-rw-r--. 1 wt wt 214456 5月   6 17:24 blk_1073741835
-rw-rw-r--. 1 wt wt   1683 5月   6 17:24 blk_1073741835_1011.meta
-rw-rw-r--. 1 wt wt     50 5月   8 13:35 blk_1073741836
-rw-rw-r--. 1 wt wt     11 5月   8 13:35 blk_1073741836_1012.meta
[wt@hadoop102 subdir0]$ cat blk_1073741836
hello spark kafka hive flume atals zookeeper datax
-
-
logs: Hadoop's log files
3. What to do if the cluster is broken beyond repair
-
Step 1: stop the services / kill the processes
[wt@hadoop102 hadoop-3.1.3]$ sbin/stop-dfs.sh
-
Step 2: delete data and logs on every node
[wt@hadoop102 hadoop-3.1.3]$ rm -rf data/ logs/
-
Step 3: re-format
[wt@hadoop102 hadoop-3.1.3]$ hdfs namenode -format
-
Step 4: start the cluster
[wt@hadoop102 hadoop-3.1.3]$ sbin/start-dfs.sh
4. A single node is down
-
Start just that daemon, e.g.:
hdfs --daemon start namenode
III. Maven
1. Maven environment setup
-
1. Download apache-maven-3.2.2-bin.zip and extract it
-
2. Configure Maven's environment variables
- System variable: MAVEN_HOME = the Maven install directory (here C:\maven\apache-maven-3.2.2)
- System variable: add %MAVEN_HOME%\bin to PATH
-
3. Check that it works
C:\Users\HP>mvn -v
Apache Maven 3.2.2 (45f7c06d68e745d05611f7fd14efb6594181933e; 2014-06-17T21:51:42+08:00)
Maven home: C:\maven\apache-maven-3.2.2\bin\..
Java version: 1.8.0_92, vendor: Oracle Corporation
Java home: C:\Java\jdk1.8.0_92\jre
Default locale: zh_CN, platform encoding: GBK
OS name: "windows 10", version: "10.0", arch: "amd64", family: "dos"
-
4. If it does not work, check whether JAVA_HOME is fully configured
-
For example, whether PATH contains
%JAVA_HOME%\bin\jre;
-
and whether CLASSPATH is set
.;%JAVA_HOME%\lib;%JAVA_HOME%\lib\dt.jar;%JAVA_HOME%\lib\tools.jar
-
-
5. Edit settings.xml under C:\maven\apache-maven-3.2.2\conf
<!-- change the mirror -->
<mirror>
    <id>alimaven</id>
    <name>aliyun maven</name>
    <url>http://maven.aliyun.com/nexus/content/groups/public</url>
    <mirrorOf>central</mirrorOf>
</mirror>
<!-- change the local repository location -->
<localRepository>C:\maven\repository</localRepository>
2. Maven plugin configuration
- With the snippet below, the required dependencies are packaged into the jar itself
<build>
<plugins>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<version>2.3.2</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
</configuration>
</plugin>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
<archive>
<manifest>
<!-- replace with your own main class -->
<mainClass>com.atguigu.mr.WordcountDriver</mainClass>
</manifest>
</archive>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
Note: if the project shows a red cross, right-click the project -> Maven -> Update Project.
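For reference, with the assembly plugin above `mvn clean package` produces two jars under target/; the one whose name ends in -jar-with-dependencies.jar carries the dependencies and can be submitted to the cluster. A sketch (the jar name and the input/output paths are only examples for this project layout):
mvn clean package
# copy the fat jar to a cluster node, then run it; the main class comes from the manifest configured above
hadoop jar wc-1.0-SNAPSHOT-jar-with-dependencies.jar /wcinput /wcoutput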
IV. Operating Hadoop from Windows
1. Connecting to Hadoop from Windows
-
1. Configure the Hadoop environment variables on Windows (an extracted copy of Hadoop must be prepared in advance), as sketched below
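A common arrangement for step 1 (the path is only an example, and the bin folder usually also needs the Windows helper binaries winutils.exe / hadoop.dll matching the Hadoop version):
- System variable: HADOOP_HOME = C:\hadoop-3.1.3
- System variable: add %HADOOP_HOME%\bin to PATH
- Restart IDEA (or the machine) afterwards so the new variables are picked up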
-
2. Add the dependencies in IDEA
<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
    </dependency>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-slf4j-impl</artifactId>
        <version>2.12.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client-api</artifactId>
        <version>3.1.3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client-runtime</artifactId>
        <version>3.1.3</version>
    </dependency>
</dependencies>
-
3. Write the client code
package com.dtdream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.junit.Test;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

public class HdfsClient {

    @Test
    public void testMkdirs() throws URISyntaxException, IOException, InterruptedException {
        URI uri = new URI("hdfs://hadoop102:8020");
        Configuration configuration = new Configuration();
        String user = "wt";
        FileSystem fs = FileSystem.get(uri, configuration, user);
        fs.mkdirs(new Path("/wt/test"));
        fs.close();
    }

    @Test
    public void testCopyFromLocalFile() throws IOException, InterruptedException, URISyntaxException {
        // 1. get the file system
        Configuration configuration = new Configuration();
        configuration.set("dfs.replication", "2");
        FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), configuration, "wt");
        // 2. upload the file
        fs.copyFromLocalFile(new Path("C:\\IDEA_test_files\\JAVA_TEST02\\src\\main\\resources\\word.txt"),
                new Path("/wt/test/word.txt"));
        // 3. close the resource
        fs.close();
        System.out.println("over");
    }

    @Test
    public void testCopyToLocalFile() throws IOException, InterruptedException, URISyntaxException {
        // 1. get the file system
        Configuration configuration = new Configuration();
        FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), configuration, "wt");
        // 2. download
        // boolean delSrc: whether to delete the source file
        // Path src: the file to download
        // Path dst: where to download the file to
        // boolean useRawLocalFileSystem: whether to use the raw local file system (skips the .crc checksum file)
        fs.copyToLocalFile(false, new Path("/wt/test/word.txt"),
                new Path("C:\\IDEA_test_files\\JAVA_TEST02\\src\\main\\resources\\word2.txt"), true);
        // 3. close the resource
        fs.close();
    }

    @Test
    public void Mytest() throws URISyntaxException, IOException, InterruptedException {
        Configuration configuration = new Configuration();
        FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), configuration, "wt");
        fs.mkdirs(new Path("/idea_test"));
        fs.close();
    }

    @Test
    public void testDelete() throws IOException, InterruptedException, URISyntaxException {
        // 1. get the file system
        Configuration configuration = new Configuration();
        FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), configuration, "wt");
        // 2. delete
        fs.delete(new Path("/idea_test"), true);
        // 3. close the resource
        fs.close();
    }

    @Test
    public void testRename() throws IOException, InterruptedException, URISyntaxException {
        // 1. get the file system
        Configuration configuration = new Configuration();
        FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), configuration, "wt");
        // 2. rename the file
        fs.rename(new Path("/wt/test/word.txt"), new Path("/wt/test/word_new.txt"));
        // 3. close the resource
        fs.close();
    }

    @Test
    public void testListFiles() throws IOException, InterruptedException, URISyntaxException {
        // 1. get the file system
        Configuration configuration = new Configuration();
        FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), configuration, "wt");
        // 2. get file details
        RemoteIterator<LocatedFileStatus> listFiles = fs.listFiles(new Path("/"), true);
        while (listFiles.hasNext()) {
            LocatedFileStatus status = listFiles.next();
            // file name
            System.out.println(status.getPath().getName());
            // length
            System.out.println(status.getLen());
            // permissions
            System.out.println(status.getPermission());
            // group
            System.out.println(status.getGroup());
            // block locations
            BlockLocation[] blockLocations = status.getBlockLocations();
            for (BlockLocation blockLocation : blockLocations) {
                // hosts that store this block
                String[] hosts = blockLocation.getHosts();
                for (String host : hosts) {
                    System.out.println(host);
                }
            }
            System.out.println("---------------------");
        }
        // 3. close the resource
        fs.close();
    }

    @Test
    public void testListStatus() throws IOException, InterruptedException, URISyntaxException {
        // 1. get the file system
        Configuration configuration = new Configuration();
        FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), configuration, "wt");
        // 2. distinguish files from directories
        FileStatus[] listStatus = fs.listStatus(new Path("/"));
        for (FileStatus fileStatus : listStatus) {
            if (fileStatus.isFile()) {
                System.out.println("f:" + fileStatus.getPath().getName());
            } else {
                System.out.println("d:" + fileStatus.getPath().getName());
            }
        }
        // 3. close the resource
        fs.close();
    }
}
V. ZooKeeper
1. Install ZooKeeper
-
1. Download apache-zookeeper-3.5.7-bin.tar.gz, extract it onto the cluster and configure the environment variables
-
2. In the conf directory, rename zoo_sample.cfg to zoo.cfg
-
3. Create a zkData folder in the ZooKeeper directory (it holds ZooKeeper's data)
-
4. Create a myid file inside zkData and set this ZooKeeper instance's id
vim myid
2
-
5. Edit zoo.cfg
#add
#######################cluster##########################
server.2=hadoop102:2888:3888
server.3=hadoop103:2888:3888
server.4=hadoop104:2888:3888
- server.A=B:C:D
- A is a number indicating which server this is;
- B is the server's address (here the hostname);
- C is the port this server uses to exchange information with the cluster Leader;
- D is the election port: if the cluster Leader goes down, the servers talk to each other over this port to elect a new Leader.
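For reference, after the edits above a complete zoo.cfg would look roughly like this (the first five lines keep the zoo_sample.cfg defaults except for dataDir, which points at the zkData directory created in step 3):
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/module/zookeeper-3.5.7/zkData
clientPort=2181
#######################cluster##########################
server.2=hadoop102:2888:3888
server.3=hadoop103:2888:3888
server.4=hadoop104:2888:3888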
-
6. Distribute ZooKeeper to the other servers
-
7. Make sure the myid value under zkData is unique on each server
-
8. Start ZooKeeper on each server
2. ZooKeeper group-start script
-
1. Create zk.sh in /home/wt
vim zk.sh

#!/bin/bash
for host in hadoop102 hadoop103 hadoop104
do
    case $1 in
    "start"){
        echo "------------ $host zookeeper -----------"
        ssh $host "source /etc/profile; zkServer.sh start"
    };;
    "stop"){
        echo "------------ $host zookeeper -----------"
        ssh $host "source /etc/profile; zkServer.sh stop"
    };;
    "status"){
        echo "------------ $host zookeeper -----------"
        ssh $host "source /etc/profile; zkServer.sh status"
    };;
    esac
done
-
2. Make the script executable
chmod +x zk.sh
-
3. Move the script to /bin/ so it can be called from anywhere
sudo mv zk.sh /bin/zk.sh
-
Test the group start
[wt@hadoop102 bin]$ zk.sh start
------------ hadoop102 zookeeper -----------
ZooKeeper JMX enabled by default
Using config: /opt/module/zookeeper-3.5.7/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
------------ hadoop103 zookeeper -----------
ZooKeeper JMX enabled by default
Using config: /opt/module/zookeeper-3.5.7/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
------------ hadoop104 zookeeper -----------
ZooKeeper JMX enabled by default
Using config: /opt/module/zookeeper-3.5.7/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[wt@hadoop102 bin]$
VI. Flume
1. Install Flume
-
1. Download and extract apache-flume-1.7.0-bin.tar.gz
-
2. Rename flume-env.sh.template under flume/conf to flume-env.sh and configure it
export JAVA_HOME=$JAVA_HOME
-
3. That is all the configuration needed.
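To confirm the install works, the netcat-to-logger example from the Flume user guide can be run (this agent definition is the stock example and is not required by anything else in these notes). Save it as, say, job/net-logger.conf:
# a1: netcat source -> memory channel -> logger sink
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sinks.k1.type = logger
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Then start the agent and type into `nc localhost 44444` from another window; each line should show up in the agent's console:
bin/flume-ng agent --conf conf --conf-file job/net-logger.conf --name a1 -Dflume.root.logger=INFO,console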
VII. Kafka
1. Install Kafka
-
1. Download and extract kafka_2.12-3.0.1.tgz
-
2. Configure the Kafka environment variables
-
3. Create a logs folder under $KAFKA_HOME
-
4. Edit config/server.properties
Enter the following:
#globally unique broker id, must not repeat
broker.id=0
#allow topics to be deleted
delete.topic.enable=true
#number of threads handling network requests
num.network.threads=3
#number of threads handling disk I/O
num.io.threads=8
#socket send buffer size
socket.send.buffer.bytes=102400
#socket receive buffer size
socket.receive.buffer.bytes=102400
#maximum request size the socket accepts
socket.request.max.bytes=104857600
#where Kafka's data/log segments are stored
log.dirs=/opt/module/kafka-3.0.1/logs
#number of partitions per topic on this broker
num.partitions=1
#threads used to recover and clean the data directory
num.recovery.threads.per.data.dir=1
#how long segment files are retained before deletion
log.retention.hours=168
#ZooKeeper cluster connection string
zookeeper.connect=hadoop102:2181,hadoop103:2181,hadoop104:2181/kafka
-
5. Distribute Kafka and make broker.id unique in each server's config/server.properties
-
6. Configure the Kafka environment variables on each server
-
7. Test Kafka
kafka-server-start.sh -daemon $KAFKA_HOME/config/server.properties
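A quick smoke test once the brokers are up (topic name, partition and replica counts are only examples):
# create a test topic and list topics
kafka-topics.sh --bootstrap-server hadoop102:9092 --create --topic first --partitions 1 --replication-factor 3
kafka-topics.sh --bootstrap-server hadoop102:9092 --list
# produce and consume a few messages (Ctrl+C to quit)
kafka-console-producer.sh --bootstrap-server hadoop102:9092 --topic first
kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic first --from-beginning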
2. Kafka group-start script
- 1. Create the file kf.sh in /bin/
sudo vim kf.sh
#!/bin/bash
case $1 in
"start"){
for i in hadoop102 hadoop103 hadoop104
do
echo " -------- starting Kafka on $i -------"
ssh $i "/opt/module/kafka-3.0.1/bin/kafka-server-start.sh -daemon /opt/module/kafka-3.0.1/config/server.properties "
done
};;
"stop"){
for i in hadoop102 hadoop103 hadoop104
do
echo " -------- stopping Kafka on $i -------"
ssh $i "/opt/module/kafka-3.0.1/bin/kafka-server-stop.sh"
done
};;
esac
-
2. Make the file executable
chmod 777 kf.sh
-
3. Test the script
kf.sh start
VIII. MySQL
1. Install MySQL
-
1. Extract mysql-5.7.28-1.el7.x86_64.rpm-bundle.tar into /opt/software/mysql
tar -xvf mysql-5.7.28-1.el7.x86_64.rpm-bundle.tar -C ./mysql
-
2. Remove the MySQL/MariaDB packages bundled with the OS
#check
rpm -qa | grep mariadb
rpm -qa | grep mysql
rpm -e --nodeps mariadb-libs-5.5.56-2.el7.x86_64
-
3. Install the RPMs (do not change the order)
sudo rpm -ivh mysql-community-common-5.7.28-1.el7.x86_64.rpm
sudo rpm -ivh mysql-community-libs-5.7.28-1.el7.x86_64.rpm
sudo rpm -ivh mysql-community-libs-compat-5.7.28-1.el7.x86_64.rpm
sudo rpm -ivh mysql-community-client-5.7.28-1.el7.x86_64.rpm
sudo rpm -ivh mysql-community-server-5.7.28-1.el7.x86_64.rpm
-
4. Initialize MySQL
sudo mysqld --initialize --user=mysql
-
5. Look up the generated initial MySQL root password
[wt@hadoop102 mysql]$ sudo cat /var/log/mysqld.log
2022-05-10T05:10:39.513714Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2022-05-10T05:10:39.662586Z 0 [Warning] InnoDB: New log files created, LSN=45790
2022-05-10T05:10:39.687784Z 0 [Warning] InnoDB: Creating foreign key constraint system tables.
2022-05-10T05:10:39.746734Z 0 [Warning] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: 88b0625e-d01f-11ec-ac64-000c299e0128.
2022-05-10T05:10:39.748791Z 0 [Warning] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be opened.
2022-05-10T05:10:40.608922Z 0 [Warning] CA certificate ca.pem is self signed.
2022-05-10T05:10:41.029552Z 1 [Note] A temporary password is generated for root@localhost: 1(VjXf=cIMDI
-
6. Start the MySQL service
sudo systemctl start mysqld
-
7. Log in to MySQL (the initial password is 1(VjXf=cIMDI )
[wt@hadoop102 mysql]$ mysql -uroot -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.28

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>
-
8. Set a new password for root (otherwise later operations will fail)
mysql> set password = password("123456");
Query OK, 0 rows affected, 1 warning (0.00 sec)
-
9. Change root's host so that root can log in remotely
mysql> use mysql;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> select host,user from user;
+-----------+---------------+
| host      | user          |
+-----------+---------------+
| localhost | mysql.session |
| localhost | mysql.sys     |
| localhost | root          |
+-----------+---------------+
3 rows in set (0.00 sec)

mysql> update user set host = '%' where user = 'root';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select host,user from user;
+-----------+---------------+
| host      | user          |
+-----------+---------------+
| %         | root          |
| localhost | mysql.session |
| localhost | mysql.sys     |
+-----------+---------------+
3 rows in set (0.00 sec)
-
Flush the privileges
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
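To check that remote root login really works, a connection from any other machine that has a MySQL client should now succeed (password 123456 as set above):
mysql -h hadoop102 -uroot -p123456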
IX. Hive
1. Install Hive
-
1. Extract apache-hive-3.1.2-bin.tar.gz into /opt/module and rename the directory
-
2. Configure Hive's environment variables
vim /etc/profile.d/my_env.sh

#HIVE_HOME
export HIVE_HOME=/opt/module/hive-3.1.2
export PATH=$PATH:$HIVE_HOME/bin
-
3. Install MySQL and create a database in it for the metastore
create database metastore;
-
4. Copy the mysql-connector-java-5.1.27-bin.jar driver into hive/lib
cp /opt/software/mysql-connector-java-5.1.27-bin.jar /opt/module/hive-3.1.2/lib/
-
5. Create hive-site.xml under hive-3.1.2/conf and fill in the configuration
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <!-- "metastore" here must match the database created in MySQL -->
        <value>jdbc:mysql://hadoop102:3306/metastore?createDatabaseIfNotExist=true</value>
        <description>JDBC connect string for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
        <description>username to use against metastore database</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
        <description>password to use against metastore database</description>
    </property>
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
    <!-- metadata store authorization -->
    <property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
    </property>
    <!-- Hive's default working directory on HDFS -->
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
</configuration>
-
6. Rename hive-env.sh.template under /opt/module/hive-3.1.2/conf to hive-env.sh and edit it
vim hive-env.sh

export HADOOP_HOME=/opt/module/hadoop-3.1.3
export HIVE_CONF_DIR=/opt/module/hive-3.1.2/conf
-
7. Initialize the Hive metastore
schematool -initSchema -dbType mysql -verbose
- If the initialization fails with an error:
- delete guava.jar from hive/lib,
- then copy guava-27.0-jre.jar from /opt/module/hadoop-3.1.3/share/hadoop/common/lib into hive/lib.
2. Connecting to Hive through the metastore service
-
Add the following to hive-site.xml
vim hive-site.xml

<!-- address of the metastore service -->
<property>
    <name>hive.metastore.uris</name>
    <value>thrift://hadoop102:9083</value>
</property>
-
Once the metastore service is configured, it must be running; otherwise hive/bin cannot be used:
[wt@hadoop102 hive-3.1.2]$ bin/hive
which: no hbase in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/module/jdk1.8.0/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/zookeeper-3.5.7/bin:/opt/module/kafka-3.0.1/bin:/opt/module/hive-3.1.2/bin:/home/wt/.local/bin:/home/wt/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/module/hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/module/hadoop-3.1.3/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = 68c09d3a-0258-4ecc-8df2-01617ef7b28a

Logging initialized using configuration in jar:file:/opt/module/hive-3.1.2/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> show tables;
FAILED: HiveException java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
-
Command to start the metastore service (the terminal it runs in becomes unusable afterwards; it just sits there)
[wt@hadoop102 hive-3.1.2]$ bin/hive --service metastore
2022-05-10 14:45:36: Starting Hive Metastore Server
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/module/hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/module/hadoop-3.1.3/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
-
Open Hive from a new terminal
[wt@hadoop102 hive-3.1.2]$ bin/hive
which: no hbase in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/module/jdk1.8.0/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/zookeeper-3.5.7/bin:/home/wt/.local/bin:/home/wt/bin:/opt/module/jdk1.8.0/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/zookeeper-3.5.7/bin:/opt/module/kafka-3.0.1/bin:/opt/module/jdk1.8.0/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/zookeeper-3.5.7/bin:/opt/module/kafka-3.0.1/bin:/opt/module/hive-3.1.2/bin:/opt/module/jdk1.8.0/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/zookeeper-3.5.7/bin:/opt/module/kafka-3.0.1/bin:/opt/module/hive-3.1.3/bin:/opt/module/jdk1.8.0/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/zookeeper-3.5.7/bin:/opt/module/kafka-3.0.1/bin:/opt/module/hive-3.1.3/bin:/opt/module/jdk1.8.0/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/zookeeper-3.5.7/bin:/opt/module/kafka-3.0.1/bin:/opt/module/hive-3.1.2/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/module/hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/module/hadoop-3.1.3/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = 66417456-280b-4197-8d1a-09ae42b82292

Logging initialized using configuration in jar:file:/opt/module/hive-3.1.2/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive Session ID = c677d7c2-6e76-40fd-8c89-3d060756d341
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>
3. Connecting to Hive through JDBC
-
1. Add the following to hive-site.xml
<!-- host for hiveserver2 connections -->
<property>
    <name>hive.server2.thrift.bind.host</name>
    <value>hadoop102</value>
</property>
<!-- port for hiveserver2 connections -->
<property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
</property>
-
2. Before starting the JDBC connection, start the metastore service first
[wt@hadoop102 hive-3.1.2]$ bin/hive --service metastore
2022-05-10 16:35:48: Starting Hive Metastore Server
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/module/hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/module/hadoop-3.1.3/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
-
3. Start hiveserver2
[wt@hadoop102 hive-3.1.2]$ bin/hive --service hiveserver2
which: no hbase in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/module/jdk1.8.0/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/zookeeper-3.5.7/bin:/opt/module/kafka-3.0.1/bin:/opt/module/hive-3.1.2/bin:/home/wt/.local/bin:/home/wt/bin)
2022-05-10 16:38:29: Starting HiveServer2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/module/hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/module/hadoop-3.1.3/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = ddd237be-01f3-477a-a0c3-bf1b053ab258
Hive Session ID = 70777be5-1bf7-4935-8dab-572300f56ab3
Hive Session ID = d19fe2fe-1131-4bb2-88ab-be9e19abcf40
Hive Session ID = ac661263-4ed4-44a9-8d01-9aeea1068636
-
4. Start the beeline client
[wt@hadoop102 hive-3.1.2]$ bin/beeline -u jdbc:hive2://hadoop102:10000 -n wt
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/module/hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/module/hadoop-3.1.3/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://hadoop102:10000
Connected to: Apache Hive (version 3.1.2)
Driver: Hive JDBC (version 3.1.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.2 by Apache Hive
0: jdbc:hive2://hadoop102:10000>
-
If it fails with a permission (proxy user) error, edit Hadoop's core-site.xml and add:
<property>
    <name>hadoop.proxyuser.wt.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.wt.groups</name>
    <value>*</value>
</property>
-
4. Viewing Hive's logs
-
/tmp/wt/hive.log
tail -f hive.log
5. Write a HiveServer2 start script
-
1. Before using this script, make sure hive-site.xml contains
<!-- address of the metastore service -->
<property>
    <name>hive.metastore.uris</name>
    <value>thrift://hadoop102:9083</value>
</property>
<!-- host for hiveserver2 connections -->
<property>
    <name>hive.server2.thrift.bind.host</name>
    <value>hadoop102</value>
</property>
<!-- port for hiveserver2 connections -->
<property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
</property>
-
2. Write the start script (sudo vim /bin/hiveservice.sh)
#!/bin/bash
HIVE_LOG_DIR=$HIVE_HOME/logs
# create the log directory
if [ ! -d $HIVE_LOG_DIR ]
then
    mkdir -p $HIVE_LOG_DIR
fi
# check whether a process is healthy; argument 1 is the process name, argument 2 is the process port
function check_process()
{
    pid=$(ps -ef 2>/dev/null | grep -v grep | grep -i $1 | awk '{print $2}')
    ppid=$(netstat -nltp 2>/dev/null | grep $2 | awk '{print $7}' | cut -d '/' -f 1)
    echo $pid
    [[ "$pid" =~ "$ppid" ]] && [ "$ppid" ] && return 0 || return 1
}
# start the services
function hive_start()
{
    # start the metastore
    metapid=$(check_process HiveMetastore 9083)
    cmd="nohup hive --service metastore >$HIVE_LOG_DIR/metastore.log 2>&1 &"
    [ -z "$metapid" ] && eval $cmd || echo -e "