Overview
Contents
1. Introduction
2. Download
3. Deployment
Pseudo-distributed mode
1. Deploy the JDK
2. Deploy Hadoop
3. Configure HDFS
4. Passwordless SSH login
5. Start HDFS
6. View the NameNode web UI
7. Deploy YARN
8. Start YARN
9. Open the RM web UI
10. Start/stop commands
1. Introduction:
Broad sense: the ecosystem built around the Apache Hadoop software: Hive, Flume, HBase, Kafka, Spark, Flink
Narrow sense: the Apache Hadoop software itself
hdfs: stores massive amounts of data
mapreduce: computation and analysis
yarn: resource and job scheduling
1. hdfs stores massive amounts of data:
namenode: directs where data is stored
datanode: actually stores the data
secondarynamenode: assists the namenode in its work
2. yarn schedules resources and jobs:
resourcemanager: directs the allocation of resources
nodemanager: supplies the actual resources on each node
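Once a cluster is up (deployment steps below), these roles can be observed directly. A minimal check using standard Hadoop CLI commands, assuming HDFS and YARN are already started:
# the namenode tracks the datanodes that hold the blocks
hdfs dfsadmin -report    # lists live datanodes and their capacity
# the resourcemanager tracks the nodemanagers (the actual resources)
yarn node -list          # lists the registered nodemanagers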
2. Download
1. Official site: hadoop.apache.org / projects.apache.org
2. Archive of all releases: https://archive.apache.org/dist
3. Deployment
3.1 Pseudo-distributed mode
All daemons run on a single machine, and all operations are performed as the hadoop user.
1. Deploy the JDK
tar -zxvf ./jdk-8u45-linux-x64.gz -C ~/app/   # extract the archive
ln -s ./jdk1.8.0_45/ java                     # run inside ~/app; the symlink makes later configuration easier
# Directory layout:
drwxr-xr-x. 2 hadoop hadoop 4096 Apr 11 2015 bin       Java executables (java, javac, ...)
drwxr-xr-x. 3 hadoop hadoop 4096 Apr 11 2015 include   C header files (used for JNI, not jars)
drwxr-xr-x. 5 hadoop hadoop 4096 Apr 11 2015 jre
drwxr-xr-x. 5 hadoop hadoop 4096 Apr 11 2015 lib       libraries the JVM needs at runtime
-rw-r--r--. 1 hadoop hadoop 21099089 Apr 11 2015 src.zip   Java source code archive
Configure environment variables so the Java commands can be used from any directory:
vim ~/.bashrc
export JAVA_HOME=/home/hadoop/app/java
export PATH=${JAVA_HOME}/bin:$PATH
source ~/.bashrc
java -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
2. Deploy Hadoop
tar -zxvf ./hadoop-3.3.4.tar.gz -C ~/app/
ln -s ./hadoop-3.3.4/ hadoop
# Directory layout:
drwxr-xr-x. 2 hadoop hadoop 4096 Jul 29 21:44 bin       Hadoop command-line scripts
drwxr-xr-x. 3 hadoop hadoop 4096 Jul 29 20:35 etc       Hadoop configuration files
drwxr-xr-x. 2 hadoop hadoop 4096 Jul 29 21:44 include
drwxr-xr-x. 3 hadoop hadoop 4096 Jul 29 21:44 lib
drwxr-xr-x. 3 hadoop hadoop 4096 Jul 29 20:35 sbin      start/stop scripts for the Hadoop components
drwxr-xr-x. 4 hadoop hadoop 4096 Jul 29 22:21 share     Hadoop jars and bundled examples
Configure environment variables:
vim ~/.bashrc
#HADOOP_HOME
export HADOOP_HOME=/home/hadoop/app/hadoop
export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
source ~/.bashrc
Point Hadoop at the JDK:
vim $HADOOP_HOME/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/home/hadoop/app/java
3. Configure HDFS
# 1. core-site.xml
# fs.defaultFS names the machine the namenode runs on
cd ~/app/hadoop/etc/hadoop    # Hadoop 3.x keeps its config here (there is no conf/ directory)
vim core-site.xml             # add the property inside the <configuration> element
<property>
<name>fs.defaultFS</name>
<value>hdfs://fang02:9000</value>
</property>
# 2. hdfs-site.xml — one replica is enough, since there is only a single datanode
vim hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
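A quick way to confirm Hadoop picked up these settings, a sketch using the standard getconf tool after saving both files:
hdfs getconf -confKey fs.defaultFS      # expect hdfs://fang02:9000
hdfs getconf -confKey dfs.replication   # expect 1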
4. Passwordless SSH login
Allow ssh to localhost without a passphrase:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
Format the filesystem:
hdfs namenode -format
2022-11-11 22:25:33,783 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
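Note the storage directory above defaults to /tmp/hadoop-${user.name}, which many systems clear on reboot. The fully distributed section below moves it with hadoop.tmp.dir in core-site.xml; the same override works here too, e.g. (the path is just an example, any persistent directory works):
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/data/hadoop</value>
</property>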
5.启动hdfs
start-dfs.sh//启动进程
检查 hdfs进程
jps/ps -ef | grep hdfs
4642 NameNode
4761 DataNode
4974 SecondaryNameNode
6. View the namenode web UI
http://fang02:9870/
http://192.168.41.12:9870/
7. Deploy YARN
vim mapred-site.xml:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
vim yarn-site.xml:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
</property>
8. Start YARN
start-yarn.sh
9. Open the RM web UI
http://fang02:8088/
http://192.168.41.12:8088/
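To confirm YARN is actually scheduling jobs, submit one of the bundled examples; a smoke test, assuming the examples jar that ships with hadoop-3.3.4:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.4.jar pi 2 3
# the job should show up on the RM web UI while it runs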
10. Start/stop commands
start-all.sh    # start all of Hadoop (HDFS + YARN)
stop-all.sh     # stop Hadoop
3.2 Fully distributed mode
1. Cluster layout
hdfs:
namenode (nn)
datanode (dn)
secondarynamenode (snn)
yarn:
resourcemanager (rm)
nodemanager (nm)
bigdata32 : nn dn nm
bigdata33 : dn rm nm
bigdata34 : snn dn nm
2. Prepare the machines
Clone 3 machines (4 GB RAM, 2 CPUs, 40 GB disk) and modify on each:
(1) IP address: vim /etc/sysconfig/network-scripts/ifcfg-ens33
(2) hostname:   vim /etc/hostname
(3) IP mapping: vim /etc/hosts (see the sketch below)
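For step (3), all three machines need the same hostname-to-IP mappings; a sketch of /etc/hosts, where the IP addresses are placeholders for whatever was assigned in step (1):
192.168.41.32 bigdata32
192.168.41.33 bigdata33
192.168.41.34 bigdata34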
3. Passwordless SSH [do this on all three machines]
[hadoop@bigdata32 ~]$ mkdir app software data shell project
[hadoop@bigdata32 ~]$ ssh-keygen -t rsa
# copy the public key to every host [do this on all three machines]
ssh-copy-id bigdata32
ssh-copy-id bigdata33
ssh-copy-id bigdata34
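After the keys are copied, each machine should reach the others without a password prompt; a quick check:
ssh bigdata33 hostname    # should print "bigdata33" without asking for a password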
4. File distribution commands: scp and rsync
# 1. scp — full copy:
scp [[user@]host1:]file1 ... [[user@]host2:]file2
scp bigdata32:~/1.log bigdata33:~
# 2. rsync — incremental copy:
rsync [OPTION]... SRC [SRC]... [USER@]HOST:DEST
rsync ~/1.log bigdata34:~
# when the content of bigdata32:~/1.log changes, rsync resends only the differences:
rsync -av ~/1.log bigdata34:~
5. Write a file sync script (save it as ~/shell/xsync — the name invoked in later steps)
#!/bin/bash
# distribute files to the three machines
if [ $# -lt 1 ];then
  echo "not enough arguments"
  echo "usage: $0 filename..."
  exit 1
fi
# loop over the three target machines
for host in bigdata32 bigdata33 bigdata34
do
  echo "=============$host=================="
  # 1. loop over the files to send
  for file in "$@"
  do
    # 2. check that the file exists
    if [ -e ${file} ];then
      pathdir=$(cd $(dirname ${file});pwd)
      filename=$(basename ${file})
      # 3. sync the file
      ssh $host "mkdir -p $pathdir"
      rsync -av $pathdir/$filename $host:$pathdir
    else
      echo "${file} does not exist"
    fi
  done
done
Put the script directory on PATH:
vim ~/.bashrc
export SHELL_HOME=/home/hadoop/shell
export PATH=${PATH}:${SHELL_HOME}
source ~/.bashrc
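One step is implicit above: the script has to be executable before a PATH lookup will run it (the name ~/shell/xsync matches how the script is invoked in the steps below):
chmod +x ~/shell/xsync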
6. Deploy the JDK [install on all three machines]
# 1. install the JDK on bigdata32 first
[hadoop@bigdata32 software]$ tar -zxvf jdk-8u45-linux-x64.gz -C ~/app/
[hadoop@bigdata32 app]$ ln -s jdk1.8.0_45/ java
[hadoop@bigdata32 app]$ vim ~/.bashrc
#JAVA_HOME
export JAVA_HOME=/home/hadoop/app/java
export PATH=${PATH}:${JAVA_HOME}/bin
[hadoop@bigdata32 app]$ which java
~/app/java/bin/java
[hadoop@bigdata32 app]$ java -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
[hadoop@bigdata32 app]$ xsync java/
[hadoop@bigdata32 app]$ xsync jdk1.8.0_45
[hadoop@bigdata32 app]$ xsync ~/.bashrc
# then run: source ~/.bashrc on all three machines
7. Deploy Hadoop
bigdata32 : nn dn nm
bigdata33 : dn rm nm
bigdata34 : snn dn nm
[hadoop@bigdata32 software]$ tar -zxvf hadoop-3.3.4.tar.gz -C ~/app/
[hadoop@bigdata32 app]$ ln -s hadoop-3.3.4/ hadoop
[hadoop@bigdata32 app]$ vim ~/.bashrc
#HADOOP_HOME
export HADOOP_HOME=/home/hadoop/app/hadoop
export PATH=${PATH}:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin
[hadoop@bigdata32 app]$ source ~/.bashrc
[hadoop@bigdata32 app]$ which hadoop
~/app/hadoop/bin/hadoop
# [do this on all three machines] create the data directory that hadoop.tmp.dir will point at
[hadoop@bigdata32 data]$ mkdir hadoop
[hadoop@bigdata32 data]$ cd hadoop
[hadoop@bigdata32 hadoop]$ pwd
/home/hadoop/data/hadoop
8. Configure HDFS
vim core-site.xml:
<property>
<name>fs.defaultFS</name>
<value>hdfs://bigdata32:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/data/hadoop</value>
</property>
vim hdfs-site.xml:
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>bigdata34:9868</value>
</property>
<property>
<name>dfs.namenode.secondary.https-address</name>
<value>bigdata34:9869</value>
</property>
[hadoop@bigdata32 hadoop]$ pwd
/home/hadoop/app/hadoop/etc/hadoop
[hadoop@bigdata32 hadoop]$ cat workers    # workers lists the hosts that run a datanode/nodemanager
bigdata32
bigdata33
bigdata34
# sync the bigdata32 files to bigdata33 and bigdata34
[hadoop@bigdata32 app]$ xsync hadoop
[hadoop@bigdata32 app]$ xsync hadoop-3.3.4
[hadoop@bigdata32 app]$ xsync ~/.bashrc
# run source ~/.bashrc on all three machines
# format once, at deployment time, on the machine that runs the namenode
[hadoop@bigdata32 app]$ hdfs namenode -format
# start hdfs — run on the machine where the namenode lives:
start-dfs.sh
Access the namenode web UI: http://bigdata32:9870/
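To confirm HDFS is healthy across the three machines, a minimal smoke test with standard hdfs commands:
hdfs dfsadmin -report            # should report 3 live datanodes
hdfs dfs -mkdir -p /tmp/test
hdfs dfs -put ~/.bashrc /tmp/test/   # write a small file ...
hdfs dfs -ls /tmp/test               # ... and read it back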
9. Configure YARN
# configure bigdata32 first, then sync to the others
vim mapred-site.xml:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
vim yarn-site.xml:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>bigdata33</value>
</property>
# distribute the bigdata32 config files to bigdata33 and bigdata34:
[hadoop@bigdata32 app]$ xsync hadoop-3.3.4
# start yarn — run on the machine where the resourcemanager lives (bigdata33):
start-yarn.sh
Access the RM web UI: http://bigdata33:8088
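Likewise, confirm all three nodemanagers registered with the resourcemanager, and optionally rerun the bundled pi example from the pseudo-distributed section:
yarn node -list    # should list bigdata32/33/34 as RUNNING
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.4.jar pi 2 3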
3.3 Starting and stopping Hadoop
1. Pseudo-distributed
hdfs: start-dfs.sh
yarn: start-yarn.sh
start-all.sh    # start Hadoop
stop-all.sh     # stop Hadoop
2. Fully distributed
Write a cluster start/stop script:
[hadoop@bigdata32 ~]$ vim shell/hadoop-cluster
#!/bin/bash
if [ $# -lt 1 ];then
  echo "Usage: $0 start|stop"
  exit 1
fi
case $1 in
"start")
  echo "======== starting the hadoop cluster ========"
  echo "======== starting hdfs ========"
  ssh bigdata32 "/home/hadoop/app/hadoop/sbin/start-dfs.sh"
  echo "======== starting yarn ========"
  ssh bigdata33 "/home/hadoop/app/hadoop/sbin/start-yarn.sh"
  ;;
"stop")
  echo "======== stopping the hadoop cluster ========"
  echo "======== stopping yarn ========"
  ssh bigdata33 "/home/hadoop/app/hadoop/sbin/stop-yarn.sh"
  echo "======== stopping hdfs ========"
  ssh bigdata32 "/home/hadoop/app/hadoop/sbin/stop-dfs.sh"
  ;;
*)
  echo "Usage: $0 start|stop"
  ;;
esac
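Make it executable; with SHELL_HOME on PATH (configured earlier), the script then runs from anywhere:
chmod +x ~/shell/hadoop-cluster
hadoop-cluster start    # starts hdfs on bigdata32, then yarn on bigdata33
hadoop-cluster stop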
Write a script to check the Java processes on every node:
[hadoop@bigdata32 ~]$ vim shell/jpsall
#!/bin/bash
for host in bigdata32 bigdata33 bigdata34
do
  echo "==========$host========="
  ssh $host "/home/hadoop/app/java/bin/jps | grep -v Jps"
done
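Same here; the expected output follows the cluster layout from step 1:
chmod +x ~/shell/jpsall
jpsall
# expected: bigdata32 → NameNode DataNode NodeManager
#           bigdata33 → DataNode ResourceManager NodeManager
#           bigdata34 → SecondaryNameNode DataNode NodeManager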