Overview
Installing Hadoop 3.0.3 and JDK 1.8 on CentOS 7 in pseudo-distributed mode
Add a regular user named hadoop
useradd hadoop
passwd hadoop
Grant the hadoop user sudo privileges
chmod u+w /etc/sudoers
vi /etc/sudoers
Add one of the following lines:
hadoop ALL=(ALL) ALL
or, for passwordless sudo:
hadoop ALL=(root) NOPASSWD:ALL
After saving, restore the file's read-only permission:
chmod u-w /etc/sudoers
Switch to the hadoop user
su - hadoop
Install Hadoop into the /home/hadoop/hadoop3.03 directory
Extract the tarball in /home/hadoop as the hadoop user. Do not create the target directory beforehand, or mv would nest the extracted folder inside it:
tar -zxvf hadoop-3.0.3.tar.gz
mv hadoop-3.0.3 hadoop3.03
Install the JDK into /home/hadoop/java/jdk1.8
mkdir -p /home/hadoop/java
tar -zxvf jdk-8u172-linux-x64.gz
mv jdk1.8.0_172 /home/hadoop/java/jdk1.8
Configure environment variables
sudo vi /etc/profile
##java
export JAVA_HOME=/home/hadoop/java/jdk1.8
export PATH=$PATH:$JAVA_HOME/bin
##hadoop
export HADOOP_HOME=/home/hadoop/hadoop3.03
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
Verify:
echo $JAVA_HOME
echo $HADOOP_HOME
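To make the new variables take effect in the current shell and double-check both installations, you can also run (assuming the paths set above):
source /etc/profile
java -version
hadoop version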
Set the JAVA_HOME parameter in hadoop-env.sh, mapred-env.sh, and yarn-env.sh
export JAVA_HOME=/home/hadoop/java/jdk1.8
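A quick way to apply this to all three files at once (a sketch, assuming the default etc/hadoop configuration directory under the install path):
cd /home/hadoop/hadoop3.03/etc/hadoop
for f in hadoop-env.sh mapred-env.sh yarn-env.sh; do echo 'export JAVA_HOME=/home/hadoop/java/jdk1.8' >> "$f"; done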
Configure core-site.xml
hadoop-localhost is the hostname; the /opt/data/tmp directory must be created beforehand (see below).
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop-localhost:8020</value>
<description>URI of HDFS: filesystem://namenode-host:port</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/data/tmp</value>
<description>Local Hadoop temporary directory on the namenode</description>
</property>
</configuration>
hadoop.tmp.dir sets Hadoop's temporary directory; for example, the HDFS NameNode stores its data under it by default. If you look through the *-default.xml default configuration files, you will find many settings that depend on ${hadoop.tmp.dir}.
The default hadoop.tmp.dir is /tmp/hadoop-${user.name}, which means the NameNode stores HDFS metadata under /tmp. If the operating system reboots, everything under /tmp is wiped and the NameNode metadata is lost. That is a serious problem, so we should change this path.
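You can see this dependency directly: for example, the default dfs.namenode.name.dir is derived from ${hadoop.tmp.dir}, which can be checked with hdfs getconf (an illustrative query, run from the Hadoop install directory):
bin/hdfs getconf -confKey dfs.namenode.name.dir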
sudo mkdir -p /opt/data/tmp
Change the owner of the temporary directory to hadoop:
sudo chown -R hadoop:hadoop /opt/data/tmp
Configure hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/opt/data/tmp/dfs/name</value>
<description>Where the NameNode stores the HDFS namespace metadata</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/opt/data/tmp/dfs/data</value>
<description>Physical storage location of data blocks on the DataNode</description>
</property>
<!-- Set the HDFS replication factor -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Format HDFS
Formatting should be done only once; reformatting an existing installation wipes the NameNode metadata.
sudo chown -R hadoop:hadoop /opt/data
hdfs namenode -format
View the NameNode directory after formatting
$ ll /opt/data/tmp/dfs/name/current
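After a successful format, the directory should contain the initial metadata files; typical names (illustrative, not exhaustive) are fsimage_0000000000000000000, fsimage_0000000000000000000.md5, seen_txid, and VERSION.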
Start the NameNode
sbin/hadoop-daemon.sh start namenode
Start the DataNode
sbin/hadoop-daemon.sh start datanode
Start the SecondaryNameNode
sbin/hadoop-daemon.sh start secondarynamenode
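Note that hadoop-daemon.sh is deprecated in Hadoop 3.x; the equivalent newer syntax (which should work the same here) is:
bin/hdfs --daemon start namenode
bin/hdfs --daemon start datanode
bin/hdfs --daemon start secondarynamenode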
Use the jps command to check whether the daemons started; if they appear in the output, they are running.
$ jps
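In this pseudo-distributed setup the output should list something like the following (PIDs are illustrative):
3249 NameNode
3361 DataNode
3489 SecondaryNameNode
3601 Jps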
Test creating a directory and uploading/downloading files on HDFS
[hadoop@hadoop-localhost hadoop3.03]$
Create a directory
bin/hdfs dfs -mkdir /demo1
Upload a file
bin/hdfs dfs -put etc/hadoop/core-site.xml /demo1
Read the contents of a file on HDFS
bin/hdfs dfs -cat /demo1/core-site.xml
Download a file from HDFS to the local filesystem
bin/hdfs dfs -get /demo1/core-site.xml
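You can confirm the round trip by listing the directory and diffing the downloaded copy against the original (a quick sanity check; diff prints nothing when the files match):
bin/hdfs dfs -ls /demo1
diff etc/hadoop/core-site.xml core-site.xml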
View the HDFS web UI
In HDFS 2.x the web UI port is 50070:
http://192.168.145.129:50070
In HDFS 3.x the web UI port is 9870:
http://192.168.145.129:9870/dfshealth.html#tab-overview
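If the page does not open in a browser, a quick check from the server's own shell (assuming the 3.x port) is:
curl -s http://localhost:9870/dfshealth.html | head -n 5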
Configure and start YARN
Configure mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<!-- Run MapReduce on YARN -->
<!-- ${full path of your hadoop distribution directory} -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=/home/hadoop/hadoop3.03</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=/home/hadoop/hadoop3.03</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=/home/hadoop/hadoop3.03</value>
</property>
</configuration>
Configure yarn-site.xml
yarn.nodemanager.aux-services sets YARN's auxiliary shuffle service; mapreduce_shuffle selects MapReduce's default shuffle implementation.
yarn.resourcemanager.hostname specifies which node the ResourceManager runs on.
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Specify the address of the ResourceManager -->
<configuration>
<!-- Site specific YARN configuration properties -->
<!-- How reducers fetch map output: the shuffle service -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop-localhost</value>
</property>
</configuration>
Start the ResourceManager
sbin/yarn-daemon.sh start resourcemanager
Start the NodeManager
sbin/yarn-daemon.sh start nodemanager
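As with the HDFS daemons, yarn-daemon.sh is deprecated in Hadoop 3.x; the newer equivalents are:
bin/yarn --daemon start resourcemanager
bin/yarn --daemon start nodemanager
Once both are up, jps should additionally show ResourceManager and NodeManager, and the NodeManager should be registered:
bin/yarn node -list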
You can also start the services with the bundled scripts.
Start HDFS and YARN:
sbin/start-dfs.sh
sbin/start-yarn.sh
sbin/start-all.sh
YARN web UI
The YARN web UI listens on port 8088; open http://192.168.145.129:8088/ to view it.
Run a MapReduce job
Create a test input file
bin/hdfs dfs -mkdir -p /wordcountdemo/input
The contents of wc.input are:
hadoop mapreduce hive
hbase spark storm
sqoop hadoop hive
spark hadoop
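The file must exist locally before uploading; one way to create it at the path used below:
cat > /opt/data/wc.input <<'EOF'
hadoop mapreduce hive
hbase spark storm
sqoop hadoop hive
spark hadoop
EOF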
Upload wc.input to the /wordcountdemo/input directory on HDFS:
bin/hdfs dfs -put /opt/data/wc.input /wordcountdemo/input
Run the WordCount MapReduce job
[hadoop@hadoop-localhost hadoop3.03]$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.3.jar wordcount /wordcountdemo/input /wordcountdemo/output
2018-07-03 19:38:23,956 INFO client.RMProxy: Connecting to ResourceManager at hadoop-localhost/192.168.145.129:8032
2018-07-03 19:38:24,565 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hadoop/.staging/job_1530615244194_0002
2018-07-03 19:38:24,879 INFO input.FileInputFormat: Total input files to process : 1
2018-07-03 19:38:25,784 INFO mapreduce.JobSubmitter: number of splits:1
2018-07-03 19:38:25,841 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-07-03 19:38:26,314 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1530615244194_0002
2018-07-03 19:38:26,315 INFO mapreduce.JobSubmitter: Executing with tokens: []
2018-07-03 19:38:26,466 INFO conf.Configuration: resource-types.xml not found
2018-07-03 19:38:26,466 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2018-07-03 19:38:26,547 INFO impl.YarnClientImpl: Submitted application application_1530615244194_0002
2018-07-03 19:38:26,590 INFO mapreduce.Job: The url to track the job: http://hadoop-localhost:8088/proxy/application_1530615244194_0002/
2018-07-03 19:38:26,590 INFO mapreduce.Job: Running job: job_1530615244194_0002
2018-07-03 19:38:35,985 INFO mapreduce.Job: Job job_1530615244194_0002 running in uber mode : false
2018-07-03 19:38:35,988 INFO mapreduce.Job: map 0% reduce 0%
2018-07-03 19:38:42,310 INFO mapreduce.Job: map 100% reduce 0%
2018-07-03 19:38:47,402 INFO mapreduce.Job: map 100% reduce 100%
2018-07-03 19:38:49,469 INFO mapreduce.Job: Job job_1530615244194_0002 completed successfully
2018-07-03 19:38:49,579 INFO mapreduce.Job: Counters: 53
File System Counters
FILE: Number of bytes read=94
FILE: Number of bytes written=403931
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=195
HDFS: Number of bytes written=60
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=4573
Total time spent by all reduces in occupied slots (ms)=2981
Total time spent by all map tasks (ms)=4573
Total time spent by all reduce tasks (ms)=2981
Total vcore-milliseconds taken by all map tasks=4573
Total vcore-milliseconds taken by all reduce tasks=2981
Total megabyte-milliseconds taken by all map tasks=4682752
Total megabyte-milliseconds taken by all reduce tasks=3052544
Map-Reduce Framework
Map input records=4
Map output records=11
Map output bytes=115
Map output materialized bytes=94
Input split bytes=122
Combine input records=11
Combine output records=7
Reduce input groups=7
Reduce shuffle bytes=94
Reduce input records=7
Reduce output records=7
Spilled Records=14
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=171
CPU time spent (ms)=1630
Physical memory (bytes) snapshot=332750848
Virtual memory (bytes) snapshot=5473169408
Total committed heap usage (bytes)=165810176
Peak Map Physical memory (bytes)=214093824
Peak Map Virtual memory (bytes)=2733207552
Peak Reduce Physical memory (bytes)=118657024
Peak Reduce Virtual memory (bytes)=2739961856
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=73
File Output Format Counters
Bytes Written=60
[hadoop@hadoop-localhost hadoop3.03]$
The output statistics are:
[hadoop@hadoop-localhost hadoop3.03]$ bin/hdfs dfs -cat /wordcountdemo/output/part-r-00000
hadoop 3
hbase 1
hive 2
mapreduce 1
spark 2
sqoop 1
storm 1
[hadoop@hadoop-localhost hadoop3.03]$
The results are sorted by key.
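Note that MapReduce refuses to write to an existing output directory; to re-run the job, remove the output first:
bin/hdfs dfs -rm -r /wordcountdemo/output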
Stop Hadoop
sbin/hadoop-daemon.sh stop namenode
sbin/hadoop-daemon.sh stop datanode
sbin/yarn-daemon.sh stop resourcemanager
sbin/yarn-daemon.sh stop nodemanager
To stop everything with the bundled scripts:
sbin/stop-yarn.sh
sbin/stop-dfs.sh
sbin/stop-all.sh
HDFS module overview
HDFS handles the storage of big data. By splitting large files into blocks and storing them in a distributed fashion, it overcomes the disk-size limit of a single server and solves the problem of one machine being unable to hold a large file. HDFS is a relatively independent module; it can serve YARN as well as other components such as HBase.
YARN module overview
YARN is a general-purpose resource coordination and task scheduling framework. It was created to solve, among other problems, the overloaded JobTracker in Hadoop 1.x MapReduce.
YARN is a general framework: besides MapReduce it can also run other computing frameworks such as Spark and Storm.
MapReduce module overview
MapReduce is a computing framework. It defines a way of processing data: distributed processing through a Map phase and a Reduce phase. It is suitable only for offline processing of big data, not for applications with strict real-time requirements.