Overview
A write-up of the weather-data MapReduce demo from Hadoop: The Definitive Guide: writing the Mapper, Reducer, and driver classes, then running the job on an HDFS cluster.
1. The MapReduce Execution Flow
[Figure: the data before the map phase]
[Figure: the data after the map phase]
[Figure: the overall MapReduce flow]
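As a textual stand-in for the diagrams (sample years and readings adapted from the book's worked example; temperatures are in tenths of a degree Celsius), the data moves through the phases like this:

map input:     (byte offset, one raw fixed-width NCDC line)
map output:    (1950, 0)  (1950, 22)  (1950, -11)  (1949, 111)  (1949, 78)
after shuffle: (1949, [111, 78])  (1950, [0, 22, -11])
reduce output: (1949, 111)  (1950, 22)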
2. Writing the Mapper and Reducer Classes
MaxTemperatureMapper
MaxTemperatureReducer
//mapper
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final int MISSING = 9999;

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // NCDC records are fixed-width: the year occupies columns 15-18 and the
        // signed air temperature (tenths of a degree Celsius) columns 87-91.
        String line = value.toString();
        String year = line.substring(15, 19);
        int airTemperature;
        if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
            airTemperature = Integer.parseInt(line.substring(88, 92));
        } else {
            airTemperature = Integer.parseInt(line.substring(87, 92));
        }
        String quality = line.substring(92, 93);
        // Emit (year, temperature) only for readings that are present and pass the quality check
        if (airTemperature != MISSING && quality.matches("[01459]")) {
            context.write(new Text(year), new IntWritable(airTemperature));
        }
    }
}
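Before going near a cluster, the mapper can be exercised in isolation with MRUnit. A minimal sketch, assuming the mrunit and junit test dependencies are on the classpath; the record below is fabricated to match the NCDC layout (year 1950 at offset 15, temperature -1.1°C at offset 87, quality flag 1 at offset 92):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Test;

public class MaxTemperatureMapperTest {
    @Test
    public void parsesValidRecord() throws Exception {
        Text value = new Text("0043011990999991950051518004+68750+023550FM-12+0382" +
                              "99999V0203201N00261220001CN9999999N9-00111+99999999999");
        new MapDriver<LongWritable, Text, Text, IntWritable>()
            .withMapper(new MaxTemperatureMapper())
            .withInput(new LongWritable(0), value)   // the byte-offset key is ignored by the mapper
            .withOutput(new Text("1950"), new IntWritable(-11))
            .runTest();                              // fails the test if the actual output differs
    }
}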
//Reducer
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text keyin, Iterable<IntWritable> valuein, Context context)
            throws IOException, InterruptedException {
        // The framework has grouped map output by key, so keyin is a year and
        // valuein iterates over every valid temperature recorded for that year.
        int maxValue = Integer.MIN_VALUE;
        for (IntWritable value : valuein) {
            maxValue = Math.max(maxValue, value.get());
        }
        context.write(keyin, new IntWritable(maxValue));
    }
}
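Because taking a maximum is associative and commutative, the same reducer class could also serve as a combiner, pre-aggregating map output before the shuffle. This demo does not set one (hence Combine input records=0 in the job counters later), but a single optional line in the driver would enable it:

job.setCombinerClass(MaxTemperatureReducer.class);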
Writing the driver class (App)
//App
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperature {

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Job job = new Job(); // deprecated in newer releases; Job.getInstance() is the modern equivalent
        job.setJarByClass(MaxTemperature.class);
        job.setJobName("Max temperature");
        FileInputFormat.addInputPath(job, new Path(args[0]));   // args[0]: HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // args[1]: HDFS output directory (must not exist yet)
        job.setMapperClass(MaxTemperatureMapper.class);
        job.setReducerClass(MaxTemperatureReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        System.out.println(job.waitForCompletion(true)); // prints true on success (visible at the end of the job log)
    }
}
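The job log below includes a WARN that this driver does not implement the Tool interface. A minimal sketch of what the Tool-based variant could look like (MaxTemperatureDriver is a hypothetical class name, not part of this demo); ToolRunner adds standard parsing of generic options such as -D:

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MaxTemperatureDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // Same job setup as above, but built from the configuration ToolRunner prepared
        Job job = Job.getInstance(getConf(), "Max temperature");
        job.setJarByClass(getClass());
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(MaxTemperatureMapper.class);
        job.setReducerClass(MaxTemperatureReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips generic options (-D, -files, ...) before calling run()
        System.exit(ToolRunner.run(new MaxTemperatureDriver(), args));
    }
}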
3. Running the Program on the HDFS Cluster
Package the finished demo as hadoopDemo.jar and copy it onto the Ubuntu host:
root@s0:/mnt/hgfs/Host2VMmare# cp hadoopDemo.jar ~/Downloads/
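Note that the hadoop jar invocation later passes only the input and output paths, with no main-class argument, so the jar's manifest has to name MaxTemperature as its Main-Class. One way such a jar could be built (the classes directory of compiled .class files is illustrative):

jar cfe hadoopDemo.jar MaxTemperature -C classes .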
Create an HDFS directory, ncdc_data, to hold the input data:
root@s0:/mnt/hgfs/Host2VMmare# hadoop fs -mkdir -p /user/root/ncdc_data/
Copy the input data into the ncdc_data directory:
root@s0:/mnt/hgfs/Host2VMmare# hadoop fs -put 19*.gz /user/root/ncdc_data/
List the HDFS directory tree to confirm the data was copied successfully:
root@s0:/mnt/hgfs/Host2VMmare# hadoop fs -ls -R /
drwxr-xr-x   - root supergroup      0 2017-04-18 11:56 /d
drwxr-xr-x   - root supergroup      0 2017-04-19 00:42 /user
drwxr-xr-x   - root supergroup      0 2017-04-19 00:44 /user/root
drwxr-xr-x   - root supergroup      0 2017-04-19 00:46 /user/root/ncdc_data
-rw-r--r--   2 root supergroup  73867 2017-04-19 00:46 /user/root/ncdc_data/1901.gz
-rw-r--r--   2 root supergroup  74105 2017-04-19 00:46 /user/root/ncdc_data/1902.gz
Finally, run the job (the out directory will receive the output):
root@s0:~/Downloads# hadoop jar hadoopDemo.jar /user/root/ncdc_data/ /user/root/out
17/04/19 01:00:01 INFO client.RMProxy: Connecting to ResourceManager at s0/192.168.190.131:8032
17/04/19 01:00:03 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
17/04/19 01:00:07 INFO input.FileInputFormat: Total input paths to process : 2
17/04/19 01:00:07 INFO mapreduce.JobSubmitter: number of splits:2
17/04/19 01:00:07 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1492587464114_0001
17/04/19 01:00:09 INFO impl.YarnClientImpl: Submitted application application_1492587464114_0001
17/04/19 01:00:09 INFO mapreduce.Job: The url to track the job: http://s0:8088/proxy/application_1492587464114_0001/
17/04/19 01:00:09 INFO mapreduce.Job: Running job: job_1492587464114_0001
17/04/19 01:00:28 INFO mapreduce.Job: Job job_1492587464114_0001 running in uber mode : false
17/04/19 01:00:28 INFO mapreduce.Job:  map 0% reduce 0%
17/04/19 01:00:53 INFO mapreduce.Job:  map 100% reduce 0%
17/04/19 01:01:11 INFO mapreduce.Job:  map 100% reduce 100%
17/04/19 01:01:13 INFO mapreduce.Job: Job job_1492587464114_0001 completed successfully
17/04/19 01:01:13 INFO mapreduce.Job: Counters: 49
File System Counters
    FILE: Number of bytes read=144425
    FILE: Number of bytes written=643530
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=148176
    HDFS: Number of bytes written=18
    HDFS: Number of read operations=9
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=2
Job Counters
    Launched map tasks=2
    Launched reduce tasks=1
    Data-local map tasks=2
    Total time spent by all maps in occupied slots (ms)=45494
    Total time spent by all reduces in occupied slots (ms)=13941
    Total time spent by all map tasks (ms)=45494
    Total time spent by all reduce tasks (ms)=13941
    Total vcore-milliseconds taken by all map tasks=45494
    Total vcore-milliseconds taken by all reduce tasks=13941
    Total megabyte-milliseconds taken by all map tasks=46585856
    Total megabyte-milliseconds taken by all reduce tasks=14275584
Map-Reduce Framework
    Map input records=13130
    Map output records=13129
    Map output bytes=118161
    Map output materialized bytes=144431
    Input split bytes=204
    Combine input records=0
    Combine output records=0
    Reduce input groups=2
    Reduce shuffle bytes=144431
    Reduce input records=13129
    Reduce output records=2
    Spilled Records=26258
    Shuffled Maps =2
    Failed Shuffles=0
    Merged Map outputs=2
    GC time elapsed (ms)=969
    CPU time spent (ms)=4670
    Physical memory (bytes) snapshot=292323328
    Virtual memory (bytes) snapshot=5676564480
    Total committed heap usage (bytes)=259633152
Shuffle Errors
    BAD_ID=0
    CONNECTION=0
    IO_ERROR=0
    WRONG_LENGTH=0
    WRONG_MAP=0
    WRONG_REDUCE=0
File Input Format Counters
    Bytes Read=147972
File Output Format Counters
    Bytes Written=18
true
Listing the HDFS tree again shows the output files under the out directory. (The counters above are worth a glance: 13130 map input records versus 13129 map output records means exactly one reading was dropped by the MISSING/quality filter.)
root@s0:~/Downloads# hadoop fs -ls -R /
drwxr-xr-x   - root supergroup      0 2017-04-18 11:56 /d
drwx------   - root supergroup      0 2017-04-19 01:00 /tmp
drwx------   - root supergroup      0 2017-04-19 01:00 /tmp/hadoop-yarn
drwx------   - root supergroup      0 2017-04-19 01:00 /tmp/hadoop-yarn/staging
drwxr-xr-x   - root supergroup      0 2017-04-19 01:00 /tmp/hadoop-yarn/staging/history
drwxrwxrwt   - root supergroup      0 2017-04-19 01:00 /tmp/hadoop-yarn/staging/history/done_intermediate
drwxrwx---   - root supergroup      0 2017-04-19 01:01 /tmp/hadoop-yarn/staging/history/done_intermediate/root
-rwxrwx---   2 root supergroup  39928 2017-04-19 01:01 /tmp/hadoop-yarn/staging/history/done_intermediate/root/job_1492587464114_0001-1492588808404-root-Max+temperature-1492588870701-2-1-SUCCEEDED-default-1492588827238.jhist
-rwxrwx---   2 root supergroup    354 2017-04-19 01:01 /tmp/hadoop-yarn/staging/history/done_intermediate/root/job_1492587464114_0001.summary
-rwxrwx---   2 root supergroup 116606 2017-04-19 01:01 /tmp/hadoop-yarn/staging/history/done_intermediate/root/job_1492587464114_0001_conf.xml
drwx------   - root supergroup      0 2017-04-19 01:00 /tmp/hadoop-yarn/staging/root
drwx------   - root supergroup      0 2017-04-19 01:01 /tmp/hadoop-yarn/staging/root/.staging
drwxr-xr-x   - root supergroup      0 2017-04-19 00:42 /user
drwxr-xr-x   - root supergroup      0 2017-04-19 01:00 /user/root
drwxr-xr-x   - root supergroup      0 2017-04-19 00:46 /user/root/ncdc_data
-rw-r--r--   2 root supergroup  73867 2017-04-19 00:46 /user/root/ncdc_data/1901.gz
-rw-r--r--   2 root supergroup  74105 2017-04-19 00:46 /user/root/ncdc_data/1902.gz
drwxr-xr-x   - root supergroup      0 2017-04-19 01:01 /user/root/out
-rw-r--r--   2 root supergroup      0 2017-04-19 01:01 /user/root/out/_SUCCESS
-rw-r--r--   2 root supergroup     18 2017-04-19 01:01 /user/root/out/part-r-00000
The empty _SUCCESS file marks a completed job; part-r-00000 is the single reducer's output. View it (values are in tenths of a degree Celsius, so 317 means 31.7°C):
root@s0:~/Downloads# hadoop fs -cat /user/root/out/part-r*
1901	317
1902	244
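To consume the result programmatically instead of via hadoop fs -cat, here is a short sketch using the HDFS FileSystem API (it picks up the cluster address from core-site.xml on the classpath; the hard-coded output path matches this run):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadMaxTemperatures {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // reads core-site.xml for fs.defaultFS
        try (FileSystem fs = FileSystem.get(conf);
             BufferedReader in = new BufferedReader(new InputStreamReader(
                     fs.open(new Path("/user/root/out/part-r-00000"))))) {
            String line;
            while ((line = in.readLine()) != null) {
                // Each line is year<TAB>max temperature in tenths of a degree, e.g. "1901	317"
                System.out.println(line);
            }
        }
    }
}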