Implementing WordCount with MapReduce


This post by 靠谱客 blogger 精明乐曲 covers implementing WordCount with MapReduce in two parts: the program implementation and a worked example.

Overview

Implementing WordCount with MapReduce

1. Program implementation
- 1.1 Mapper class
- 1.2 Reducer class
- 1.3 Main class
2. Worked example
- 2.1 Packaging
- 2.2 Data operations

1. Program implementation

1.1 Mapper class:

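import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// The four type parameters of Mapper: Object is the input key type, the first Text is the input value type,
// the second Text is the output key type, and IntWritable is the output value type.
public class doMapper extends Mapper<Object, Text, Text, IntWritable> {
    // Reused for every record so that map() does not allocate new objects per word.
    private static final IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    // map(key, value, context): writes the processed pairs to context, which passes them on to reduce.
    @Override
    protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        // Split the line on tab characters (assumes the words in the input are tab-separated).
        String[] arr = line.split("\t");
        for (String wd : arr) {
            word.set(wd);
            // Emit (word, 1); the counts are summed in the reducer.
            context.write(word, one);
        }
    }
}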
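To make the map step concrete without a cluster, here is a minimal plain-Java sketch of what the mapper emits for a single line. It assumes tab-separated input, as the mapper's split call appears to intend. The names MapStepSketch and mapLine are hypothetical and are not part of Hadoop.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustration only: MapStepSketch and mapLine are hypothetical names, not Hadoop APIs.
final class MapStepSketch {
    // Mirrors the map step: split one input line on tabs and emit (word, 1) for every token.
    static List<Map.Entry<String, Integer>> mapLine(String line) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String token : line.split("\t")) {
            emitted.add(new SimpleEntry<>(token, 1));
        }
        return emitted;
    }

    public static void main(String[] args) {
        // A repeated word is emitted once per occurrence; summing happens later, in reduce.
        System.out.println(mapLine("hadoop\tspark\thadoop"));
        // prints [hadoop=1, spark=1, hadoop=1]
    }
}
```

Note that the mapper does no counting of its own: each occurrence becomes a separate (word, 1) pair, and the framework groups the pairs by key before they reach the reducer.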

1.2 Reducer class:

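import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordReducerReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    // Reused output value holding the total count for the current key.
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum all the counts emitted by the mappers for this word.
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        result.set(sum);
        // Write the result to context; each output record is the key followed by its count.
        context.write(key, result);
    }
}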
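The reduce step can be sketched the same way in plain Java: the framework hands the reducer one key together with all the values emitted for it, and the reducer adds them up. ReduceStepSketch and sumCounts are hypothetical names used only for this illustration.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: ReduceStepSketch and sumCounts are hypothetical names, not Hadoop APIs.
final class ReduceStepSketch {
    // Mirrors the reduce step: add up every count that was grouped under one key.
    static int sumCounts(Iterable<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Three (hadoop, 1) pairs from the map step arrive grouped as hadoop -> [1, 1, 1].
        List<Integer> grouped = Arrays.asList(1, 1, 1);
        System.out.println("hadoop\t" + sumCounts(grouped));
    }
}
```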

1.3 Main class:

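import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        String input = null;
        String output = null;
        if (null != args && args.length == 2) {
            input = args[0];
            output = args[1];
            // Create the job
            Job job = Job.getInstance(new Configuration(), "word count");
            // Run from the jar that contains this class
            job.setJarByClass(WordCount.class);
            // Set the Mapper and Reducer classes
            job.setMapperClass(doMapper.class);
            job.setReducerClass(WordReducerReducer.class);
            // Set the key/value types of the job output
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            // Set the input and output paths
            FileInputFormat.addInputPath(job, new Path(input));
            FileOutputFormat.setOutputPath(job, new Path(output));
            // Submit the job and wait for it to finish
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        } else {
            System.err.println("Usage: wordcount <input> <output>");
            System.exit(2);
        }
    }
}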
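As a rough way to see how the three classes fit together, the following plain-Java sketch imitates the whole job in memory: map each line to (word, 1) pairs, group the pairs by word (the shuffle), then sum each group (the reduce). It is only an approximation of what the framework does. LocalWordCountSketch and run are hypothetical names, and the input is assumed to be tab-separated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: LocalWordCountSketch and run are hypothetical names, not Hadoop APIs.
final class LocalWordCountSketch {
    static Map<String, Integer> run(List<String> lines) {
        // Map + shuffle: emit a 1 for every token and group the 1s by word.
        // A TreeMap keeps the keys sorted, similar to the sorted key order a reducer sees.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (String token : line.split("\t")) {
                grouped.computeIfAbsent(token, k -> new ArrayList<>()).add(1);
            }
        }
        // Reduce: sum the values in each group.
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int sum = 0;
            for (int value : entry.getValue()) {
                sum += value;
            }
            counts.put(entry.getKey(), sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("hadoop\tspark", "hadoop\thive")));
        // prints {hadoop=2, hive=1, spark=1}
    }
}
```

In the real job the grouping happens across machines between the map and reduce phases; the driver class only wires the mapper, reducer, types, and paths together.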

2. Worked example

2.1 Packaging

After testing locally, package the program as wordcount.jar (via the Export => JAR file option).

2.2 Data operations

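vim file_a.txt
# Create the data file locally, with words separated by tabs
hadoop fs -put file_a.txt /
# Upload the data file to the root directory of HDFS
hadoop jar wordcount.jar /file_a.txt /wordcount_output
# Run the job with /wordcount_output as the output directory
# (assumes the jar's manifest names WordCount as the main class; otherwise pass the class name right after the jar)
hadoop fs -ls /wordcount_output
# List the files in the output directory, then use hadoop fs -cat to view a file's contents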
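
Assuming the job keeps Hadoop's default text output format, each line of an output file should be a word and its count separated by a tab. The sketch below shows one way to read such a line back in Java. OutputLineSketch and parseLine are hypothetical names, and the sample line is made up for illustration rather than taken from a real run.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Illustration only: OutputLineSketch and parseLine are hypothetical names, not Hadoop APIs.
final class OutputLineSketch {
    // Splits one output line at its last tab into (word, count).
    static Map.Entry<String, Integer> parseLine(String line) {
        int tab = line.lastIndexOf('\t');
        if (tab < 0) {
            throw new IllegalArgumentException("expected <word>TAB<count>: " + line);
        }
        String word = line.substring(0, tab);
        int count = Integer.parseInt(line.substring(tab + 1).trim());
        return new SimpleEntry<>(word, count);
    }

    public static void main(String[] args) {
        // Made-up sample line, not output from an actual job.
        Map.Entry<String, Integer> entry = parseLine("hadoop\t2");
        System.out.println(entry.getKey() + " appears " + entry.getValue() + " time(s)");
    }
}
```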