Big Data (010) Hadoop: Analyzing Simple Data with Eclipse, Java, and Hadoop for the First Time

This article, shared by the 靠谱客 blogger 羞涩百合, walks through a first attempt at analyzing simple data with Eclipse, Java, and Hadoop. It is offered here as a reference.

Source code: http://download.csdn.net/detail/jintaohahahaha/9919467

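Step 1: Open Eclipse

Step 2: Create a new Java project named mapreducer

Step 3: Create a lib folder under the project and import the Hadoop-related JARs (the JARs are included in the source download)

Step 4: Create a package under the project and write the following three classes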

1. WorldCounteMapper.java

package com.zjt.mapreducer.data;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.util.StringUtils;

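/**
 * Map program for counting words.
 * @author ZhangJintao
 * 		Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
 * 			KEYIN ----   key of the input data
 * 			VALUEIN ----  value of the input data
 * 			KEYOUT ---- key of the output data
 * 			VALUEOUT ----  value of the output data
 */
public class WorldCounteMapper extends Mapper<LongWritable, Text, Text, IntWritable>{
	/**
	 * Overrides the parent class's map method, which the framework calls repeatedly:
	 * once for every line of the input split.
	 * The key is the byte offset of the line in the file; the value is the text of the line.
	 * [Purpose: emit each word as a (word, 1) pair]
	 */
	@Override
	protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, IntWritable>.Context context)
			throws IOException, InterruptedException {
		String[] words = StringUtils.split(value.toString(), ' ');
		for (String w : words) {
			if (w.isEmpty()) {
				continue; // skip empty tokens, e.g. from consecutive spaces
			}
			context.write(new Text(w), new IntWritable(1));
		}
	}
}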
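
To see what the mapper emits without a cluster, the map step can be imitated in plain Java. This is only a minimal sketch under stated assumptions, not the article's code: the class `MapStepSketch` and the helper `mapLine` are hypothetical names, and `String.split` stands in for Hadoop's `StringUtils.split`.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of the map step; no Hadoop classes are involved.
class MapStepSketch {

    // Hypothetical helper: returns the (word, 1) pairs that one map() call
    // would emit for a single input line.
    static List<Map.Entry<String, Integer>> mapLine(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.split(" ")) {
            if (!word.isEmpty()) { // skip empty tokens from repeated spaces
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // "hadoop hello world" is the first line of the article's data.txt.
        for (Map.Entry<String, Integer> pair : mapLine("hadoop hello world")) {
            System.out.println(pair.getKey() + "\t" + pair.getValue());
        }
    }
}
```

Each word is emitted with the value 1; no counting happens at this stage, which is why the reducer is needed.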

2. WorldCountReducer.java

package com.zjt.mapreducer.data;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WorldCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
	/**
	 * Called repeatedly by the framework:
	 * once per key group, after the shuffle has finished grouping the map output.
	 * [Purpose: count how many times each word appears]
	 */
	@Override
	protected void reduce(Text key, Iterable<IntWritable> values,
			Reducer<Text, IntWritable, Text, IntWritable>.Context context) throws IOException, InterruptedException {
		int sum = 0;
		for (IntWritable value : values) {
			sum += value.get();
		}
		context.write(key, new IntWritable(sum));
	}
}
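
The grouping that happens between map and reduce can also be imitated locally. The sketch below is an illustration only: `ShuffleReduceSketch`, `shuffle`, and `reduce` are hypothetical names, and a `TreeMap` is assumed as a stand-in for the framework's sort-and-group by key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of shuffle + reduce; no Hadoop classes are involved.
class ShuffleReduceSketch {

    // Shuffle: collect the 1s emitted by the map step under their word,
    // with the words kept in sorted order.
    static Map<String, List<Integer>> shuffle(List<String> emittedWords) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (String word : emittedWords) {
            groups.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
        }
        return groups;
    }

    // Reduce: the same summing loop as WorldCountReducer.reduce(), on plain ints.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Words in the order the map step would emit them for the article's data.txt.
        List<String> emitted = Arrays.asList(
                "hadoop", "hello", "world", "hello", "hadoop", "hbase", "zookeeper");
        for (Map.Entry<String, List<Integer>> group : shuffle(emitted).entrySet()) {
            System.out.println(group.getKey() + "\t" + reduce(group.getValue()));
        }
    }
}
```

The reducer is called once per group, so "hadoop" arrives with the values [1, 1] and is written out with the sum 2.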

3. RunJob.java

package com.zjt.mapreducer.data;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * Job entry point.
 * @author ZhangJintao
 */
public class RunJob {
	public static void main(String[] args) {
		Configuration config = new Configuration();
		try {
			FileSystem fs = FileSystem.get(config);
			
			// Create the MapReduce job from the same configuration as the file system
			Job job = Job.getInstance(config);
			job.setJarByClass(RunJob.class);
			job.setJobName("data");
			
			job.setMapperClass(WorldCounteMapper.class);
			job.setReducerClass(WorldCountReducer.class);
			
			job.setMapOutputKeyClass(Text.class);
			job.setMapOutputValueClass(IntWritable.class);
			// Declare the reducer's output types as well
			job.setOutputKeyClass(Text.class);
			job.setOutputValueClass(IntWritable.class);
			
			FileInputFormat.addInputPath(job, new Path("/usr/input/"));
			// The job fails if the output directory already exists, so delete it first
			Path outpath  = new Path("/usr/input/data");
			if (fs.exists(outpath)) {
				fs.delete(outpath, true);
			}
			FileOutputFormat.setOutputPath(job, outpath);
			
			boolean f = job.waitForCompletion(true);
			
			if (f) {
				System.out.println("JOB executed successfully");
			}
		} catch (Exception e) {
			e.printStackTrace();
		}
	}
}

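Step 5: Package the project as a JAR

Step 6: Upload the test data to HDFS (the job reads its input from /usr/input/)

The contents of data.txt are as follows: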

hadoop hello world
hello hadoop
hbase zookeeper

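Step 7: Run the program

Upload the JAR we built to any node in the cluster, log in to that node remotely, change to the directory containing the JAR, and run the following command: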

    hadoop jar wc.jar com.zjt.mapreducer.data.RunJob

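Once the command is running, the console prints progress information for the job.

Open http://192.168.1.201:8088/cluster/apps in a browser and click Applications to see the job in the application list.

When the job has finished, go back to Eclipse.

After refreshing, you will find that several new folders and files have appeared.

The result file shows that the job counted the words in data.txt: hadoop appears twice, hbase once, hello twice, world once, and zookeeper once.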
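
If the counts are needed in a program rather than read by eye, the result lines can be parsed back into a map. The following is a rough sketch under assumptions: it relies on the job using the default text output format, which writes each record as key, tab, value; `ResultFileSketch` and `parse` are hypothetical names; and the sample lines are reconstructed from the counts reported above, not copied from the actual output file.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of reading "word<TAB>count" result lines.
class ResultFileSketch {

    // Parses tab-separated result lines into a map, preserving their order.
    static Map<String, Integer> parse(Iterable<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                continue; // ignore lines that do not contain the separator
            }
            String word = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            counts.put(word, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed sample lines, rebuilt from the counts stated in the article.
        List<String> lines = Arrays.asList(
                "hadoop\t2", "hbase\t1", "hello\t2", "world\t1", "zookeeper\t1");
        parse(lines).forEach((word, count) -> System.out.println(word + " -> " + count));
    }
}
```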

Categories: Clusters, Full-Text Search, Big Data (Hadoop)
Published: 2023-10-20 06:16:39
Article link: https://www.kaopuke.com/article/k-p-k_13_u_23_o_26_f5_13__7__22_z.html
