A Study of the OutputFormat Output Process


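I am 舒服小懒虫, a blogger on 靠谱客 (kaopuke). This article, collected during recent development work, covers the OutputFormat output process. I found it worthwhile and am sharing it here as a reference.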

Overview

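It took about a week, but I have finally finished studying the source code of the five major phases of MapReduce. I learned a lot, and I consider this a milestone in my Hadoop studies. Today I spent a little time analyzing the last phase of MapReduce, the output phase driven by OutputFormat. This phase is the mirror image of InputFormat, and the two are highly symmetric. Why do I say so? Let me go through it step by step.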

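The role of OutputFormat is to define the output format of key-value data: given the processed data, in what form should it be written so that whoever picks up the file next can accurately extract the data inside? Setting MapReduce aside for a moment, here are the ways of laying out such data that I know of. Redis, for example, has a design like this: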

[key-length][key][value-length][value][key-length][key][value-length][value]...

Or, if saving space is not essential, simply use delimiters:

[key] [value]\n

.....

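The file is then read line by line, and each line is split on the space to extract the key-value pair. This is simple, but it is not compact and wastes a fair amount of space. One of the formats in MapReduce's OutputFormat family uses this approach.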
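The two layouts described above can be sketched in plain Java. This is a minimal illustration, not the actual on-disk format of Redis or Hadoop: the class name `KeyValueLayouts`, its method names, and the choice of a 4-byte length prefix are all assumptions made for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, for illustration only.
public class KeyValueLayouts {

    /** Length-prefixed layout: [key-length][key][value-length][value]... */
    public static byte[] encodeLengthPrefixed(Map<String, String> pairs) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (Map.Entry<String, String> e : pairs.entrySet()) {
                writeField(out, e.getKey());
                writeField(out, e.getValue());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    private static void writeField(DataOutputStream out, String s) throws IOException {
        byte[] raw = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(raw.length); // assumed 4-byte length prefix
        out.write(raw);
    }

    public static Map<String, String> decodeLengthPrefixed(byte[] data) {
        Map<String, String> pairs = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (in.available() > 0) {
                String key = readField(in);
                String value = readField(in);
                pairs.put(key, value);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return pairs;
    }

    private static String readField(DataInputStream in) throws IOException {
        byte[] raw = new byte[in.readInt()]; // read the length, then exactly that many bytes
        in.readFully(raw);
        return new String(raw, StandardCharsets.UTF_8);
    }

    /** Delimited layout: one "[key] [value]\n" line per pair. */
    public static String encodeDelimited(Map<String, String> pairs) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : pairs.entrySet()) {
            sb.append(e.getKey()).append(' ').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static Map<String, String> decodeDelimited(String text) {
        Map<String, String> pairs = new LinkedHashMap<>();
        for (String line : text.split("\n")) {
            if (line.isEmpty()) {
                continue;
            }
            int space = line.indexOf(' '); // split on the first space only
            pairs.put(line.substring(0, space), line.substring(space + 1));
        }
        return pairs;
    }

    public static void main(String[] args) {
        Map<String, String> pairs = new LinkedHashMap<>();
        pairs.put("k1", "v1");
        pairs.put("key2", "value two");
        System.out.println(decodeLengthPrefixed(encodeLengthPrefixed(pairs)));
        System.out.print(encodeDelimited(pairs));
    }
}
```

One design point worth noting: the length-prefixed form can carry keys and values that contain spaces or newlines, whereas the delimited form as sketched here only works when keys contain no space and values contain no newline.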

First, we need to know what OutputFormat actually contains:

public interface OutputFormat<K, V> {

  /** 
   * Get the {@link RecordWriter} for the given job.
   * Returns the writer used to output the key-value records.
   *
   * @param ignored
   * @param job configuration for the job whose output is being written.
   * @param name the unique name for this part of the output.
   * @param progress mechanism for reporting progress while writing to file.
   * @return a {@link RecordWriter} to write the output for the job.
   * @throws IOException
   */
  RecordWriter<K, V> getRecordWriter(FileSystem ignored, JobConf job,
                                     String name, Progressable progress)
  throws IOException;

  /** 
   * Check for validity of the output-specification for the job.
   *  
   * <p>This is to validate the output specification for the job when it is
   * a job is submitted.  Typically checks that it does not already exist,
   * throwing an exception when it already exists, so that output is not
   * overwritten.</p>
   * Checks performed before the job runs, e.g. whether the configured output directory already exists.
   *
   * @param ignored
   * @param job job configuration.
   * @throws IOException when output should not be attempted
   */
  void checkOutputSpecs(FileSystem ignored, JobConf job) throws IOException;
}

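Just two simple methods. RecordWriter is the important part: all subsequent key-value writes are carried out through it. OutputFormat itself is only an interface, though; in MapReduce, the implementation we use most often is FileOutputFormat: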

/** A base class for {@link OutputFormat}. */
public abstract class File
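To make the two-method contract concrete without pulling in Hadoop, here is a stripped-down analog in plain Java. It is only a sketch under stated assumptions: `OutputFormatSketch`, `SimpleRecordWriter`, and `SimpleTextOutputFormat` are hypothetical names invented for this example and are not Hadoop classes, and a local directory stands in for the job's output path. The tab separator and the `part-00000` file name follow the conventions of Hadoop's text output.

```java
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical, Hadoop-free analog of the OutputFormat contract.
public class OutputFormatSketch {

    /** Stand-in for RecordWriter: writes key-value pairs, then is closed. */
    interface SimpleRecordWriter<K, V> extends Closeable {
        void write(K key, V value) throws IOException;
    }

    /** Stand-in for a file-based OutputFormat. */
    static class SimpleTextOutputFormat<K, V> {

        /** Like checkOutputSpecs: refuse to run if the output directory already exists. */
        void checkOutputSpecs(Path outputDir) throws IOException {
            if (Files.exists(outputDir)) {
                throw new IOException("Output directory " + outputDir + " already exists");
            }
        }

        /** Like getRecordWriter: 'name' is the unique name of this part of the output. */
        SimpleRecordWriter<K, V> getRecordWriter(Path outputDir, String name) throws IOException {
            Files.createDirectories(outputDir);
            BufferedWriter out =
                Files.newBufferedWriter(outputDir.resolve(name), StandardCharsets.UTF_8);
            return new SimpleRecordWriter<K, V>() {
                @Override
                public void write(K key, V value) throws IOException {
                    out.write(key + "\t" + value + "\n"); // one record per line
                }

                @Override
                public void close() throws IOException {
                    out.close();
                }
            };
        }
    }

    /** Validates the output spec, writes two records, and returns the lines written. */
    public static List<String> demo() {
        try {
            Path outputDir = Files.createTempDirectory("outputformat-sketch").resolve("out");
            SimpleTextOutputFormat<String, Integer> format = new SimpleTextOutputFormat<>();
            format.checkOutputSpecs(outputDir); // passes: the directory does not exist yet
            try (SimpleRecordWriter<String, Integer> writer =
                     format.getRecordWriter(outputDir, "part-00000")) {
                writer.write("hello", 2);
                writer.write("world", 1);
            }
            return Files.readAllLines(outputDir.resolve("part-00000"), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true if checkOutputSpecs rejects a directory that already exists. */
    public static boolean rejectsExistingOutput() {
        try {
            Path existing = Files.createTempDirectory("outputformat-existing");
            new SimpleTextOutputFormat<String, Integer>().checkOutputSpecs(existing);
            return false; // not reached if the check works
        } catch (IOException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
        System.out.println(rejectsExistingOutput());
    }
}
```

The split mirrors the interface shown earlier: the check runs once before any data is written, so existing output is never overwritten, while the writer only deals with formatting individual records.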


Category: Hadoop
Published: 2024-06-25 00:25:01
Link: https://www.kaopuke.com/article/k-p-k_13_u_7_o_22_f2_14_j_14_2.html
