hive执行计划分析（join详解）

206 阅读 0 评论 136 点赞

我是靠谱客的博主粗犷抽屉，这篇文章主要介绍hive执行计划分析（join详解），现在分享给大家，希望可以做个参考。

我们都知道执行的hive sql是需要编译成MapReduce任务去执行的，那是如何编译的呢，可以分为六个阶段：

Antlr定义SQL的语法规则，完成SQL词法、语法解析，将SQL转化为抽象语法树AST Tree
遍历AST Tree，抽象出查询的基本组成单元QueryBlock
遍历QueryBlock，翻译为执行操作树OperatorTree
逻辑层优化器进行OperatorTree变换，合并不必要的ReduceSinkOperator，减少shuffle数据量
遍历OperatorTree，翻译为MapReduce任务
物理层优化器进行MapReduce任务的变换，生成最终的执行计划

通过六个阶段得到了执行计划，通过执行计划我们可以清楚的知道，sql的执行顺序以及执行过程，这样有助于我们对底层的理解以及对代码的优化，提高执行效率

下面我们来看看怎么查看执行计划：

语法：

 EXPLAIN [EXTENDED|CBO|AST|DEPENDENCY|AUTHORIZATION|LOCKS|VECTORIZATION|ANALYZE] query

查看执行计划的关键词为：EXPLAIN
官方文档上也有详细的描述：
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain

我们来看一个简单的例子：

hive> explain select * from test.test8;
OK
STAGE DEPENDENCIES:
Stage-0 is a root stage
STAGE PLANS:
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
TableScan
alias: test8
Statistics: Num rows: 1 Data size: 38 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: user_id (type: int), name (type: string), address (type: string)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 38 Basic stats: COMPLETE Column stats: NONE
ListSink
Time taken: 0.791 seconds, Fetched: 17 row(s)

执行计划第一部分：

STAGE DEPENDENCIES，各stage之间的依赖关系，就是执行的顺序

STAGE DEPENDENCIES:
Stage-0 is a root stage

执行计划第二部分：

STAGE PLANS，详细的执行计划

这里只有一步，没有依赖的操作，比较简单

STAGE PLANS:
Stage: Stage-0

抓取数据操作，没有limit限制


Fetch Operator
limit: -1

表扫描，表别名为test8


TableScan
alias: test8

需要查询的字段，user_id 、name 、address以及字段类型


Select Operator
expressions: user_id (type: int), name (type: string), address (type: string)

输出字段

outputColumnNames: _col0, _col1, _col2

如果是MR任务的话，
分为 Map Operator Tree和Reduce Operator Tree，如果中间有过滤操作的话，Filter Operator，聚合操作Group By Operator，join操作Join Operator等等。

下面我们来看几个复杂一点的例子：

聚合操作：

hive> explain select
>
apptypeid,
>
uid,
>
srcqid,
>
os,
>
isnewuser,
>
SUM(pv) AS pv,
>
dt
> FROM dw_center.dwb_open_srcqid_os_ver_user
> where dt='20210727'
> group by apptypeid,uid,srcqid,os,isnewuser,dt;
OK
STAGE DEPENDENCIES:--stage之间的依赖关系，有两个stage，下面依赖上面
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:--执行计划
Stage: Stage-1
Map Reduce--MR任务
Map Operator Tree:--map阶段
TableScan--表扫描
alias: dwb_open_srcqid_os_ver_user--表名
Statistics: Num rows: 220982 Data size: 91044719 Basic stats: COMPLETE Column stats: NONE--表信息
Select Operator--需要查询的字段以及字段结构
expressions: apptypeid (type: string), uid (type: string), srcqid (type: string), os (type: string), isnewuser (type: int), pv (type: bigint)
outputColumnNames: apptypeid, uid, srcqid, os, isnewuser, pv--输出的字段
Statistics: Num rows: 220982 Data size: 91044719 Basic stats: COMPLETE Column stats: NONE
Group By Operator--group by操作
aggregations: sum(pv)--聚合函数sum,聚合字段pv
keys: apptypeid (type: string), uid (type: string), srcqid (type: string), os (type: string), isnewuser (type: int)--聚合的key
mode: hash
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5--map阶段输出的字段
Statistics: Num rows: 220982 Data size: 91044719 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator--reduce阶段输出信息
key expressions: _col0 (type: string), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: int)
sort order: +++++--按5个字段正序排序
Map-reduce partition columns: _col0 (type: string), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: int)
Statistics: Num rows: 220982 Data size: 91044719 Basic stats: COMPLETE Column stats: NONE
value expressions: _col5 (type: bigint)--聚合后的值
Reduce Operator Tree:--reduce阶段
Group By Operator--group by聚合操作
aggregations: sum(VALUE._col0)--聚合函数
keys: KEY._col0 (type: string), KEY._col1 (type: string), KEY._col2 (type: string), KEY._col3 (type: string), KEY._col4 (type: int)--聚合的key
mode: mergepartial
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
Statistics: Num rows: 110491 Data size: 45522359 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: string), _col1 (type: string), _col2 (type: string), _col3 (type: string), _col4 (type: int), _col5 (type: bigint), '20210727' (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6
Statistics: Num rows: 110491 Data size: 45522359 Basic stats: COMPLETE Column stats: NONE
File Output Operator--文件输出
compressed: false--是否压缩
Statistics: Num rows: 110491 Data size: 45522359 Basic stats: COMPLETE Column stats: NONE
table:输出数据的格式
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator--抓取数据操作
limit: -1--不做limit限制
Processor Tree:
ListSink
Time taken: 0.471 seconds, Fetched: 52 row(s)

join操作：

hive> explain select
>
t1.apptypeid,
>
t1.uid,
>
t1.srcqid,
>
t2.os,
>
t1.dt
> from
> (select
>
apptypeid,
>
uid,
>
srcqid,
>
dt
> from dw_center.dwb_open_srcqid_user
> where dt='20210727') t1
>
> left join
> (select
>
apptypeid,
>
uid,
>
os,
>
dt
> from dw_center.dwb_open_os_user
> where dt='20210727') t2
> on t1.apptypeid=t2.apptypeid and t1.uid=t2.uid and t1.dt=t2.dt;
OK
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Map Reduce
Map Operator Tree:--map阶段
TableScan
alias: dwb_open_srcqid_user--表别名
Statistics: Num rows: 289319 Data size: 86795789 Basic stats: COMPLETE Column stats: NONE
Select Operator--查询的字段以及字段结构
expressions: apptypeid (type: string), uid (type: string), srcqid (type: string)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 289319 Data size: 86795789 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string), _col1 (type: string)--关联的字段
sort order: ++
Map-reduce partition columns: _col0 (type: string), _col1 (type: string)
Statistics: Num rows: 289319 Data size: 86795789 Basic stats: COMPLETE Column stats: NONE
value expressions: _col2 (type: string)
TableScan
alias: dwb_open_os_user--表别名
Statistics: Num rows: 259103 Data size: 77731098 Basic stats: COMPLETE Column stats: NONE
Select Operator--查询的字段以及字段结构
expressions: apptypeid (type: string), uid (type: string), os (type: string)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 259103 Data size: 77731098 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string), _col1 (type: string)--关联的字段
sort order: ++
Map-reduce partition columns: _col0 (type: string), _col1 (type: string)
Statistics: Num rows: 259103 Data size: 77731098 Basic stats: COMPLETE Column stats: NONE
value expressions: _col2 (type: string)
Reduce Operator Tree:--reduce阶段
Join Operator--join操作
condition map:
Left Outer Join0 to 1 --left join操作
keys:--关联的key，dt为虚拟字段，没有参与关联
0 _col0 (type: string), _col1 (type: string)
1 _col0 (type: string), _col1 (type: string)
outputColumnNames: _col0, _col1, _col2, _col5--输出字段
Statistics: Num rows: 318250 Data size: 95475369 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: string), _col1 (type: string), _col2 (type: string), _col5 (type: string), '20210727' (type: string)
outputColumnNames: _col0, _col1, _col2, _col3, _col4
Statistics: Num rows: 318250 Data size: 95475369 Basic stats: COMPLETE Column stats: NONE
File Output Operator--文件输出
compressed: false
Statistics: Num rows: 318250 Data size: 95475369 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
Time taken: 2.032 seconds, Fetched: 61 row(s)

最后

以上就是粗犷抽屉最近收集整理的关于hive执行计划分析（join详解）的全部内容，更多相关hive执行计划分析（join详解）内容请搜索靠谱客的其他文章。

本图文内容来源于网友提供，作为学习参考使用，或来自网络收集整理，版权属于原作者所有。

本文分类：hive
浏览次数：206 次浏览
发布日期：2023-10-04 02:00:52
本文链接：https://www.kaopuke.com/article/k-p-k_14_uzogf2_14_j_10_z.html

hive执行计划分析（join详解）

语法：

我们来看一个简单的例子：

执行计划第一部分：

STAGE DEPENDENCIES，各stage之间的依赖关系，就是执行的顺序

执行计划第二部分：

STAGE PLANS，详细的执行计划

下面我们来看几个复杂一点的例子：

聚合操作：

join操作：

最后

评论列表共有 0 条评论

发表评论取消回复

hive执行计划分析（join详解）

语法：

我们来看一个简单的例子：

执行计划第一部分：

STAGE DEPENDENCIES，各stage之间的依赖关系，就是执行的顺序

执行计划第二部分：

STAGE PLANS，详细的执行计划

下面我们来看几个复杂一点的例子：

聚合操作：

join操作：

最后

相关文章

评论列表共有 0 条评论

发表评论 取消回复

发表评论取消回复