Big Data Technology: Data Warehouse (DW), Hadoop Data Warehouse Practice Case 17: Cumulative Measures


I am 娇气电话, a blogger on 靠谱客. This article covers Case 17 of the Hadoop data warehouse practice series, cumulative measures, and is shared here as a reference.

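Cumulative measures are semi-additive facts: they can be summed along some dimensions but not along others, so they need to be used with care.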

Cumulative Measures Overview

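A cumulative measure aggregates the data from the first element of a sequence up to the current element, for example the cumulative sales amount from January of each year through the current month. This article shows how to implement cumulative monthly sales quantity and amount in the sales order example, and how to change the data warehouse schema, the initial load, and the periodic load script accordingly. Cumulative measures are semi-additive, and their initial load is more complex than the ones implemented in earlier cases.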
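To make the definition concrete, here is a minimal Python sketch of a year-to-date running total. The function name `year_to_date` and the sample figures are hypothetical and only illustrate the idea; the article itself implements the calculation in Hive.

```python
from collections import defaultdict

def year_to_date(monthly_sales):
    """Map each (year, month) to the running total since January of that year."""
    totals = {}
    running = defaultdict(int)  # one running sum per year, so totals never cross a year boundary
    for year, month in sorted(monthly_sales):
        running[year] += monthly_sales[(year, month)]
        totals[(year, month)] = running[year]
    return totals

# Hypothetical monthly sales amounts keyed by (year, month).
sales = {(2015, 11): 100, (2015, 12): 50, (2016, 1): 30, (2016, 2): 70}
ytd = year_to_date(sales)

for key in sorted(ytd):
    print(key, ytd[key])
```

Note that the total restarts in January 2016 instead of continuing from December 2015. This sketch only produces entries for months that have sales; a month with no sales would need its own entry carrying the previous total forward.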

Data Warehouse Model Design

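Create a new fact table named month_end_balance_fact to store the monthly cumulative values of sales order amount and quantity. The month_end_balance_fact table forms another star schema within the model. Besides the new fact table, this star schema includes two dimension tables that already exist in the other star schemas: the product dimension table and the month dimension table. The figure below shows the new schema; only the relevant tables are shown.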

[Figure: star schema for the cumulative measures (累计度量数仓架构.PNG)]

Create the Table

The following script creates the month_end_balance_fact table.

use dw;
create table
    month_end_balance_fact (
    month_sk int,
    product_sk int,
    month_end_amount_balance decimal(10,2),
    month_end_quantity_balance int
);
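The sketch below illustrates what semi-additive means for this table. It uses Python's built-in sqlite3 module as a stand-in for Hive, and the balance rows are hypothetical. Balances can be summed across products within one month, but summing them across months counts the same sales repeatedly, so the latest month's balance is read instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table month_end_balance_fact (
        month_sk int,
        product_sk int,
        month_end_amount_balance decimal(10,2),
        month_end_quantity_balance int
    )
""")
# Hypothetical balances: two products over two consecutive months of one year.
conn.executemany(
    "insert into month_end_balance_fact values (?, ?, ?, ?)",
    [(1, 1, 100, 10), (1, 2, 200, 20), (2, 1, 150, 15), (2, 2, 260, 26)],
)

# Additive across products: the total balance for a single month is meaningful.
month_2_total = conn.execute(
    "select sum(month_end_amount_balance) from month_end_balance_fact where month_sk = 2"
).fetchone()[0]

# Not additive across months: read the latest month's balance instead of summing.
latest_per_product = conn.execute("""
    select product_sk, month_end_amount_balance
    from month_end_balance_fact
    where month_sk = (select max(month_sk) from month_end_balance_fact)
    order by product_sk
""").fetchall()

print(month_2_total, latest_per_product)
```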

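Initial Load

The data in the month_end_sales_order_fact table now has to be loaded into month_end_balance_fact. The script below performs this initial load. It loads the cumulative monthly sales order totals, accumulated from January of each year through the current month; the accumulation does not cross year boundaries.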

-- Initial load
use dw;
insert overwrite table
    month_end_balance_fact
select
    a.month_sk,
    b.product_sk,
    sum(b.month_order_amount) month_order_amount,
    sum(b.month_order_quantity) month_order_quantity
from
    month_dim a,
    -- monthly snapshot rows with their year, month, and the largest loaded month key
    (select
        a.*,
        b.year,
        b.month,
        max(a.order_month_sk) over () max_month_sk
    from
        month_end_sales_order_fact a, month_dim b
    where
        a.order_month_sk = b.month_sk) b
where
    a.month_sk <= b.max_month_sk  -- stop at the latest month that has data
and
    a.year = b.year               -- do not accumulate across years
and
    b.month <= a.month            -- January through the current month
group by
    a.month_sk, b.product_sk;

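How the statement works: the subquery reads the rows of month_end_sales_order_fact together with their year, month, and the largest month surrogate key. The outer query sums the cumulative sales from January of each year through the current month. The condition a.month_sk <= b.max_month_sk limits the result to months up to the latest one that exists in the data.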
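The same logic can be tried on a small data set. The following sketch runs an equivalent query in SQLite through Python's sqlite3 module. The sample rows are hypothetical, the tables contain only the columns the query uses, and the window function `max(...) over ()` is replaced by a scalar subquery so that the example does not depend on the SQLite version; the result is intended to match what the Hive statement computes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table month_dim (month_sk int, year int, month int);
    create table month_end_sales_order_fact (
        order_month_sk int, product_sk int,
        month_order_amount int, month_order_quantity int);
""")
# Hypothetical months: Nov 2015 through Mar 2016 (no sales yet for Mar 2016).
conn.executemany("insert into month_dim values (?, ?, ?)",
                 [(1, 2015, 11), (2, 2015, 12), (3, 2016, 1), (4, 2016, 2), (5, 2016, 3)])
# Hypothetical monthly sales per product.
conn.executemany("insert into month_end_sales_order_fact values (?, ?, ?, ?)",
                 [(1, 1, 100, 10), (2, 1, 50, 5), (3, 1, 30, 3), (4, 2, 70, 7)])

balances = conn.execute("""
    select a.month_sk, b.product_sk,
           sum(b.month_order_amount), sum(b.month_order_quantity)
    from month_dim a
    join (select f.*, d.year, d.month
          from month_end_sales_order_fact f
          join month_dim d on f.order_month_sk = d.month_sk) b
      on a.year = b.year and b.month <= a.month
    where a.month_sk <= (select max(order_month_sk) from month_end_sales_order_fact)
    group by a.month_sk, b.product_sk
    order by a.month_sk, b.product_sk
""").fetchall()

for row in balances:
    print(row)
```

Product 1 has no sales in month 4, yet it still gets a balance row for that month because the join runs against month_dim rather than against the sales rows alone. Month 5 is excluded because it lies beyond the latest month with sales.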

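Periodic Load

The periodic load of the sales order cumulative measures runs once a month and loads the previous month's data. Run this script after the periodic load of the monthly periodic snapshot table has finished.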

-- Settings required for Hive transactions
set hive.support.concurrency=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.compactor.initiator.on=true;
set hive.compactor.worker.threads=1;
use dw;
-- Year and month of the previous calendar month (substituted into the query as text)
set hivevar:pre_month_date = add_months(current_date,-1);
set hivevar:year= year(${hivevar:pre_month_date});
set hivevar:month= month(${hivevar:pre_month_date});
insert into
    month_end_balance_fact
select
    order_month_sk,
    product_sk,
    sum(month_order_amount),
    sum(month_order_quantity)
from
    -- last month's sales from the monthly snapshot
    (select
        a.*
    from
        month_end_sales_order_fact a,
        month_dim b
    where
        a.order_month_sk = b.month_sk
    and
        b.year = ${hivevar:year}
    and
        b.month = ${hivevar:month}
    union all
    -- balance of the month before, carried forward (month_sk + 1 relies on
    -- consecutive month keys); matches no rows when the month loaded is January
    select
        month_sk + 1 order_month_sk,
        product_sk product_sk,
        month_end_amount_balance month_order_amount,
        month_end_quantity_balance month_order_quantity
    from
        month_end_balance_fact a
    where
        a.month_sk in
        (select max(case when ${hivevar:month} = 1 then 0 else month_sk end)
        from
            month_end_balance_fact)) t
group by
    order_month_sk, product_sk
;
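The periodic load adds last month's sales to the balance carried forward from the month before, and carries nothing forward when the month being loaded is January. The sketch below reproduces that logic in SQLite through Python's sqlite3 module. The helper `load_month`, its `year` and `month` parameters (standing in for the Hive variables), and the sample rows are all hypothetical, and the tables contain only the columns the query uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table month_dim (month_sk int, year int, month int);
    create table month_end_sales_order_fact (
        order_month_sk int, product_sk int,
        month_order_amount int, month_order_quantity int);
    create table month_end_balance_fact (
        month_sk int, product_sk int,
        month_end_amount_balance int, month_end_quantity_balance int);
""")
conn.executemany("insert into month_dim values (?, ?, ?)",
                 [(2, 2015, 12), (3, 2016, 1), (4, 2016, 2)])
conn.executemany("insert into month_end_sales_order_fact values (?, ?, ?, ?)",
                 [(3, 1, 30, 3), (4, 1, 20, 2), (4, 2, 70, 7)])
# Balance already loaded for December 2015.
conn.execute("insert into month_end_balance_fact values (2, 1, 150, 15)")

def load_month(conn, year, month):
    """Append the balances of one month: that month's sales plus the prior balance."""
    conn.execute("""
        insert into month_end_balance_fact
        select order_month_sk, product_sk,
               sum(month_order_amount), sum(month_order_quantity)
        from (
            -- sales of the month being loaded
            select a.order_month_sk, a.product_sk,
                   a.month_order_amount, a.month_order_quantity
            from month_end_sales_order_fact a
            join month_dim b on a.order_month_sk = b.month_sk
            where b.year = :year and b.month = :month
            union all
            -- previous balance, shifted to the new month; key 0 matches nothing in January
            select month_sk + 1, product_sk,
                   month_end_amount_balance, month_end_quantity_balance
            from month_end_balance_fact
            where month_sk = (select max(case when :month = 1 then 0 else month_sk end)
                              from month_end_balance_fact)
        ) t
        group by order_month_sk, product_sk
    """, {"year": year, "month": month})

load_month(conn, 2016, 1)  # January: the December balance is not carried over
load_month(conn, 2016, 2)  # February: January balance plus February sales

loaded = conn.execute("""
    select * from month_end_balance_fact
    where month_sk >= 3
    order by month_sk, product_sk
""").fetchall()
print(loaded)
```

As in the Hive script, the carry-forward step assumes that month surrogate keys are consecutive, so that `month_sk + 1` is the key of the following month.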