Hadoop之hive中sql常用函数汇总

101 阅读 0 评论 67 点赞

我是靠谱客的博主友好冰淇淋，最近开发中收集的这篇文章主要介绍Hadoop之hive中sql常用函数汇总，觉得挺不错的，现在分享给大家，希望可以做个参考。

概述

1、hive执行引擎 mr/tez/spark

set hive.execution.engine = mr;

2、开启动态分区

set hive.exec.dynamic.partition = true;
set hive.exec.dynamic.partition.mode = nonstrict;
## 删除分区：
ALTER TABLE dm.user_action_self_help_w_wi DROP IF EXISTS PARTITION (dt='2019-08-15',pd=2);

3、with 连接词

with TABLE_NAME AS (
SELECT ... FROM ... WHERE ...
)
-- 首个连接需要with，后续不要with：
TABLE_NAME AS (
SELECT ... FROM ... WHERE ...
)

4、为字段重命名

old_name as new_name
-- 或(不加as)：
old_name new_name

5、row_number() over(partition by A order by B asc/desc)

row_number() over(partition by A,B,C order by D asc/desc)
-- 将查询结果按照A,B,C字段分组（partition），
-- 然后组内按照D字段排序，至于asc还是desc，可自行选择，
-- 然后为每行记录返回一个row_number用于标记顺序(编号)
特色功能1：给 已有hive表(dm.official_accounts_funscount_w) 添加一列序号(sample_key)，例：
select
row_number() over(
partition by case when t.source is not null then 1 end
order by t.source asc,t.funCounts desc
) as sample_key,
t.source,
t.cityName,
t.weight,
t.strArea,
t.end_date,
t.funCounts
from dm.official_accounts_funscount_w t;
特色功能2：给表（多个字段）中某个字段去重，例：
-- 临时表2：去重数据
drop table if exists dm.table_info__02;
create table dm.table_info_02 stored as parquet as
select
*
from
(
select
*,
row_number() over(partition by id order by time desc) as rn
from dm.table_info_01
) a
where a.rn = 1;

6、concat(a,b,c...)

CONCAT(str1,str2,…)
-- 返回结果为连接参数产生的字符串。
-- 注意：如有任何一个参数为NULL ，则返回值为 NULL。
示例：concat('hello_','world')
-- 将a字符串与b字符串拼接 ==>('hello_world')

7、substr()与substring()异同

【语法】：
substring(字段名,startIndex,length) 或 substring(字段名,startIndex);
substr(字段名,startIndex,length) 或 substr(字段名,startIndex);
【示例】：
uid = '15895583937877696'
substring(uid,11,7) = '7877696'
substr(uid,11,7) = '7877696'
-- 对于字符串'15895583937877696'，从索引值为11开始，截取长度为7的字串 ==》（'7877696'）
substring(uid,11) = '7877696'
substr(uid,11) = '7877696'
-- 对于字符串'15895583937877696'，从索引值为11开始，一直到结束 ==》（'7877696'）

8、overwrite与into

insert overwrite table TABLE_NAME;
-- 表示：删除原有数据然后在新增数据，如果有分区那么只会删除指定分区数据，其他分区数据不受影响。
insert into table TABLE_NAME;
-- 表示：在原有数据的基础上增加数据

9、${T_?D} / ${T?D}

-- ${}：EL表达式
-- T_1D：当前时间的前一天，且时间中的分隔符为“-”。如（2019-05-12）
-- T1D：当前时间的前一天，且时间中无分隔符。如（20190512）
## 天：
dt = '${T1D}'
concat(year,month,day) = '${T1D}'
## 周：
dt between from_unixtime(unix_timestamp('${T1D}', 'yyyyMMdd') - 6*24*60*60, 'yyyyMMdd') and '${T1D}'
## 月：
dt like concat(substr('${T1D}', 1, 6), '%')

10、sum(1) 和 count(1)

sum(1) / sum(true)
-- 表示：统计所有不为NULL值的行数（不包括NULL值），即为 pv 值。
count(1)
/ count(true)
-- 表示：统计所有行的行数（包括NULL值）
count(null) 结果恒为0

11、cast(A as B)

cast(A as B)
-- 表示：将A格式化为B
例：
cast(info_time as decimal(38,2))
-- 将（时间/金钱）字段数据保留两位小数

12、decimal(38,2)

decimal(38,2)
-- 表示：计算结果保留有效位38位，小数位2位

13、timestamp 时间戳操作

1、current_date
-- 表示：当前日期,yyyy-MM-dd
2、current_time
-- 表示：当前时间,HH:mm:ss
3、current_timestamp
-- 表示：返回当前UTC时间(GMT+0)的时间戳，小于北京时间8小时，就是日期时间yyyy-MM-dd HH:mm:ss
4、unix_timestamp()
4.1、unix_timestamp()
-- 得到当前时间戳.
4.2、unix_timestamp(string date)
-- 如果参数date满足yyyy-MM-dd HH:mm:ss形式，则可以直接 得到参数对应的时间戳.
-- 如果参数date不满足yyyy-MM-dd HH:mm:ss形式，则我们需要指定date的形式，再进行转换
如：unix_timestamp('2009-03-20', 'yyyy-MM-dd')=1237532400
5、from_unixtime(unix_timestamp,format)
-- 表示：返回表示 Unix 时间标记的一个字符串，根据format字符串格式化。
语法：from_unixtime(t1,'yyyy-MM-dd HH:mm:ss')
单独取小时：from_unixtime(t1,'HH')
其中t1是10位的时间戳值，即1970-1-1至今的秒，而13位的所谓毫秒的是不可以的。
对于13位时间戳，需要截取，然后转换成bigint类型，因为from_unixtime类第一个参数只接受bigint类型。 例如：
from_unixtime(cast(substring(tistmp,1,10) as bigint),'yyyy-MM-dd HH:mm:ss')
from_unixtime(cast(substring(inserttime,1,10) as bigint),'yyyy-MM-dd')
6、postgresql数据库中：
select to_char(to_timestamp(1615542689000/1000),'yyyy-MM-dd HH24:MI:SS') as time;

14、if("表达式",true,null)

if("表达式",true,null)
-- 如果表达式成立，取参数true，否则取参数null

15、INVALIDATE METADATA

1、INVALIDATE METADATA;
-- 刷新 impala 中所有表数据
2、INVALIDATE METADATA TABLE_NAME;
-- 刷新 impala 中指定表数据

16、union all

insert overwrite table TABLE_NAME
select X from ...
union all
select X from ...;
-- 表示：将多个查询的结果合并，表中的数据都罗列出来（包括重复的）
-- 注："X"的位置必须相同

17、case when then else end

--1、简单Case函数（简便）
CASE sex
WHEN '1' THEN '男'
WHEN '2' THEN '女'
ELSE '其他' END;
-- 表示：若sex='1'，返回'男'；sex='2'，返回'女';否则返回'其他'。
--2、Case搜索函数（功能更强）
CASE WHEN sex = '1' THEN '男'
WHEN sex = '2' THEN '女'
ELSE '其他' END;
-- 表示：若sex='1'，返回'男'；sex='2'，返回'女';否则返回'其他'。

18、NVL、NVL2、NULLIF

NVL(参数1,参数2)
-- 表示：如果第一个参数为空则显示第二个参数，反之则显示第一个参数；
-- 注：NVL(expr1, expr2)函数中expr1和expr2的类型要一样
--
NVL(expr1, expr2)函数中当expr1和expr2的类型不一样时expr2的类型要能隐式转换成expr1的类型
-- 常用于【非空判断】：如：nvl(nmd.distinct_id,'') <> ''
NVL2(参数1,参数2,参数3)
-- 表示：如果第一个参数为空则显示第二个参数，反之显示第三个参数；
NULLIF(参数1,参数2)
-- 表示：如果参数1和参数2相等则返回空(NULL)，否则返回第一个参数；

19、Coalesce(expr1, expr2, expr3….. exprn)

Coalesce(expr1, expr2, expr3….. exprn)
-- 表示：返回第一个非空参数(所有参数必须是相同类型，或可隐性转换为相同的类型)。

20、oracle截取时间函数TRUNC(SYSDATE)

默认是截取系统日期到日，得到数据格式为："2012-12-19"==> 等同于：
from_unixtime(unix_timestamp(current_timestamp), 'yyyy-MM-dd')

21、hive中的正则表达式使用

【正则函数】：
1、字段 regexp 正则表达式
==> true / false
2、regexp_extract(字段，正则表达式分组，index)
==> 匹配内容字符
---------------------
【示例】：
btype
$(expr 928597146 + 949970217)
<%- 924418698+959774998 %>
|expr 975562563 + 896732210
&set /A 965418523+945373284
/*1*/{{826602513+801185455}}
#set($c=985202044+978568225)${c}$c
'and(select'1'from/**/cast(md5(1344042140)as/**/int))>'0
'and'r'='r
convert(int,sys.fn_sqlvarbasetostr(HashBytes('MD5','1847566026')))
"and"v"="v
extractvalue(1,concat(char(126),md5(1806726239)))
'and(select*from(select+sleep(0))a/**/union/**/select+1)='
/**/and/**/cast(md5('1758968109')as/**/int)>0
'and'p'='l
"and/**/extractvalue(1,concat(char(126),md5(1386445479)))and"
é'"(
"and(select*from(select+sleep(0))a/**/union/**/select+1)="
'/**/and(select'1'from/**/pg_sleep(0))>'0
'and/**/extractvalue(1,concat(char(126),md5(1588692251)))and'
------------------
-- 找出要过滤掉的值中的标志性字符：
$ + < % # @ {} () : ' ? /  . * =
-- 编写正则表达式(特殊字符要转置)：
"\$|\+|<|%|#|@|\{|\(|:|'|\?|/|\|\.|\*|="
-- 添加hive中的特殊分隔符和特殊字符0过滤：
"\$|\+|<|%|#|@|{|(|:|'|\?|/|\|\.|\*|=|\|\01"

最后

以上就是友好冰淇淋为你收集整理的Hadoop之hive中sql常用函数汇总的全部内容，希望文章能够帮你解决Hadoop之hive中sql常用函数汇总所遇到的程序开发问题。

如果觉得靠谱客网站的内容还不错，欢迎将靠谱客网站推荐给程序员好友。

本图文内容来源于网友提供，作为学习参考使用，或来自网络收集整理，版权属于原作者所有。

本文分类：数据仓库（hive）
浏览次数：101 次浏览
发布日期：2023-09-22 15:55:54
本文链接：https://www.kaopuke.com/article/k-p-k_14_uzo_2_f5_12__23__18_y.html

Hadoop之hive中sql常用函数汇总

概述

1、hive执行引擎 mr/tez/spark

2、开启动态分区

3、with 连接词

4、为字段重命名

5、row_number() over(partition by A order by B asc/desc)

6、concat(a,b,c...)

7、substr()与substring()异同

8、overwrite与into

9、${T_?D} / ${T?D}

10、sum(1) 和 count(1)

11、cast(A as B)

12、decimal(38,2)

13、timestamp 时间戳操作

14、if("表达式",true,null)

15、INVALIDATE METADATA

16、union all

17、case when then else end

18、NVL、NVL2、NULLIF

19、Coalesce(expr1, expr2, expr3….. exprn)

20、oracle截取时间函数TRUNC(SYSDATE)

21、hive中的正则表达式使用

最后

评论列表共有 0 条评论

发表评论取消回复

Hadoop之hive中sql常用函数汇总

概述

1、hive执行引擎 mr/tez/spark

2、开启动态分区

3、with 连接词

4、为字段重命名

5、row_number() over(partition by A order by B asc/desc)

6、concat(a,b,c...)

7、substr()与substring()异同

8、overwrite与into

9、${T_?D} / ${T?D}

10、sum(1) 和 count(1)

11、cast(A as B)

12、decimal(38,2)

13、timestamp 时间戳操作

14、if("表达式",true,null)

15、INVALIDATE METADATA

16、union all

17、case when then else end

18、NVL、NVL2、NULLIF

19、Coalesce(expr1, expr2, expr3….. exprn)

20、oracle截取时间函数TRUNC(SYSDATE)

21、hive中的正则表达式使用

最后

相关文章

评论列表共有 0 条评论

发表评论 取消回复

发表评论取消回复