Common Database Query Statements: HiveSQL


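I'm 活泼白云, a blogger at 靠谱客 (kaopuke). I collected this article on common HiveSQL query statements during recent development work. I found it useful and am sharing it here as a reference.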

Overview

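Strings

(1)length('abcedfg') Purpose: returns the length of the string (7).

(2)reverse('abcedfg') Purpose: returns the string reversed ('gfdecba').

(3)concat('abc','def','gh') Purpose: returns the input strings joined together ('abcdefgh'). Any number of input strings is accepted.
concat_ws(',','abc','def','gh') Purpose: joins the input strings using the first argument, ',', as the separator ('abc,def,gh').

(4)substr, substring('abcde',3) Purpose: returns the substring from position 3 ('c') to the end ('cde'). Positions start at 1.
substr, substring('abcde',3,2) Purpose: returns the substring of length 2 that starts at position 3 ('cd').

(5)upper, ucase('abSEd') Purpose: converts the string to upper case ('ABSED').
lower, lcase('abSEd') Purpose: converts the string to lower case ('absed').

(6)trim, ltrim, rtrim(' abc ') Purpose: trim removes spaces from both ends of the string. ltrim removes them only from the left end, and rtrim only from the right end.

(7)regexp_extract('foothebar', 'foo(.*?)(bar)', 1) Purpose: matches the string subject against the regular expression pattern and returns the capture group with the given index (group 1 here, 'the').

(8)parse_url('http://facebook.com/path/p1.php?query=1', 'HOST') Purpose: parses a URL string and returns the requested part (the host here, 'facebook.com').

(9)get_json_object(string json_string, string path) Purpose: parses a JSON string and returns the value at the given path.

(10)space(int n) Purpose: returns a string of n spaces.

(11)repeat(string str, int n) Purpose: returns str repeated n times.

(12)ascii(string str) Purpose: returns the ASCII code of the first character of str.

(13)lpad(string str, int len, string pad) Purpose: left-pads str with pad up to a length of len. If str is longer than len, it is cut to len characters.
rpad(string str, int len, string pad) Purpose: right-pads str with pad up to a length of len. If str is longer than len, it is cut to len characters.

(14)split(string str, string pat) Purpose: splits str around pat, which is treated as a regular expression, and returns the pieces as an array of strings.

(15)find_in_set(string str, string strList) Purpose: returns the position of the first occurrence of str in strList, a comma-separated list. It returns 0 if str is not found.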
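The Python model below makes the 1-based positions and the padding rules concrete for a few of these functions. The helper names (hive_substr, hive_lpad, hive_rpad, hive_find_in_set) are hypothetical stand-ins and are not part of Hive. They are meant to mirror the behaviour described above for ordinary inputs. They do not model NULL handling, negative positions, or an empty pad string.

```python
import re
from urllib.parse import urlparse


def hive_substr(s, pos, length=None):
    """Model of substr/substring for pos >= 1: positions are 1-based."""
    if pos < 1:
        raise ValueError("this sketch only models positive positions")
    start = pos - 1
    return s[start:] if length is None else s[start:start + length]


def hive_lpad(s, length, pad):
    """Model of lpad: pad on the left up to `length`; longer input is cut to `length`."""
    if len(s) >= length:
        return s[:length]
    return (pad * length)[:length - len(s)] + s


def hive_rpad(s, length, pad):
    """Model of rpad: pad on the right up to `length`; longer input is cut to `length`."""
    if len(s) >= length:
        return s[:length]
    return s + (pad * length)[:length - len(s)]


def hive_find_in_set(s, str_list):
    """Model of find_in_set: 1-based position in a comma-separated list, 0 if absent."""
    if "," in s:
        return 0
    items = str_list.split(",")
    return items.index(s) + 1 if s in items else 0


print(hive_substr("abcde", 3))                    # cde
print(hive_substr("abcde", 3, 2))                 # cd
print(hive_lpad("hi", 5, "??"))                   # ???hi
print(hive_rpad("hi", 5, "??"))                   # hi???
print(hive_find_in_set("ab", "abc,b,ab,c,def"))   # 3
# regexp_extract('foothebar', 'foo(.*?)(bar)', 1) keeps capture group 1:
print(re.search(r"foo(.*?)(bar)", "foothebar").group(1))             # the
# parse_url(..., 'HOST') corresponds to the URL's host name:
print(urlparse("http://facebook.com/path/p1.php?query=1").hostname)  # facebook.com
```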

count

select
      pt_day,
      count(*),
      count(uid),count(identifier),
      count(distinct uid),count(distinct identifier),
      count(case when appkey='CSIos' then identifier else null end),count(case when appkey='CSAndroid' then identifier else null end),
      count(distinct case when appkey='CSIos' then identifier else null end),count(distinct case when appkey='CSAndroid' then identifier else null end),
      count(case when appkey in ('CSIos','CSAndroid') then identifier else null end),
      count(distinct case when appkey in ('CSIos','CSAndroid') then identifier else null end)
 from bi_all_access_log
where pt_day between '2017-11-01' and '2017-11-14'
group by pt_day
order by pt_day

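1.count(*) and count(1):
count(*) counts the number of rows, including rows that contain NULL. count(1) returns the same result as count(*).
The performance rules of thumb below come from relational databases such as MySQL and vary by engine. Hive does not enforce primary keys, so they do not carry over directly:
If the table has no primary key, count(1) is faster than count(*). If the table has a primary key, count(*) is automatically optimized to use the primary key column.
If the table has only one column, count(*) is fastest.
count(1) behaves like count(primary key) and scans only the primary key. count(*) behaves like count(non-key column) and scans the whole table, so the former is faster.
In short, count(1) and count(*) are essentially the same, but the usual advice is to prefer count(1) when optimizing.

2.count(1) and count(column):
(1) count(1) counts every record in the table, including records whose fields are NULL.
(2) count(column) counts the rows in which the column has a value and skips rows where it is NULL.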
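You can check the differences between count(*), count(column), count(distinct ...) and the count(case when ...) pattern from the query above on a tiny table. This sketch uses Python's built-in sqlite3 module as a stand-in for Hive, because these aggregates follow the same NULL rules there. The table name access_log and its rows are made up for illustration.

```python
import sqlite3

# In-memory table standing in for bi_all_access_log (hypothetical sample rows).
conn = sqlite3.connect(":memory:")
conn.execute("create table access_log (uid text, identifier text, appkey text)")
conn.executemany(
    "insert into access_log values (?, ?, ?)",
    [
        ("u1", "d1", "CSIos"),
        ("u1", "d1", "CSIos"),
        ("u2", None, "CSAndroid"),
        (None, "d2", "CSAndroid"),
    ],
)

row = conn.execute("""
    select count(*),             -- all rows, NULLs included
           count(1),             -- same as count(*)
           count(uid),           -- rows where uid is not NULL
           count(distinct uid),  -- distinct non-NULL uid values
           count(case when appkey = 'CSAndroid' then identifier else null end),
           count(distinct case when appkey = 'CSIos' then identifier else null end)
    from access_log
""").fetchone()
print(row)  # (4, 4, 3, 2, 1, 1)
```

The CASE expression yields NULL for rows that do not match, and count() skips those NULLs. That is why a single pass can produce one count per appkey.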

Date formatting

Current system time: from_unixtime(unix_timestamp() ,'yyyy-MM-dd HH:mm:ss') as ins_date
Convert a fixed date to a timestamp (the results shown assume the session time zone is UTC+8, e.g. Asia/Shanghai):
select unix_timestamp('2016-08-16','yyyy-MM-dd') --1471276800
select unix_timestamp('20160816','yyyyMMdd') --1471276800
select unix_timestamp('2016-08-16T10:02:41Z', "yyyy-MM-dd'T'HH:mm:ss'Z'") --1471312961 ('T' and 'Z' are quoted literals, so the time is read in the session time zone rather than as UTC)
Convert a timestamp to a fixed date format:
select from_unixtime(1471276800,'yyyy-MM-dd') --2016-08-16
select from_unixtime(1471276800,'yyyyMMdd') --20160816
select from_unixtime(1471312961) -- 2016-08-16 10:02:41
select from_unixtime( unix_timestamp('20160816','yyyyMMdd'),'yyyy-MM-dd') --2016-08-16
select date_format('2016-08-16','yyyyMMdd') --20160816
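The sample results above only hold for one particular session time zone. Reproducing them in Python shows this dependency, assuming UTC+8 (for example Asia/Shanghai). The helper names are hypothetical. Python's strptime/strftime codes (%Y-%m-%d) replace Hive's Java-style patterns (yyyy-MM-dd).

```python
from datetime import datetime, timedelta, timezone

# Assumption: the Hive session time zone is UTC+8 (e.g. Asia/Shanghai).
SESSION_TZ = timezone(timedelta(hours=8))


def unix_timestamp_utc8(text, fmt):
    """Hypothetical stand-in for unix_timestamp(text, pattern): read text as UTC+8 local time."""
    return int(datetime.strptime(text, fmt).replace(tzinfo=SESSION_TZ).timestamp())


def from_unixtime_utc8(seconds, fmt="%Y-%m-%d %H:%M:%S"):
    """Hypothetical stand-in for from_unixtime(seconds, pattern) in a UTC+8 session."""
    return datetime.fromtimestamp(seconds, SESSION_TZ).strftime(fmt)


print(unix_timestamp_utc8("2016-08-16", "%Y-%m-%d"))                      # 1471276800
print(unix_timestamp_utc8("20160816", "%Y%m%d"))                          # 1471276800
# 'T' and 'Z' are matched literally, so the time is still read as UTC+8:
print(unix_timestamp_utc8("2016-08-16T10:02:41Z", "%Y-%m-%dT%H:%M:%SZ"))  # 1471312961
print(from_unixtime_utc8(1471276800, "%Y%m%d"))                           # 20160816
print(from_unixtime_utc8(1471312961))                                     # 2016-08-16 10:02:41
# The same instant in UTC still falls on the previous day:
print(datetime.fromtimestamp(1471276800, timezone.utc).strftime("%Y-%m-%d %H:%M"))  # 2016-08-15 16:00
```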
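Return the date part of a date/time value
select to_date('2016-08-16 10:03:01') --2016-08-16
Return the year
select year('2016-08-16 10:03:01') --2016
Return the month
select month('2016-08-16 10:03:01') --8
Return the day
select day('2016-08-16 10:03:01') --16
Return the hour
select hour('2016-08-16 10:03:01') --10
Return the minute
select minute('2016-08-16 10:03:01') --3
Return the second
select second('2016-08-16 10:03:01') --1
Return the week of the year
select weekofyear('2016-08-16 10:03:01') --33
Return the number of days from the start date to the end date (end date minus start date)
select datediff('2016-08-16','2016-08-11') --5
Return the date that is days days after startdate
select date_add('2016-08-16',10) --2016-08-26
Return the date that is days days before startdate
select date_sub('2016-08-16',10) --2016-08-06
Truncate a date to the first day of its month or year
select trunc('2016-08-16','MM') --2016-08-01
select trunc('2016-08-16','YEAR') --2016-01-01
Return today's records (this statement is MySQL syntax: Hive has no now() function, and its date_add takes a number of days rather than an INTERVAL)
select * from table_1 where date_col>=date(now()) and date_col<DATE_ADD(date(now()),INTERVAL 1 DAY)
A Hive equivalent, assuming Hive 2.1 or later, where to_date returns a DATE:
select * from table_1 where to_date(date_col) = current_date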
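The date arithmetic above can be mirrored with Python's datetime module, which makes the expected values easy to check. This is a hypothetical model, not Hive code. The helper names are invented, inputs are assumed to be 'yyyy-MM-dd' strings (weekofyear also accepts a time part), and weekofyear is assumed to follow ISO-8601 weeks, which start on Monday.

```python
from datetime import date, timedelta


def _d(text):
    """Parse the 'yyyy-MM-dd' prefix of a date or datetime string."""
    return date.fromisoformat(text[:10])


def datediff(end, start):
    return (_d(end) - _d(start)).days


def date_add(start, days):
    return (_d(start) + timedelta(days=days)).isoformat()


def date_sub(start, days):
    return (_d(start) - timedelta(days=days)).isoformat()


def trunc(text, unit):
    d = _d(text)
    if unit.upper() in ("MM", "MON", "MONTH"):
        return d.replace(day=1).isoformat()
    if unit.upper() in ("YY", "YYYY", "YEAR"):
        return d.replace(month=1, day=1).isoformat()
    raise ValueError("unit not modelled in this sketch")


def weekofyear(text):
    return _d(text).isocalendar()[1]  # ISO-8601 week number


print(datediff("2016-08-16", "2016-08-11"))  # 5
print(date_add("2016-08-16", 10))            # 2016-08-26
print(date_sub("2016-08-16", 10))            # 2016-08-06
print(trunc("2016-08-16", "MM"))             # 2016-08-01
print(trunc("2016-08-16", "YEAR"))           # 2016-01-01
print(weekofyear("2016-08-16 10:03:01"))     # 33
```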

Running totals

The running total for each day, plus the running total minus 50 and minus 100 (call this temporary table qq):

select 
name,
costdate,
-- daily spend
num,
-- running total
sumnum,
sumnum-50 as num_50,
sumnum-100 as num_100
from
(
select name,costdate,num,sum(num) over(partition by name order by costdate) as sumnum
from aa
)aaa
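You can try the running-total query on sample data with Python's sqlite3 module. SQLite 3.25 or later accepts the same sum(...) over (partition by ... order by ...) syntax. The rows in aa are invented for illustration. With an order by and no explicit frame, the window runs from the start of the partition to the current row. Rows that share the same costdate are summed together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical contents for aa: one spend amount per name and day.
conn.execute("create table aa (name text, costdate text, num integer)")
conn.executemany("insert into aa values (?, ?, ?)", [
    ("jack", "2015-01-01", 10),
    ("jack", "2015-01-02", 46),
    ("jack", "2015-01-03", 55),
    ("tony", "2015-01-01", 15),
    ("tony", "2015-01-02", 29),
])

rows = conn.execute("""
    select name, costdate, num, sumnum,
           sumnum - 50 as num_50,
           sumnum - 100 as num_100
    from (
        select name, costdate, num,
               sum(num) over (partition by name order by costdate) as sumnum
        from aa
    ) aaa
    order by name, costdate
""").fetchall()
for r in rows:
    print(r)
```

Each name gets its own running total because the window is partitioned by name. For example, jack's third row has sumnum 10 + 46 + 55 = 111.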

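NVL function

NVL(E1, E2): if E1 is NULL, the function returns E2; otherwise it returns E1 itself. This function has some limitations, which is why the NVL2 function exists.
NVL2(E1, E2, E3): if E1 is NULL, the function returns E3; if E1 is not NULL, it returns E2.
(NVL2 is an Oracle/PL-SQL function that Oracle added as an extension of NVL.)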
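Here is a minimal Python model of the two definitions, with None standing in for NULL. The function names mirror the SQL ones but are only illustrations. In Hive, nvl is built in, and NVL2's behaviour can be written as if(E1 is null, E3, E2).

```python
def nvl(e1, e2):
    """NVL(E1, E2): E2 when E1 is NULL (None here), otherwise E1."""
    return e2 if e1 is None else e1


def nvl2(e1, e2, e3):
    """NVL2(E1, E2, E3): E3 when E1 is NULL, otherwise E2."""
    return e3 if e1 is None else e2


print(nvl(None, "default"))        # default
print(nvl(0, "default"))           # 0  (0 is a value, not NULL)
print(nvl2(None, "set", "unset"))  # unset
print(nvl2(5, "set", "unset"))     # set
```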

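Timestamp functions

1.unix_timestamp

1) Return the timestamp of the current time

select unix_timestamp();

(Since Hive 2.0 this no-argument form is deprecated in favour of current_timestamp.)

2) If the date argument has the form yyyy-MM-dd HH:mm:ss, unix_timestamp(string date) returns the matching timestamp directly. For other layouts, such as yyyy-MM-dd, pass the pattern as the second argument.

select unix_timestamp('2018-12-05 01:10:00','yyyy-MM-dd HH:mm:ss');

select unix_timestamp('2018-12-05','yyyy-MM-dd');

2.from_unixtime

Converts a timestamp into a date/time string
```
 from_unixtime(tt)
```
tt is a 10-digit timestamp, i.e. seconds since 1970-01-01 00:00:00 UTC
```
 select from_unixtime(1543943400); -- 2018-12-05 01:10:00 in a UTC+8 session
```
A format pattern can be added: from_unixtime(tt, 'yyyy-MM-dd')
```
 select from_unixtime(1543943400,'yyyy-MM-dd'); -- 2018-12-05 in a UTC+8 session
```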
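from_unixtime expects seconds, which is a 10-digit value for current dates. Many logs store 13-digit millisecond timestamps instead, and these must be divided by 1000 first. In Hive this is typically written as from_unixtime(cast(ts / 1000 as bigint)). The Python sketch below shows the conversion under the same UTC+8 assumption as the examples. The function name format_epoch and the 13-digit heuristic are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

SESSION_TZ = timezone(timedelta(hours=8))  # assumed UTC+8 session, as in the examples


def format_epoch(ts, fmt="%Y-%m-%d %H:%M:%S"):
    """Format an epoch value; 13-digit values are assumed to be milliseconds."""
    if ts >= 10**12:   # heuristic for millisecond timestamps
        ts //= 1000    # same idea as cast(ts / 1000 as bigint) in Hive
    return datetime.fromtimestamp(ts, SESSION_TZ).strftime(fmt)


print(format_epoch(1543943400))              # 2018-12-05 01:10:00
print(format_epoch(1543943400, "%Y-%m-%d"))  # 2018-12-05
print(format_epoch(1543943400123))           # 2018-12-05 01:10:00
```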

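Window functions

1.Window functions:

1) partition (group) and order rows at the same time

2) do not reduce the number of rows in the original table

3) use the following syntax:

<window function> over (partition by <column(s) to partition by>
                order by <column(s) to sort by>)

2.Example

Suppose we want to rank scores within each class, in descending order: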

 -- 班级 = class, 成绩 = score, 班级表 = class table
 select *,
    rank() over (partition by 班级
                  order by 成绩 desc) as ranking
 from 班级表


3.Other dedicated window functions

Dedicated window functions: how rank, dense_rank and row_number differ

 -- 成绩 = score, 班级表 = class table
 select *,
    rank() over (order by 成绩 desc) as ranking,
    dense_rank() over (order by 成绩 desc) as dense_ranking,
    row_number() over (order by 成绩 desc) as row_num
 from 班级表


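The three functions differ in how they handle tied scores, both in whether ties share a rank and in whether they use up the following positions. rank gives tied rows the same rank and leaves a gap after them (1, 1, 3). dense_rank gives them the same rank without a gap (1, 1, 2). row_number numbers every row consecutively even when scores tie (1, 2, 3).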
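You can check the tie handling by running the three functions side by side on a few invented scores. The sketch uses Python's sqlite3 module, and SQLite 3.25 or later supports these window functions with the same syntax. A tie-breaker column is added to row_number's order by so the output is deterministic. Without it, row_number orders tied rows arbitrarily.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical scores table; B and C tie on 90.
conn.execute("create table scores (student text, score integer)")
conn.executemany("insert into scores values (?, ?)",
                 [("A", 100), ("B", 90), ("C", 90), ("D", 80)])

rows = conn.execute("""
    select student, score,
           rank()       over (order by score desc) as ranking,
           dense_rank() over (order by score desc) as dense_ranking,
           row_number() over (order by score desc, student) as row_num
    from scores
    order by score desc, student
""").fetchall()
for r in rows:
    print(r)
# ('A', 100, 1, 1, 1)
# ('B', 90, 2, 2, 2)
# ('C', 90, 2, 2, 3)
# ('D', 80, 4, 3, 4)
```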

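Deduplicating data in Hive and keeping one row per key as required

Sample data:

 name  adx  tran_id       cost   ts
 ck    5    125.168.10.0  33.00  1407234660
 ck    5    187.18.99.00  33.32  1407234661
 ck    5    125.168.10.0  33.24  1407234661

Only the first two rows are wanted. The third row has the same tran_id as the first, so this later duplicate must be removed.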

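Method 1:

 select 
 t1.tran_id,t2.name,t2.cost
 from 
 (select distinct tran_id from table) t1
 join table t2 
 on t1.tran_id=t2.tran_id

Analysis:
With distinct, tran_id has to be placed in the first column, which makes the output awkward to work with. More importantly, joining back on tran_id returns every row of t2 whose tran_id matches. Both 125.168.10.0 rows therefore still appear, and the duplicate is not actually removed.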

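Method 2:

 select *
  from(
         select *,
         row_number() 
         over (partition by tran_id order by ts asc) num 
         from table
         ) t
 where t.num=1;

Analysis:
In row_number() over (partition by tran_id order by ts asc) num, the rows are first grouped by tran_id and then sorted by ts in ascending order within each group. The value row_number() computes is each row's sequence number within its tran_id group after sorting, and these numbers are consecutive and unique within a group.
So the final step simply keeps the first row of each group (num=1). Ordering by ts ascending keeps the earliest record, which matches the sample, where the first two rows should survive. Use order by ts desc to keep the latest record instead.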
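You can reproduce the deduplication on the sample rows with Python's sqlite3 module as a stand-in for Hive. SQLite 3.25 or later supports row_number(). The table name t_log is hypothetical, because table is a reserved word. For comparison, the sketch also runs Method 1's distinct-plus-join query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# t_log is a hypothetical name for the sample table (table is a reserved word).
conn.execute("create table t_log (name text, adx integer, tran_id text, cost real, ts integer)")
conn.executemany(
    "insert into t_log values (?, ?, ?, ?, ?)",
    [
        ("ck", 5, "125.168.10.0", 33.00, 1407234660),
        ("ck", 5, "187.18.99.00", 33.32, 1407234661),
        ("ck", 5, "125.168.10.0", 33.24, 1407234661),
    ],
)

# Method 2: number rows within each tran_id by ts and keep the first one.
deduped = conn.execute("""
    select name, adx, tran_id, cost, ts
    from (
        select *, row_number() over (partition by tran_id order by ts asc) as num
        from t_log
    ) t
    where t.num = 1
    order by ts, tran_id
""").fetchall()
for r in deduped:
    print(r)

# Method 1 for comparison: distinct + join back still yields every original row.
joined = conn.execute("""
    select t1.tran_id, t2.name, t2.cost
    from (select distinct tran_id from t_log) t1
    join t_log t2 on t1.tran_id = t2.tran_id
""").fetchall()
print(len(joined))  # 3
```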

Cartesian product join (cross join):

Returns the Cartesian product of the two tables; no join key is needed.

 select a.id,a.name,b.age 
 from 
 rdb_a a 
 cross join 
 rdb_b b;
  
 Total MapReduce CPU Time Spent: 1 seconds 260 msec
 OK
 1       lucy    12
 1       lucy    22
 1       lucy    32
 2       jack    12
 2       jack    22
 2       jack    32
 3       tony    12
 3       tony    22
 3       tony    32
 Time taken: 24.727 seconds, Fetched: 9 row(s)
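Here is a sketch of the same cross join using Python's sqlite3 module. The contents of rdb_a and rdb_b are reconstructed from the output above, and rdb_b is assumed to have only an age column, so treat them as illustrative. Every row of rdb_a is paired with every row of rdb_b, which gives 3 × 3 = 9 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Contents reconstructed from the sample output; rdb_b is assumed to hold only age.
conn.execute("create table rdb_a (id integer, name text)")
conn.execute("create table rdb_b (age integer)")
conn.executemany("insert into rdb_a values (?, ?)", [(1, "lucy"), (2, "jack"), (3, "tony")])
conn.executemany("insert into rdb_b values (?)", [(12,), (22,), (32,)])

rows = conn.execute("""
    select a.id, a.name, b.age
    from rdb_a a
    cross join rdb_b b
    order by a.id, b.age
""").fetchall()
for r in rows:
    print(r)
print(len(rows))  # 9 rows = 3 rows in rdb_a x 3 rows in rdb_b
```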

Finally

That is everything 活泼白云 has collected on common HiveSQL query statements. I hope it helps with the development problems you run into when writing HiveSQL.


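The text and images in this post were provided by users or collected from the web for learning and reference; copyright belongs to the original authors.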

Category: 0020-hive/sql
Published: 2023-12-05 04:40:04
Link: https://www.kaopuke.com/article/k-p-k_13_u_23_o_6_fy_14__23_k0.html
