关于jps进程中的runjar解决办法
kill -9 runjar进程号
– win电脑不要在云实验平台中使用ctrl+c 复制内容
su - root # 切换到root用户,并且使用root用户对应的环境变量
– 注意在无论进行何种表链接时,一定要将各种表进行重命名,防止错误发生
一 内连接
join 和inner join 是相同的
1.找出既在user_list_1也在user_list_2的用户:
方法一:
1
2
3
4select * from user_list_1 a join user_list_2 b on a.user_id=b.user_id;
方法二:
1
2
3
4
5select * from user_list_1 a inner join user_list_2 b on a.user_id=b.user_id limit 10;
需求1:找出在2019年购买后又退款的用户
1
2
3
4
5
6
7
8
9
10
11select a.user_name from (select distinct user_name from user_trade where year(dt)=2019)a join (select distinct user_name from user_refund where year(dt)=2019)b on a.user_name=b.user_name;
需求2:在2017年和2018年都购买的用户
1
2
3
4
5
6
7
8
9
10
11
12select a.user_name from (select distinct user_name from user_trade where year(dt)=2017)a join ( select distinct user_name from user_trade where year(dt)=2018)b on a.user_name=b.user_name;
需求3:在2017年、2018年、2019都有交易的用户
方法1:推荐使用
1
2
3
4
5
6
7
8
9
10
11
12
13select distinct a.user_name from ((select distinct user_name from trade_2017)a join (select distinct user_name from trade_2018)b on a.user_name=b.user_name join (select distinct user_name from trade_2019)c on b.user_name=c.user_name);
方法2:前提是三个表数据量都特别少
1
2
3
4
5
6
7select distinct a.user_name from trade_2017 a join trade_2018 b on a.user_name=b.user_name join trade_2019 c on b.user_name=c.user_name;
总结:内连接就是将多个表合并,以一定的字段为依据取出共有的行
二.左连接
对表1和表2进行左连接
1
2
3
4
5select * from user_list_1 a left join user_list_2 b on a.user_id=b.user_id;
如何取出,在user_list_1表中但是不在user_list_2的用户?
1
2
3
4
5
6select a.user_id,a.user_name from user_list_1 a left join user_list_2 b on a.user_id=b.user_id where b.user_id is null;
– 注意: 此时查询时,一定要查a表的 此时差b表结果显示空
需求4:在2019年购买,但是没有退款的用户
1
2
3
4
5
6
7
8
9
10
11
12select a.user_name from ((select distinct user_name from user_trade where year(dt)=2019)a left join (select distinct user_name from user_refund where year(dt)=2019)b on a.user_name=b.user_name) where b.user_name is null;
需求5:在2019年有购买的用户的学历分布
1
2
3
4
5
6
7
8
9
10
11select b.edu,count(a.user_name) from ((select distinct user_name from user_trade where year(dt)=2019)a left join (select distinct user_name,get_json_object(extra1,'$.education')as edu from user_info)b on a.user_name=b.user_name) group by b.edu;
需求6:在2017和2018年都购买,但是没有在2019年购买的用户
1
2
3
4
5
6
7
8
9
10
11
12
13
14select a.user_name from (select distinct user_name from trade_2017)a left join (select distinct user_name from trade_2018)b on a.user_name=b.user_name left join (select distinct user_name from trade_2019)c on b.user_name=c.user_name where c.user_name is null;
总结:left join 为左连接,以左边的表的某一个字段为依据,将多个表
进行拼接,若第二个或第三个表中的数据在最左边的数据没有,则显示空值.
若要取出空值,查询时用左表查询,条件为空值的表某字段为空值.
三 全连接 full join
对表1和表2进行全连接
1
2
3
4
5select * from user_list_1 a full join user_list_2 b on a.user_id=b.user_id;
注意:全连接是纵向连接,和左连接的连接方法一样,没有的显示空值
四.union all/union
将user_list_1和user_list_3合并在一起:
1
2
3
4
5
6select user_id,user_name from user_list_1 union all select user_id,user_name from user_list_2;
注意:全连接是横向连接,将所有字段都拼接在一块,
字段名称必须一致!
字段顺序必须一致!
没有连接条件!
需求7:2017-2019年有交易的所有用户数
方法3: 推荐写法
1
2
3
4
5
6
7
8
9
10
11select count(distinct a.user_name) from (select distinct user_name from trade_2017 union all select distinct user_name from trade_2018 union all select distinct user_name from trade_2019)a;
需求7:2017-2019年有交易的所有用户数
方法一:
1
2
3
4
5
6
7
8
9
10
11
12select count(distinct a.user_name), count(a.user_name) from (select user_name from trade_2017 union all select user_name from trade_2018 union all select user_name from trade_2019)a;
方法2:
1
2
3
4
5
6
7
8
9
10
11
12select count(distinct a.user_name), count(a.user_name) from (select user_name from trade_2017 union select user_name from trade_2018 union select user_name from trade_2019)a;
需求8:2019年每个用户的支付和退款金额汇总
1. 从两个表中分别筛选出2019年的每个用户的支付和退款总金额
2. 汇总
1
2
3
4
5
6
7
8
9
10
11
12
13select a.user_name,sum(total_amount),sum(total_refund) from (select user_name,sum(pay_amount) total_amount,0 as total_refund from user_trade where year(dt)=2019 group by user_name union all select user_name,0 as total_amount,sum(refund_amount) total_refund from user_refund where year(dt)=2019 group by user_name)a group by a.user_name;
需求9:2019年每个支付用户的支付金额和退款金额
1. 把2019年每个支付用户求出来, 并且拿出来总的支付金额
2. 得出2019年每个退款用户, 总退款金额拿出来
3, 左连接获取到以支付金额为主表的需求结果
1
2
3
4
5
6
7
8
9
10
11
12
13select a.user_name,a.total_amount,b.total_refund from (select user_name,sum(pay_amount) total_amount from user_trade where year(dt)=2019 group by user_name)a left join (select user_name,sum(refund_amount) total_refund from user_refund where year(dt)=2019 group by user_name)b on a.user_name=b.user_name;
– 解决b.refund_amount的null值
1
2
3
4
5
6
7
8
9
10
11
12
13
14select a.user_name,a.total_amount, if(b.refund_amount is null, 0, b.refund_amount) from (select user_name, sum(pay_amount) total_amount from user_trade where year(dt)=2019 group by user_name)a left join (select user_name,sum(refund_amount) refund_amount from user_refund where year(dt)=2019 group by user_name)b on a.user_name=b.user_name;
多表连接容易犯的错误
1
2
3select * FROM a join b join c on a.xx=b.xx=c.xx;
需求10:首次激活时间在2017年,但是一直没有支付的用户年龄段分布
/*
1.限制时间在2017年的用户并且把年龄段筛选出来
2.没有支付的人选出来
3.年龄段分布
*/
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18select a.age_range,count(a.user_name) from (select user_name, case when age<20 then '20岁以下' when age>=20 and age<30 then '20-30岁' when age>=30 and age<40 then '30-40岁' else '40岁以上' end age_range from user_info where year(firstactivetime)=2017)a left join (select distinct user_name from user_trade where dt>'0')b on a.user_name=b.user_name where b.user_name is null group by a.age_range;
注意:dt是个分区字段,使用where 加分区字段来构建分区表,分区的最终
目的是在查询时,使用分区列过滤!
需求11:2018 2019年交易的用户,其激活时间段分布
/*
1.去除2018 2019年有交易的用户的全集
2.取出所有用户的激活时间
3.统计时间段分布
*/
1
2
3
4
5
6
7
8
9
10
11
12select hour(firstactivetime),count(a.user_name) from (select user_name from trade_2018 union select user_name from trade_2019)a left join user_info b on a.user_name=b.user_name group by hour(firstactivetime);
注意:union 去重且排序
union all 不去重且不排序
最后
以上就是苗条煎饼最近收集整理的关于大数据hive之表连接关于jps进程中的runjar解决办法的全部内容,更多相关大数据hive之表连接关于jps进程中内容请搜索靠谱客的其他文章。
发表评论 取消回复