Hive查询实例

201 阅读 0 评论 133 点赞

我是靠谱客的博主快乐毛巾，这篇文章主要介绍Hive查询实例，现在分享给大家，希望可以做个参考。

数据查询：
SELECT [ALL | DISTINCT] select_expr, select_expr, ...
  FROM table_reference
  [WHERE where_condition]
  [GROUP BY col_list]
  [ORDER BY col_list]
  [CLUSTER BY col_list
    | [DISTRIBUTE BY col_list] [SORT BY col_list]
  ]
 [LIMIT [offset,] rows]

全表查询：
select * from emp;

指定字段查询：
select t.empno,t.ename,t.job from emp t;

限制查询
select * from emp limit 5;

= >= <= between and

select t.empno,t.ename,t.job from emp t where t.sal between 800 and 1500;

is null/is not null /in /not in
select t.empno,t.ename,t.job from emp t where t.comm is null;

max/min/count/sum/avg
select count(*) cnt from emp;
select max(sal) max_sal from emp;
select sum(sal) sum_sal from emp;
select avg(sal) avg_sal from emp;


group by /having

每个部门的平均工资
select t.deptno,avg(sal) avg_sal from emp t group by t.deptno;

每个部门中最高岗位的薪水
select t.deptno,t.job,max(t.sal) max_sal from emp t group by t.deptno,t.job;

having

where是针对单条记录进行筛选过滤的
having是对分组结果进行筛选过滤的

求每个部门的平均薪水大于2000
select t.deptno,avg(t.sal) avg_sal from emp t group by t.deptno having avg_sal > 2000;

join
两个表进行连接，m表中的记录和n表的中记录组成一条记录

等值join：join ... on ...
select e.empno,e.ename,e.deptno from emp e join dept d on e.deptno = d.deptno ;

右连接：right join(以右表为准)
select e.empno,e.ename,e.deptno from emp e right join dept d on e.deptno = d.deptno ;



左连接：left join(以左表为准)
select e.empno,e.ename,e.deptno from emp e left join dept d on e.deptno = d.deptno ;

全连接
select e.empno,e.ename,e.deptno from emp e full join dept d on e.deptno = d.deptno ;

order by：对全局数据进行排序，仅仅只有一个reduce
select * from emp order by empno desc;

sort by:对每一个reduce内部数据进行排序，对全局的结果集来说没有排序
设置mapreduce个数：
set mapreduce.job.reduces= 3；
查看设置：
set mapreduce.job.reduces
insert overwrite local directory '/home/wql/app/hData/sortby-res' select * from emp sort by empno asc;

distribute by:类似于MapReduce中分区partition,对数据进行分区，结合sort by进行使用
insert overwrite local directory '/home/wql/app/hData/distby-res' select * from emp distribute by deptno sort by empno asc;

cluster by当distribute by和sort by字段相同时，就可以使用cluster by；
insert overwrite local directory '/home/wql/app/hData/clusby-res' select * from emp cluster by empno;


查看所有函数：
show functions;
desc function split;
desc function extended  split;