Hadoop基础教程-第11章 Hive：SQL on Hadoop（11.3 Hive 快速入门）

92 阅读 0 评论 61 点赞

我是靠谱客的博主淡定乌冬面，这篇文章主要介绍Hadoop基础教程-第11章 Hive：SQL on Hadoop（11.3 Hive 快速入门），现在分享给大家，希望可以做个参考。

第11章 Hive：SQL on Hadoop

11.3 Hive快速入门

11.3.1 HQL介绍

Hive查询语言（Hive QL，可以简称HQL）的语法和SQL非常相似，实际上HQL基本上实现了SQL-92标准，并做了一些扩展。但是仍然存在一些差异，比如不支持行级操作，不支持事务处理。HQL更接近MySQL的SQL方言，对于熟悉SQL语言的开发者而言，HQL很容易上手。

11.3.2 进入Hive Shell

Hive Shell即是Hive的CLI交互模式，输入hive回车即可进入。

[root@node3 ~]# hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/opt/hive-2.1.1/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>

输入“quit;”即可退出HiveShell

hive> quit;
[root@node3 ~]#

11.3.3 Hive数据库

（1）显示数据库实例
Hive数据库本质上就是表的目录，或者说命名空间。如果创建数据表没有指定数据库，则使用默认数据库default。与MySQL一样，可以通过show databases命令查看已经存在的数据库。

hive> show databases;
OK
default
Time taken: 1.451 seconds, Fetched: 1 row(s)
hive>

（2）创建数据库
使用create database语句创建数据库。

hive> create database test;
OK
Time taken: 0.414 seconds
hive> show databases;
OK
default
test
Time taken: 0.021 seconds, Fetched: 2 row(s)
hive>

（3）查看数据库结构
describe 命令(可以简写desc)来查看数据库定义，包括：数据库名称、数据库在 HDFS 目录、HDFS

hive> desc database test;
OK
test        hdfs://cetc/user/hive/warehouse/test.db root    USER    
Time taken: 0.223 seconds, Fetched: 1 row(s)
hive>

Hive会为每个创建的数据库在HDFS上创建一个目录，该数据库的表会以子目录形式存储，表中的数据会以表目录下的文件形式存储。对于default数据库，默认的缺省数据库没有自己的目录，default数据库下的表默认存放在/user/hive/warehouse目录下。

（4）切换当前数据库

hive> use test;
OK
Time taken: 1.217 seconds
hive>

11.3.4 创建表

在默认default数据库下创建表users

hive> create table users(id int,name char(20));
OK
Time taken: 0.481 seconds
hive>

在test数据库下创建表tb1

hive> create table test.tb1(
    > id int,
    > name string,
    > price double
    > );
OK
Time taken: 2.044 seconds
hive>

11.3.4 查看表

（1）显示default数据库下的所有表

hive> show tables;
OK
users
Time taken: 0.231 seconds, Fetched: 1 row(s)
hive>

（2）查看表结构

hive> desc users;
OK
id                      int                                         
name                    char(20)                                    
Time taken: 0.171 seconds, Fetched: 2 row(s)
hive>

（3）查看test数据库下的数据表

hive> use test;
OK
Time taken: 0.037 seconds
hive> show tables;
OK
tb1
Time taken: 0.036 seconds, Fetched: 1 row(s)
hive>

也可以通过下面命令实现上面功能

hive> show tables in test;
OK
tb1
Time taken: 0.04 seconds, Fetched: 1 row(s)
hive>

查看tb1表结构

hive> desc tb1;
OK
id                      int                                         
name                    string                                      
price                   double                                      
Time taken: 0.138 seconds, Fetched: 3 row(s)
hive>

11.3.5 插入数据

（1）显式字段插入

 hive> insert into users(id,name) values(1,'aa');
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20170810094706_d75cb718-9742-4b25-864c-44a63c27d064
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1502371586068_0001, Tracking URL = http://node1:8088/proxy/application_1502371586068_0001/
Kill Command = /opt/hadoop-2.7.3/bin/hadoop job  -kill job_1502371586068_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-08-10 09:47:34,602 Stage-1 map = 0%,  reduce = 0%
2017-08-10 09:47:50,211 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.36 sec
MapReduce Total cumulative CPU time: 2 seconds 360 msec
Ended Job = job_1502371586068_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://cetc/user/hive/warehouse/users/.hive-staging_hive_2017-08-10_09-47-06_012_378591815767107640-1/-ext-10000
Loading data to table default.users
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 2.36 sec   HDFS Read: 4225 HDFS Write: 92 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 360 msec
OK
Time taken: 46.216 seconds
hive>

（2）隐式字段插入

hive> insert into users values(2,'bb');
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20170810095011_cf3ab969-c1bb-459b-a1eb-f9b411ec0614
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1502371586068_0002, Tracking URL = http://node1:8088/proxy/application_1502371586068_0002/
Kill Command = /opt/hadoop-2.7.3/bin/hadoop job  -kill job_1502371586068_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-08-10 09:50:28,186 Stage-1 map = 0%,  reduce = 0%
2017-08-10 09:50:41,103 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.34 sec
MapReduce Total cumulative CPU time: 2 seconds 340 msec
Ended Job = job_1502371586068_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://cetc/user/hive/warehouse/users/.hive-staging_hive_2017-08-10_09-50-11_441_8940366110116441398-1/-ext-10000
Loading data to table default.users
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 2.34 sec   HDFS Read: 4228 HDFS Write: 92 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 340 msec
OK
Time taken: 32.393 seconds
hive>

（3）插入多条记录

hive> insert into users values(3,'cc'),(4,'dd'),(5,'ee');
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20170810095507_48fc0ce3-7789-4427-b35c-3809586f208f
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1502371586068_0003, Tracking URL = http://node1:8088/proxy/application_1502371586068_0003/
Kill Command = /opt/hadoop-2.7.3/bin/hadoop job  -kill job_1502371586068_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-08-10 09:55:24,363 Stage-1 map = 0%,  reduce = 0%
2017-08-10 09:55:36,294 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.35 sec
MapReduce Total cumulative CPU time: 2 seconds 350 msec
Ended Job = job_1502371586068_0003
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://cetc/user/hive/warehouse/users/.hive-staging_hive_2017-08-10_09-55-07_839_6570192849374755881-1/-ext-10000
Loading data to table default.users
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 2.35 sec   HDFS Read: 4238 HDFS Write: 138 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 350 msec
OK
Time taken: 31.315 seconds
hive>

可见，Hive每次向表中插入一条记录，都会转换为MapReduce程序，效率低下。

11.3.6 简单查询

（1）字段查询

hive> select id,name from users;
OK
1   aa                  
2   bb                  
3   cc                  
4   dd                  
5   ee                  
Time taken: 0.224 seconds, Fetched: 5 row(s)
hive>

（2）条件查看

hive> select * from users where id=1;
OK
1   aa                  
Time taken: 0.399 seconds, Fetched: 1 row(s)
hive> select name from users where name like '%a';
OK
aa                  
Time taken: 0.214 seconds, Fetched: 1 row(s)
hive>

（3）统计查询

hive> select count(*) from users;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20170810095858_fa356df4-8b18-4a5b-835a-643427dd99f6
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1502371586068_0004, Tracking URL = http://node1:8088/proxy/application_1502371586068_0004/
Kill Command = /opt/hadoop-2.7.3/bin/hadoop job  -kill job_1502371586068_0004
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-08-10 09:59:21,494 Stage-1 map = 0%,  reduce = 0%
2017-08-10 09:59:31,484 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.02 sec
2017-08-10 09:59:43,424 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.11 sec
MapReduce Total cumulative CPU time: 4 seconds 110 msec
Ended Job = job_1502371586068_0004
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 4.11 sec   HDFS Read: 8068 HDFS Write: 101 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 110 msec
OK
5
Time taken: 46.061 seconds, Fetched: 1 row(s)
hive>

11.3.7 不支持删除修改记录

hive> delete from users where id=1;
FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.
hive> update users set name='xx' where id=1;
FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.
hive>

表中数据不支持删除修改操作，但是可以截断表和删除整张表。

hive> truncate table users;
OK
Time taken: 0.273 seconds
hive> select * from users;
OK
Time taken: 0.243 seconds
hive>

hive> drop table users;
OK
Time taken: 2.025 seconds
hive>

最后

以上就是淡定乌冬面最近收集整理的关于Hadoop基础教程-第11章 Hive：SQL on Hadoop（11.3 Hive 快速入门）的全部内容，更多相关Hadoop基础教程-第11章内容请搜索靠谱客的其他文章。

本图文内容来源于网友提供，作为学习参考使用，或来自网络收集整理，版权属于原作者所有。

本文分类：Hadoop基础教程
浏览次数：92 次浏览
发布日期：2023-10-20 02:50:19
本文链接：https://www.kaopuke.com/article/k-p-k_13_u_23_o_26_f5_14_z_26_4.html

Hadoop基础教程-第11章 Hive：SQL on Hadoop（11.3 Hive 快速入门）

第11章 Hive：SQL on Hadoop

11.3 Hive快速入门

11.3.1 HQL介绍

11.3.2 进入Hive Shell

11.3.3 Hive数据库

11.3.4 创建表

11.3.4 查看表

11.3.5 插入数据

11.3.6 简单查询

11.3.7 不支持删除修改记录

最后

评论列表共有 0 条评论

发表评论取消回复

Hadoop基础教程-第11章 Hive：SQL on Hadoop（11.3 Hive 快速入门）

第11章 Hive：SQL on Hadoop

11.3 Hive快速入门

11.3.1 HQL介绍

11.3.2 进入Hive Shell

11.3.3 Hive数据库

11.3.4 创建表

11.3.4 查看表

11.3.5 插入数据

11.3.6 简单查询

11.3.7 不支持删除修改记录

最后

相关文章

评论列表共有 0 条评论

发表评论 取消回复

发表评论取消回复