Hive Common DDL Statements: Creating Databases and Tables
Overview
Reference: Programming Hive (《Hive编程指南》)
1. Creating a Database
(1) Syntax
CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
[COMMENT database_comment] //description of the database
[LOCATION hdfs_path] //storage location of the database on HDFS
[WITH DBPROPERTIES (property_name=property_value, ...)]; //database properties
Default location: /user/hive/warehouse/db_name.db/table_name/partition_name/…
(2) Ways to create a database
# Create a plain database
create database t1;
# Create a database only if it does not already exist
create database if not exists t1;
# Create a database with a comment
create database if not exists t2 comment 'learning hive';
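The LOCATION and DBPROPERTIES clauses from the syntax above can be combined in one statement. A minimal sketch, assuming a hypothetical HDFS path /hive/t3 and made-up property values:
# Create a database at an explicit HDFS location with custom properties (hypothetical path and values)
create database if not exists t3
comment 'learning hive'
location '/hive/t3'
with dbproperties ('creator'='hive_user', 'purpose'='testing');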
2. Viewing Databases
(1) Ways to view databases
# List all databases
show databases;
+-----------------------+
|     database_name     |
+-----------------------+
| default               |
| information_schema    |
| sys                   |
+-----------------------+
--Filter database names with a wildcard pattern
show databases like 't*';
+-----------------------+
|     database_name     |
+-----------------------+
| t1                    |
| t2                    |
+-----------------------+
Hive creates a directory for each database, and the tables in a database are stored as subdirectories of that database directory.
Tables in the default database are an exception, because that database has no directory of its own.
Database directories are placed under the top-level directory specified by the parameter hive.metastore.warehouse.dir.
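To check which top-level directory your installation uses, you can print the parameter from the current Hive session (the value will vary by environment):
# Show the warehouse directory configured for this session
set hive.metastore.warehouse.dir;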
# Show the detailed properties of a database
desc database extended t2;
+----------+----------------+--------------------------------------------------------------+-------------+-------------+-------------+
| db_name  | comment        | location                                                     | owner_name  | owner_type  | parameters  |
+----------+----------------+--------------------------------------------------------------+-------------+-------------+-------------+
| t2       | learning hive  | hdfs://server1:8020/warehouse/tablespace/managed/hive/t2.db  | hdfs        | USER        |             |
+----------+----------------+--------------------------------------------------------------+-------------+-------------+-------------+
# Check which database is currently in use
select current_database();
+----------+
|   _c0    |
+----------+
| default  |
+----------+
# Show the full statement that was used to create a database
show create database t2;
+----------------------------------------------------+
|                   createdb_stmt                    |
+----------------------------------------------------+
| CREATE DATABASE `t2`                               |
| COMMENT                                            |
|   'learning hive'                                  |
| LOCATION                                           |
|   'hdfs://server1:8020/warehouse/tablespace/managed/hive/t2.db' |
+----------------------------------------------------+
3. Dropping a Database
(1) Drop syntax
drop database t1;
drop database if exists t1;
# By default Hive will not drop a database that contains tables. Either drop all tables first and then drop the database, or append the keyword cascade to the drop command to remove the database directly.
drop database if exists t1 cascade;
drop database t1 cascade;
4. Switching Databases
use database_name;
5. Creating Tables
(1) Syntax
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...)
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]
(2) Keyword notes
•EXTERNAL creates an external table; at creation time you point it at the path of the actual data (LOCATION)
•LIKE copies the schema of an existing table without copying its data
•COMMENT adds a comment
•PARTITIONED BY specifies the partition columns
•ROW FORMAT row format settings:
DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
| SERDE serde_name [WITH SERDEPROPERTIES
(property_name=property_value, property_name=property_value, ...)]
•STORED AS data storage format settings:
SEQUENCEFILE //sequence file
| TEXTFILE //plain text file
| RCFILE //file format that combines row and columnar storage
| INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname //custom file format
If the data is plain text, use STORED AS TEXTFILE. If the data needs to be compressed, use STORED AS SEQUENCEFILE.
•LOCATION specifies the table's storage path on HDFS
Recommendation:
If a dataset is already stored on HDFS and needs to be used by multiple users or clients, it is usually best to create an external table over it.
(3) Creating tables and related operations
# Create a managed (internal) table
create table student_test
(
id int,
name string,
sex string,
age int)
row format delimited fields terminated by ",";
Tables are created as managed tables (also called internal tables) by default. Hive controls the lifecycle of their data: dropping a managed table also deletes the table's data.
Managed tables make it inconvenient to share data with other tools.
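To get data into the managed table, a local delimited file can be loaded with LOAD DATA. A minimal sketch, assuming a hypothetical comma-separated file /tmp/student.csv on the local filesystem:
# Load a local CSV file into the managed table (hypothetical path; LOCAL copies from the local filesystem)
load data local inpath '/tmp/student.csv' into table student_test;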
# Create an external table (external)
create external table student_ext
(
id int,
name string,
sex string,
age int)
row format delimited fields terminated by ","
location "/hive/student_ext";
Dropping an external table does not delete the underlying data; only the metadata describing the table is removed.
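This behavior can be checked from the same Hive session; a sketch, assuming the student_ext table and location created above:
# Drop the external table; only its metadata is removed
drop table student_ext;
# The files under the table's former LOCATION should still be present on HDFS
dfs -ls /hive/student_ext;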
# Create a partitioned table
create external table student_part
(
id int,
name string,
sex string,
age int)
partitioned by (city string)
row format delimited fields terminated by ","
location "/hive/student_part";
--Add a new partition
alter table student_part add partition (city="beijing");
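Data for a partitioned table is loaded per partition. A minimal sketch, assuming a hypothetical local file /tmp/student_beijing.csv with the same comma-separated column layout:
# Load a local file into one specific partition (hypothetical file path)
load data local inpath '/tmp/student_beijing.csv' into table student_part partition (city="beijing");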
# Create a bucketed table
create external table student_bck
(
id int,
name string,
sex string,
age int,
department string)
clustered by (id)
sorted by (id asc, name desc) into 4 buckets
row format delimited fields terminated by ","
location "/hive/student_bck";
# Use like to copy the schema of an existing table
create table student_like like student;
--External variant
create external table student_like_ext like student
location '/hive/student_like_ext';
Note:
If the external keyword is omitted and the source table is an external table, the new table will also be an external table.
If the external keyword is omitted and the source table is a managed table, the new table will also be a managed table.
If the external keyword is present and the source table is a managed table, the new table will be an external table.
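Whether a copied table ended up managed or external can be confirmed from its metadata; a sketch using the table created above:
# Check the "Table Type" field (MANAGED_TABLE or EXTERNAL_TABLE) of the copied table
desc formatted student_like;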
6. Viewing Tables
(1) List tables
# List the tables in the current database
show tables;
+-------------------------+
|        tab_name         |
+-------------------------+
| call_center             |
+-------------------------+
# List the tables in a database other than the current one
show tables in t2;
+-------------------------+
|        tab_name         |
+-------------------------+
| catalog_page            |
+-------------------------+
# List the tables whose names start with a given prefix
show tables like 'call_*';
+-------------------------+
|        tab_name         |
+-------------------------+
| call_center             |
+-------------------------+
(2) View table information
desc call_center;
+--------------------+------------+----------+
|      col_name      | data_type  | comment  |
+--------------------+------------+----------+
| cc_call_center_sk  | int        |          |
| cc_call_center_id  | string     |          |
| cc_rec_start_date  | string     |          |
| cc_rec_end_date    | string     |          |
| cc_closed_date_sk  | int        |          |
| cc_open_date_sk    | int        |          |
| cc_name            | string     |          |
| cc_class           | string     |          |
| cc_employees       | int        |          |
| cc_sq_ft           | int        |          |
| cc_hours           | string     |          |
| cc_manager         | string     |          |
| cc_mkt_id          | int        |          |
| cc_mkt_class       | string     |          |
| cc_mkt_desc        | string     |          |
| cc_market_manager  | string     |          |
| cc_division        | int        |          |
| cc_division_name   | string     |          |
| cc_company         | int        |          |
| cc_company_name    | string     |          |
| cc_street_number   | string     |          |
| cc_street_name     | string     |          |
| cc_street_type     | string     |          |
| cc_suite_number    | string     |          |
| cc_city            | string     |          |
| cc_county          | string     |          |
| cc_state           | string     |          |
| cc_zip             | string     |          |
| cc_country         | string     |          |
| cc_gmt_offset      | float      |          |
| cc_tax_percentage  | float      |          |
+--------------------+------------+----------+
# Show detailed table information (raw, hard-to-read format)
desc extended call_center;
# Show detailed table information in a readable format, including file count, partition count, and so on; useful for checking the number of files in a table (e.g. when analyzing small-file problems)
desc formatted call_center;
+-------------------------------+----------------------------------------------------+----------------------------------------------------+
|           col_name            |                     data_type                      |                      comment                       |
+-------------------------------+----------------------------------------------------+----------------------------------------------------+
| # col_name                    | data_type                                          | comment                                            |
| cc_call_center_sk             | int                                                |                                                    |
| cc_call_center_id             | string                                             |                                                    |
| cc_rec_start_date             | string                                             |                                                    |
| cc_rec_end_date               | string                                             |                                                    |
| cc_closed_date_sk             | int                                                |                                                    |
| cc_open_date_sk               | int                                                |                                                    |
| cc_name                       | string                                             |                                                    |
| cc_class                      | string                                             |                                                    |
| cc_employees                  | int                                                |                                                    |
| cc_sq_ft                      | int                                                |                                                    |
| cc_hours                      | string                                             |                                                    |
| cc_manager                    | string                                             |                                                    |
| cc_mkt_id                     | int                                                |                                                    |
| cc_mkt_class                  | string                                             |                                                    |
| cc_mkt_desc                   | string                                             |                                                    |
| cc_market_manager             | string                                             |                                                    |
| cc_division                   | int                                                |                                                    |
| cc_division_name              | string                                             |                                                    |
| cc_company                    | int                                                |                                                    |
| cc_company_name               | string                                             |                                                    |
| cc_street_number              | string                                             |                                                    |
| cc_street_name                | string                                             |                                                    |
| cc_street_type                | string                                             |                                                    |
| cc_suite_number               | string                                             |                                                    |
| cc_city                       | string                                             |                                                    |
| cc_county                     | string                                             |                                                    |
| cc_state                      | string                                             |                                                    |
| cc_zip                        | string                                             |                                                    |
| cc_country                    | string                                             |                                                    |
| cc_gmt_offset                 | float                                              |                                                    |
| cc_tax_percentage             | float                                              |                                                    |
|                               | NULL                                               | NULL                                               |
| # Detailed Table Information  | NULL                                               | NULL                                               |
| Database:                     | tpcds_orc_hive_8000                                | NULL                                               |
| OwnerType:                    | USER                                               | NULL                                               |
| Owner:                        | hdfs                                               | NULL                                               |
| CreateTime:                   | Wed Dec 09 03:17:21 CST 2020                       | NULL                                               |
| LastAccessTime:               | UNKNOWN                                            | NULL                                               |
| Retention:                    | 0                                                  | NULL                                               |
| Location:                     | hdfs://server1:8020/warehouse/tablespace/managed/hive/tpcds_orc_hive_8000.db/call_center | NULL |
| Table Type:                   | MANAGED_TABLE                                      | NULL                                               |
| Table Parameters:             | NULL                                               | NULL                                               |
|                               | COLUMN_STATS_ACCURATE                              | {"BASIC_STATS":"true","COLUMN_STATS":{"cc_call_center_id":"true","cc_call_center_sk":"true","cc_city":"true","cc_class":"true","cc_closed_date_sk":"true","cc_company":"true","cc_company_name":"true","cc_country":"true","cc_county":"true","cc_division":"true","cc_division_name":"true","cc_employees":"true","cc_gmt_offset":"true","cc_hours":"true","cc_manager":"true","cc_market_manager":"true","cc_mkt_class":"true","cc_mkt_desc":"true","cc_mkt_id":"true","cc_name":"true","cc_open_date_sk":"true","cc_rec_end_date":"true","cc_rec_start_date":"true","cc_sq_ft":"true","cc_state":"true","cc_street_name":"true","cc_street_number":"true","cc_street_type":"true","cc_suite_number":"true","cc_tax_percentage":"true","cc_zip":"true"}} |
|                               | bucketing_version                                  | 2                                                  |
|                               | numFiles                                           | 1                                                  |
|                               | numRows                                            | 10                                                 |
|                               | rawDataSize                                        | 0                                                  |
|                               | totalSize                                          | 5899                                               |
|                               | transactional                                      | true                                               |
|                               | transactional_properties                           | default                                            |
|                               | transient_lastDdlTime                              | 1607455918                                         |
|                               | NULL                                               | NULL                                               |
| # Storage Information         | NULL                                               | NULL                                               |
| SerDe Library:                | org.apache.hadoop.hive.ql.io.orc.OrcSerde          | NULL                                               |
| InputFormat:                  | org.apache.hadoop.hive.ql.io.orc.OrcInputFormat    | NULL                                               |
| OutputFormat:                 | org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat   | NULL                                               |
| Compressed:                   | No                                                 | NULL                                               |
| Num Buckets:                  | -1                                                 | NULL                                               |
| Bucket Columns:               | []                                                 | NULL                                               |
| Sort Columns:                 | []                                                 | NULL                                               |
| Storage Desc Params:          | NULL                                               | NULL                                               |
|                               | serialization.format                               | 1                                                  |
+-------------------------------+----------------------------------------------------+----------------------------------------------------+
# Show partition information
show partitions time_dim;
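show partitions also accepts a partition spec to narrow the listing; a sketch using the partitioned table created earlier:
# List only the matching partition(s) of student_part
show partitions student_part partition (city='beijing');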
(3) View the CREATE TABLE statement
# Shows the full statement that was used to create the table
show create table call_center;
+----------------------------------------------------+
|                   createtab_stmt                   |
+----------------------------------------------------+
| CREATE TABLE `call_center`(                        |
|   `cc_call_center_sk` int,                         |
|   `cc_call_center_id` string,                      |
|   `cc_rec_start_date` string,                      |
|   `cc_rec_end_date` string,                        |
|   `cc_closed_date_sk` int,                         |
|   `cc_open_date_sk` int,                           |
|   `cc_name` string,                                |
|   `cc_class` string,                               |
|   `cc_employees` int,                              |
|   `cc_sq_ft` int,                                  |
|   `cc_hours` string,                               |
|   `cc_manager` string,                             |
|   `cc_mkt_id` int,                                 |
|   `cc_mkt_class` string,                           |
|   `cc_mkt_desc` string,                            |
|   `cc_market_manager` string,                      |
|   `cc_division` int,                               |
|   `cc_division_name` string,                       |
|   `cc_company` int,                                |
|   `cc_company_name` string,                        |
|   `cc_street_number` string,                       |
|   `cc_street_name` string,                         |
|   `cc_street_type` string,                         |
|   `cc_suite_number` string,                        |
|   `cc_city` string,                                |
|   `cc_county` string,                              |
|   `cc_state` string,                               |
|   `cc_zip` string,                                 |
|   `cc_country` string,                             |
|   `cc_gmt_offset` float,                           |
|   `cc_tax_percentage` float)                       |
| ROW FORMAT SERDE                                   |
|   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'      |
| STORED AS INPUTFORMAT                              |
|   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' |
| OUTPUTFORMAT                                       |
|   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' |
| LOCATION                                           |
|   'hdfs://server1:8020/warehouse/tablespace/managed/hive/tpcds_orc_hive_8000.db/call_center' |
| TBLPROPERTIES (                                    |
|   'bucketing_version'='2',                         |
|   'transactional'='true',                          |
|   'transactional_properties'='default',            |
|   'transient_lastDdlTime'='1607455918')            |
+----------------------------------------------------+
7. Setting Parameters
(1) View the current value of a parameter
set hive.auto.convert.join;
+------------------------------+
|             set              |
+------------------------------+
| hive.auto.convert.join=true  |
+------------------------------+
(2) Set a parameter
set hive.auto.convert.join = false;
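Two related commands can help when inspecting configuration: set with no argument prints the settings that have been overridden in the current session, and set -v also prints the underlying Hadoop configuration (the output is long).
# Print the settings overridden by Hive or the user in this session
set;
# Print all Hive and Hadoop configuration variables
set -v;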
8. Viewing the SQL Execution Plan
explain select * from test_table;
# Show the detailed execution plan (the output is verbose)
explain extended select * from test_table;