Hive Common DDL Statements: Creating Databases and Tables (Detailed Reference)

Overview

Reference: Programming Hive (《Hive编程指南》)

1. Creating a Database

(1) Syntax

CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
  [COMMENT database_comment]       // description of the database
  [LOCATION hdfs_path]          // HDFS path where the database is stored
  [WITH DBPROPERTIES (property_name=property_value, ...)];    // database properties

Default location: /user/hive/warehouse/db_name.db/table_name/partition_name/…

 

(2) Ways to create a database

# Create a plain database

create database t1;

# Check whether the database exists before creating it

create database if not exists t1;

# Create a database with a comment

create database if not exists t2 comment 'learning hive';
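
The LOCATION and WITH DBPROPERTIES clauses from the syntax above can be combined in one statement. A minimal sketch (the path '/hive/t3' and the property values are made-up examples):

# Create a database with a custom location and custom properties

create database if not exists t3
comment 'learning hive'
location '/hive/t3'
with dbproperties ('creator'='hive_user', 'created_on'='2020-12-09');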

 

2. Viewing Databases

(1) Ways to view databases

# List the databases

show databases;
+-----------------------+
|     database_name     |
+-----------------------+
| default               |
| information_schema    |
| sys                   |
+-----------------------+
-- use a wildcard pattern to filter the database names
show databases like 't*';
+-----------------------+
|     database_name     |
+-----------------------+
| t1                    |
| t2                    |
+-----------------------+
Hive creates a directory for each database, and the tables in a database are stored as subdirectories of that database directory.
The default database is the exception: it does not have a directory of its own.
Database directories are created under the top-level directory specified by the parameter hive.metastore.warehouse.dir.
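
To check which top-level directory your installation uses, you can query the parameter directly (the value shown below is just an assumed example):

set hive.metastore.warehouse.dir;
-- e.g. hive.metastore.warehouse.dir=/warehouse/tablespace/managed/hive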

 

# Show detailed database properties

desc database extended t2;
+----------+----------------+----------------------------------------------------+-------------+-------------+-------------+
| db_name | comment | location | owner_name | owner_type | parameters |
+----------+----------------+----------------------------------------------------+-------------+-------------+-------------+
| t2 | learning hive | hdfs://server1:8020/warehouse/tablespace/managed/hive/t2.db | hdfs | USER | |
+----------+----------------+----------------------------------------------------+-------------+-------------+-------------+


# Show which database is currently in use

select current_database();
+----------+
|   _c0    |
+----------+
| default  |
+----------+

# Show the full CREATE DATABASE statement for a database

show create database t2;
+----------------------------------------------------+
|                   createdb_stmt                    |
+----------------------------------------------------+
| CREATE DATABASE `t2`                               |
| COMMENT                                            |
|   'learning hive'                                  |
| LOCATION                                           |
|   'hdfs://server1:8020/warehouse/tablespace/managed/hive/t2.db' |
+----------------------------------------------------+

 

3. Dropping a Database

(1) Drop syntax

drop database t1;
drop database if exists t1;

# By default Hive will not drop a database that still contains tables. Either drop all of its tables first and then drop the database, or append the keyword cascade to the drop statement to drop the database directly.

drop database if exists t1 cascade;
drop database t1 cascade;

 

4. Switching Databases

use database_name;

 

5. Creating Tables

(1) Syntax

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
  [(col_name data_type [COMMENT col_comment], ...)]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...)
  [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [ROW FORMAT row_format]
  [STORED AS file_format]
  [LOCATION hdfs_path]

(2) Keywords

• EXTERNAL: creates an external table; while creating the table, LOCATION points at the path where the data actually lives
• LIKE: copies an existing table's schema when creating a table; the data is not copied
• COMMENT: adds a comment
• PARTITIONED BY: specifies the partition columns
• ROW FORMAT: row format settings (a combined sketch follows this list)
  DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY char]
    [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
    | SERDE serde_name [WITH SERDEPROPERTIES
    (property_name=property_value, property_name=property_value, ...)]
• STORED AS: data storage format
  SEQUENCEFILE   // sequence file
  | TEXTFILE     // plain text file format
  | RCFILE       // file format combining row and columnar storage
  | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname   // custom file format
  If the data is plain text, use STORED AS TEXTFILE. If the data needs to be compressed, use STORED AS SEQUENCEFILE.
• LOCATION: specifies the table's storage path on HDFS
Advice:
If a data set already lives on HDFS and will be used by multiple users or clients, it is usually best to create an external table over it.
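
A minimal sketch that combines several of the clauses above (the table name employee_demo, its columns, and the delimiters are made-up examples, not taken from the tables used later in this article):

# Complex field delimiters plus an explicit storage format

create table if not exists employee_demo
(
id int comment 'employee id',
name string,
hobbies array<string>,
address map<string,string>)
comment 'demo of ROW FORMAT and STORED AS'
row format delimited
fields terminated by ','
collection items terminated by '_'
map keys terminated by ':'
lines terminated by '\n'
stored as textfile;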

 

(3) Creating tables and related operations

# Create a managed (internal) table

create table student_test
(
id int, 
name string, 
sex string, 
age int) 
row format delimited fields terminated by ",";
Tables created this way are managed (internal) tables by default. Hive controls the lifecycle of their data, so dropping a managed table also deletes the table's data.
Managed tables are not convenient for sharing data with other tools.

# Create an external table (external)

create external table student_ext
(
id int, 
name string, 
sex string, 
age int) 
row format delimited fields terminated by "," 
location "/hive/student_ext";
Dropping an external table does not delete the underlying data; only the metadata describing the table is removed.

# Create a partitioned table

create external table student_part
(
id int, 
name string, 
sex string, 
age int)
partitioned by (city string)
row format delimited fields terminated by ","
location "/hive/student_part";
-- add a new partition
alter table student_part add partition (city="beijing");
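
Once the partition exists, data can be loaded into it and queried by the partition column. A minimal sketch (the local file path '/tmp/student_beijing.csv' is a made-up example):

-- load a local file into the new partition
load data local inpath '/tmp/student_beijing.csv' into table student_part partition (city="beijing");
-- partition pruning: only the city="beijing" directory is scanned
select * from student_part where city = "beijing";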

# Create a bucketed table

create external table student_bck
(
id int, 
name string, 
sex string, 
age int,
department string)
clustered by (id) 
sorted by (id asc, name desc) into 4 buckets
row format delimited fields terminated by ","
location "/hive/student_bck";
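
Bucketing only takes effect when data is written through Hive (for example with insert ... select); files copied straight into the directory are not bucketed. A minimal sketch, assuming a source table student with matching columns exists; on older Hive releases the setting below may be needed, while newer versions always enforce bucketing:

-- only needed on old Hive versions; removed (always on) in Hive 2.x and later
set hive.enforce.bucketing = true;
insert into table student_bck
select id, name, sex, age, department from student;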

 

# LIKE: copy the schema of an existing table

create table student_like like student;
-- external table
create external table student_like_ext like student
location '/hive/student_like_ext';
Note:
If the statement omits the external keyword and the source table is an external table, the new table will also be external.
If the statement omits the external keyword and the source table is a managed table, the new table will also be managed.
If the statement includes the external keyword and the source table is a managed table, the new table will be external.

 

6. Viewing Tables

(1) Listing tables

# List the tables in the current database

show tables;
+-------------------------+
|        tab_name         |
+-------------------------+
| call_center             |
+-------------------------+

# List the tables in a database other than the current one

show tables in t2;
+-------------------------+
|        tab_name         |
+-------------------------+
| catalog_page            |
+-------------------------+

# List the tables whose names start with a given prefix

show tables like 'call_*';
+-------------------------+
|        tab_name         |
+-------------------------+
| call_center             |
+-------------------------+

 

(2) Viewing table information

desc call_center;
+--------------------+------------+----------+
|      col_name      | data_type  | comment  |
+--------------------+------------+----------+
| cc_call_center_sk  | int        |          |
| cc_call_center_id  | string     |          |
| cc_rec_start_date  | string     |          |
| cc_rec_end_date    | string     |          |
| cc_closed_date_sk  | int        |          |
| cc_open_date_sk    | int        |          |
| cc_name            | string     |          |
| cc_class           | string     |          |
| cc_employees       | int        |          |
| cc_sq_ft           | int        |          |
| cc_hours           | string     |          |
| cc_manager         | string     |          |
| cc_mkt_id          | int        |          |
| cc_mkt_class       | string     |          |
| cc_mkt_desc        | string     |          |
| cc_market_manager  | string     |          |
| cc_division        | int        |          |
| cc_division_name   | string     |          |
| cc_company         | int        |          |
| cc_company_name    | string     |          |
| cc_street_number   | string     |          |
| cc_street_name     | string     |          |
| cc_street_type     | string     |          |
| cc_suite_number    | string     |          |
| cc_city            | string     |          |
| cc_county          | string     |          |
| cc_state           | string     |          |
| cc_zip             | string     |          |
| cc_country         | string     |          |
| cc_gmt_offset      | float      |          |
| cc_tax_percentage  | float      |          |
+--------------------+------------+----------+

# Show detailed table information (not nicely formatted)

desc extended call_center;

# Show detailed table information (nicely formatted), including the number of files, partitions, and so on; useful for checking a table's file count (e.g. when analyzing small-file problems)

desc formatted call_center;
+-------------------------------+----------------------------------------------------+----------+
|           col_name            |                     data_type                      | comment  |
+-------------------------------+----------------------------------------------------+----------+
| # col_name                    | data_type                                          | comment  |
| (column list identical to the desc call_center output above)                                   |
|                               | NULL                                               | NULL     |
| # Detailed Table Information  | NULL                                               | NULL     |
| Database:                     | tpcds_orc_hive_8000                                | NULL     |
| OwnerType:                    | USER                                               | NULL     |
| Owner:                        | hdfs                                               | NULL     |
| CreateTime:                   | Wed Dec 09 03:17:21 CST 2020                       | NULL     |
| LastAccessTime:               | UNKNOWN                                            | NULL     |
| Retention:                    | 0                                                  | NULL     |
| Location:                     | hdfs://server1:8020/warehouse/tablespace/managed/hive/tpcds_orc_hive_8000.db/call_center | NULL |
| Table Type:                   | MANAGED_TABLE                                      | NULL     |
| Table Parameters:             | NULL                                               | NULL     |
|                               | COLUMN_STATS_ACCURATE                              | {"BASIC_STATS":"true","COLUMN_STATS":{"cc_call_center_id":"true","cc_call_center_sk":"true","cc_city":"true","cc_class":"true","cc_closed_date_sk":"true","cc_company":"true","cc_company_name":"true","cc_country":"true","cc_county":"true","cc_division":"true","cc_division_name":"true","cc_employees":"true","cc_gmt_offset":"true","cc_hours":"true","cc_manager":"true","cc_market_manager":"true","cc_mkt_class":"true","cc_mkt_desc":"true","cc_mkt_id":"true","cc_name":"true","cc_open_date_sk":"true","cc_rec_end_date":"true","cc_rec_start_date":"true","cc_sq_ft":"true","cc_state":"true","cc_street_name":"true","cc_street_number":"true","cc_street_type":"true","cc_suite_number":"true","cc_tax_percentage":"true","cc_zip":"true"}} |
|                               | bucketing_version                                  | 2        |
|                               | numFiles                                           | 1        |
|                               | numRows                                            | 10       |
|                               | rawDataSize                                        | 0        |
|                               | totalSize                                          | 5899     |
|                               | transactional                                      | true     |
|                               | transactional_properties                           | default  |
|                               | transient_lastDdlTime                              | 1607455918 |
|                               | NULL                                               | NULL     |
| # Storage Information         | NULL                                               | NULL     |
| SerDe Library:                | org.apache.hadoop.hive.ql.io.orc.OrcSerde          | NULL     |
| InputFormat:                  | org.apache.hadoop.hive.ql.io.orc.OrcInputFormat    | NULL     |
| OutputFormat:                 | org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat   | NULL     |
| Compressed:                   | No                                                 | NULL     |
| Num Buckets:                  | -1                                                 | NULL     |
| Bucket Columns:               | []                                                 | NULL     |
| Sort Columns:                 | []                                                 | NULL     |
| Storage Desc Params:          | NULL                                               | NULL     |
|                               | serialization.format                               | 1        |
+-------------------------------+----------------------------------------------------+----------+

# Show partition information

show partitions time_dim;
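
show partitions can also be limited to a particular partition value. A minimal sketch against the student_part table created earlier (assuming the city="beijing" partition was added):

show partitions student_part partition (city='beijing');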

 

(3) Viewing the CREATE TABLE statement

# Shows the full CREATE TABLE statement of a table

show create table call_center;
+----------------------------------------------------+
|                   createtab_stmt                   |
+----------------------------------------------------+
| CREATE TABLE `call_center`(                        |
|   `cc_call_center_sk` int,                         |
|   `cc_call_center_id` string,                      |
|   `cc_rec_start_date` string,                      |
|   `cc_rec_end_date` string,                        |
|   `cc_closed_date_sk` int,                         |
|   `cc_open_date_sk` int,                           |
|   `cc_name` string,                                |
|   `cc_class` string,                               |
|   `cc_employees` int,                              |
|   `cc_sq_ft` int,                                  |
|   `cc_hours` string,                               |
|   `cc_manager` string,                             |
|   `cc_mkt_id` int,                                 |
|   `cc_mkt_class` string,                           |
|   `cc_mkt_desc` string,                            |
|   `cc_market_manager` string,                      |
|   `cc_division` int,                               |
|   `cc_division_name` string,                       |
|   `cc_company` int,                                |
|   `cc_company_name` string,                        |
|   `cc_street_number` string,                       |
|   `cc_street_name` string,                         |
|   `cc_street_type` string,                         |
|   `cc_suite_number` string,                        |
|   `cc_city` string,                                |
|   `cc_county` string,                              |
|   `cc_state` string,                               |
|   `cc_zip` string,                                 |
|   `cc_country` string,                             |
|   `cc_gmt_offset` float,                           |
|   `cc_tax_percentage` float)                       |
| ROW FORMAT SERDE                                   |
|   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'      |
| STORED AS INPUTFORMAT                              |
|   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' |
| OUTPUTFORMAT                                       |
|   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' |
| LOCATION                                           |
|   'hdfs://server1:8020/warehouse/tablespace/managed/hive/tpcds_orc_hive_8000.db/call_center' |
| TBLPROPERTIES (                                    |
|   'bucketing_version'='2',                         |
|   'transactional'='true',                          |
|   'transactional_properties'='default',            |
|   'transient_lastDdlTime'='1607455918')            |
+----------------------------------------------------+

 

7. Setting Parameters

(1) Check the current value of a parameter

set hive.auto.convert.join;
+------------------------------+
|             set              |
+------------------------------+
| hive.auto.convert.join=true  |
+------------------------------+

(2) Set a parameter

set hive.auto.convert.join = false;
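
Two related session-level commands can be handy here (standard Hive CLI/Beeline behavior, shown as a brief sketch):

-- list all settings that have been overridden in this session
set;
-- restore all session settings to their defaults
reset;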

 

8. Viewing a SQL Execution Plan

explain select * from test_table;

# Show the detailed execution plan (the output is very verbose)

explain extended select * from test_table;
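
Hive also offers a few other EXPLAIN variants that complement the two above (a brief sketch; availability depends on the Hive version):

-- show the tables/partitions the query depends on
explain dependency select * from test_table;
-- show the privileges required to run the query
explain authorization select * from test_table;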

 
