我是靠谱客的博主 迷你蜻蜓,最近开发中收集的这篇文章主要介绍Hive SQL操作与函数自定义(一)1 DDL2 Statistics3 Indexes4 Archiving5 DML6 import/export7 explain plan8 Select,觉得挺不错的,现在分享给大家,希望可以做个参考。
概述
http://blog.csdn.net/u013980127/article/details/52604882
与传统SQL一致,并且概念简单的,本文没有详细说明或完全没有说明。另外,本文也没有按照官文一字不漏地翻译,只是提炼本人认为需要的。
1 DDL
1.1 Create/Drop/Alter/Use Database
1.1.1 Create Database
CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
[COMMENT database_comment]
[LOCATION hdfs_path]
[WITH DBPROPERTIES (property_name=property_value, ...)];
注:DATABASE与SCHEMA用途相同
1.1.2 Drop Database
DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];
注:默认RESTRICT,使用CASCADE可删除含表的数据库。
1.1.3 Alter Database
ALTER (DATABASE|SCHEMA) database_name SET DBPROPERTIES (property_name=property_value, ...);
-- (Note: SCHEMA added in Hive 0.14.0)
ALTER (DATABASE|SCHEMA) database_name SET OWNER [USER|ROLE] user_or_role;
-- (Note: Hive 0.13.0 and later; SCHEMA added in Hive 0.14.0)
1.1.4 Use Database
USE database_name;
USE DEFAULT;
-- 获取当前数据库
SELECT current_database()
1.2 Create/Drop/Truncate Table
1.2.1 Create Table
1.2.1.1 语法
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
-- (Note: TEMPORARY available in Hive 0.14.0 and later)
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[SKEWED BY (col_name, col_name, ...)
-- (Note: Available in Hive 0.10.0 and later)]
ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
[STORED AS DIRECTORIES]
[
[ROW FORMAT row_format]
[STORED AS file_format]
| STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]
-- (Note: Available in Hive 0.6.0 and later)
]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)]
-- (Note: Available in Hive 0.6.0 and later)
[AS select_statement];
-- (Note: Available in Hive 0.5.0 and later; not supported for external tables)
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
LIKE existing_table_or_view_name
[LOCATION hdfs_path];
data_type
: primitive_type
| array_type
| map_type
| struct_type
| union_type
-- (Note: Available in Hive 0.7.0 and later)
primitive_type
: TINYINT
| SMALLINT
| INT
| BIGINT
| BOOLEAN
| FLOAT
| DOUBLE
| DOUBLE PRECISION -- (Note: Available in Hive 2.2.0 and later)
| STRING
| BINARY
-- (Note: Available in Hive 0.8.0 and later)
| TIMESTAMP
-- (Note: Available in Hive 0.8.0 and later)
| DECIMAL
-- (Note: Available in Hive 0.11.0 and later)
| DECIMAL(precision, scale)
-- (Note: Available in Hive 0.13.0 and later)
| DATE
-- (Note: Available in Hive 0.12.0 and later)
| VARCHAR
-- (Note: Available in Hive 0.12.0 and later)
| CHAR
-- (Note: Available in Hive 0.13.0 and later)
array_type
: ARRAY
map_type
: MAP
1.2.1.2 Row Format, Storage Format, and SerDe
可以使用自定义的SerDe或者Hive自带的SerDe。如果没有指定ROW FORMAT或指定了ROW FORMAT DELIMITED时,会使用自带的SerDe。
‘hive.default.fileformat’设置默认存储格式。
参考
[Hive中的InputFormat、OutputFormat与SerDe]
(https://www.coder4.com/archives/4031)
1.2.1.3 Partitioned Table
通过PARTITIONED BY创建。可以指定一个或多个分区列,用CLUSTERED BY columns分桶,通过SORT BY排序桶中的数据。
-- 示例
id
int,
date
date,
name
varchar
create table table_name (
id
int,
dtDontQuery
string,
name
string
)
partitioned by (date string)
1.2.1.4 External Tables
LOCATION指定数据所在位置。删除外部表时,表中数据不会被删除。
CREATE EXTERNAL TABLE page_view(viewTime INT, userid BIGINT,
page_url STRING, referrer_url STRING,
ip STRING COMMENT 'IP Address of the User',
country STRING COMMENT 'country of origination')
COMMENT 'This is the staging page view table'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '