我是靠谱客的博主 心灵美小甜瓜,最近开发中收集的这篇文章主要介绍Hive Transaction(Hive 事务管理)常见问题,觉得挺不错的,现在分享给大家,希望可以做个参考。

概述

Hive 事务在 Hive 3 得到增强。

hive-site.xml 配置

<property>
<name>hive.txn.manager</name>
<value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
<description>
Set to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager as part of turning on Hive
transactions, which also requires appropriate settings for hive.compactor.initiator.on,
hive.compactor.worker.threads, hive.support.concurrency (true),
and hive.exec.dynamic.partition.mode (nonstrict).
The default DummyTxnManager replicates pre-Hive-0.13 behavior and provides
no transactions.
</description>
</property>
<property>
<name>hive.support.concurrency</name>
<value>true</value>
</property>
<property>
<name>hive.compactor.initiator.on</name>
<value>true</value>
<description>
Whether to run the initiator and cleaner threads on this metastore instance or not.
Set this to true on one instance of the Thrift metastore service as part of turning
on Hive transactions. For a complete list of parameters required for turning on
transactions, see hive.txn.manager.
</description>
</property>
<property>
<name>hive.compactor.worker.threads</name>
<value>2</value>
<description>
How many compactor worker threads to run on this metastore instance. Set this to a
positive number on one or more instances of the Thrift metastore service as part of
turning on Hive transactions. For a complete list of parameters required for turning
on transactions, see hive.txn.manager.
Worker threads spawn MapReduce jobs to do compactions. They do not do the compactions
themselves. Increasing the number of worker threads will decrease the time it takes
tables or partitions to be compacted once they are determined to need compaction.
It will also increase the background load on the Hadoop cluster as more MapReduce jobs
will be running in the background.
</description>
</property>
<property>
<name>hive.exec.dynamic.partition.mode</name>
<value>nonstrict</value>
<description>
In strict mode, the user must specify at least one static partition
in case the user accidentally overwrites all partitions.
In nonstrict mode all partitions are allowed to be dynamic.
</description>
</property>

Hive 创建的表自动是事务表的配置。

<property>
<name>metastore.strict.managed.tables</name>
<value>false</value>
<description>
Whether strict managed tables mode is enabled. With this mode enabled, only transactional tables (both full and insert-only) are allowed to be created as managed tables
</description>
</property>
<property>
<name>hive.create.as.insert.only</name>
<value>false</value>
<description>
Whether the eligible tables should be created as ACID insert-only by default. Does not apply to external tables, the ones using storage handlers, etc.
</description>
</property>
<property>
<name>metastore.create.as.acid</name>
<value>false</value>
<description>
Whether the eligible tables should be created as full ACID by default. Does not apply to external tables, the ones using storage handlers, etc.
</description>
</property>

测试

创建事务表

create table t1(c1 int,c2 int) stored as orc tblproperties('transactional'='true');

执行以下操作。执行之后,可以看到每个操作在表的目录下生成相应的 delta 目录。

insert into t1 values(1,1),(2,2),(3,3);
insert into t1 values(4,4);
insert into t1 values(5,5);
insert into t1 values(6,6);
insert into t1 values(7,7);
insert into t1 values(8,8);
insert into t1 values(9,9);
insert into t1 values(10,10);
insert into t1 values(11,11);
insert into t1 values(12,12);
insert into t1 values(13,13);
delete from t1 where c1=13;
insert into t1 values(13,14);
delete from t1 where c1=13;
insert into t1 values(13,15);

insert_only 事务

insert_only 事务部要求表必须是 orc 格式。
t2 表仅支持 insert,不支持 delete, update。
insert 语句成功, delete 失败。

create table t2(c1 int,c2 int) stored as orc tblproperties('transactional'='true','transactional_properties'='insert_only');
insert into t2 values(1,1),(2,2),(3,3);
delete from t2 where c1=3;

修改现有表支持事务

全部事务支持

表 t3 从创建的时候没有支持事务,用 alter table 支持事务,之后可以执行 insert, delete 操作。

create table t3(c1 int,c2 int) stored as orc;
alter table t3 set
tblproperties('transactional'='true','transactional_properties'='default');
insert into t3 values(1,1),(2,2),(3,3);
delete from t3 where c1=3;

insert_only 事务支持

create table t4(c1 int,c2 int) stored as orc;
alter table t4 set
tblproperties('transactional'='true','transactional_properties'='insert_only');
insert into t4 values(1,1),(2,2),(3,3);

Hive 官方文档https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions

常见问题

问题原因总结:

  1. 一个表,可以同时运行多个 SQL 进行 insert 和 select 操作。但是不允许同时运行多个 SQL 进行 UPDATE, DELETE 或者 MERGE操作, 因为他们都会生成 delete_delta 目录。如两个 SQL,它们同时 update 不同的记录也会报错。
  2. SQL 解析阶段获取 一个表的 write_id list,但是在执行之前获取锁,在执行之后释放锁,导致SQL 解析和执行之前会出现并发错误。

问题列表

  1. 作业报错
    如果在执行的时候,发生了 compaction,compaction 之后删除了原来的文件,导致抛出 FileNotFoundException: File does not exist:.
  2. 2 个会话同时执行 insert overwrite 抛出 LockException
  • 创建表
create table t1(c1 int) stored as orc tblproperties('transactional'='true');

在两个窗口里分别用 beeline 连接 hiveserver。

  • 在会话1,执行以下命令,函数 timesleep 会 sleep 10秒钟,返回 10001。
insert overwrite table t1 select default.timesleep(10000);
  • 在会话2,执行以下命令,会话2 的命令在会话1之前执行完毕。
insert overwrite table t1 select 1;

会话1抛出以下异常

ERROR : FAILED: Hive Internal Error: org.apache.hadoop.hive.ql.lockmgr.LockException(Transaction manager has aborted the transaction txnid:126.
Reason: Aborting [txnid:126,128] due to a write conflict on test/t1 committed by [txnid:127,127] u/u)
org.apache.hadoop.hive.ql.lockmgr.LockException: Transaction manager has aborted the transaction txnid:126.
Reason: Aborting [txnid:126,128] due to a write conflict on test/t1 committed by [txnid:127,127] u/u
at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.commitTxn(DbTxnManager.java:670)

由于函数在编译阶段给出具体值。

select * from t1;
+--------+
| t1.c1
|
+--------+
| 1
|
| 2
|
| 3
|
+--------+

以下两个语句:
会话1:

insert overwrite table t2 select c1,default.timesleep(10000 * c1) from t1;

会话2:

insert overwrite table t2 select c1,default.timesleep(10 * c1) from t1;

会话2 会在 commit 的时候失败。

ERROR : FAILED: Hive Internal Error: org.apache.hadoop.hive.ql.lockmgr.LockException(Transaction manager has aborted the transaction txnid:163.
Reason: Aborting [txnid:163,163] due to a write conflict on test/t2 committed by [txnid:162,163] u/u)
org.apache.hadoop.hive.ql.lockmgr.LockException: Transaction manager has aborted the transaction txnid:163.
Reason: Aborting [txnid:163,163] due to a write conflict on test/t2 committed by [txnid:162,163] u/u
at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.commitTxn(DbTxnManager.java:670)
  1. 在执行 insert overwrite 时,在另外的会话执行drop table,导致第 1 个会话抛出表找不到异常。示例如下:
0: jdbc:hive2://localhost:10000/default> insert overwrite table t1 select default.timesleep(10000);
Error: Error while compiling statement: FAILED: SemanticException [Error 10001]: Line 1:23 Table not found 't1' (state=42S02,code=10001)
  • show transactions 可以显示 aborted 的 transaction。
show transactions;
+-----------------+--------------------+----------------+----------------------+-------------+------------------------+
|
txnid
|
state
|
startedtime
|
lastheartbeattime
|
user
|
host
|
+-----------------+--------------------+----------------+----------------------+-------------+------------------------+
| Transaction ID
| Transaction State
| Started Time
| Last Heartbeat Time
| User
| Hostname
|
| 126
| ABORTED
| 1646189665000
| 1646189665000
| houzhizhen
| localhost.localdomain
|
| 130
| OPEN
| 1646189960000
| 1646189973000
| houzhizhen
| localhost.localdomain
|
+-----------------+--------------------+----------------+----------------------+-------------+------------------------+
  1. 两个会话同时 insert 不会有冲突。

参考资料:
官方文档:Hive+Transactions
PPT Transactional Operations in Apache Hive:Present and Future

最后

以上就是心灵美小甜瓜为你收集整理的Hive Transaction(Hive 事务管理)常见问题的全部内容,希望文章能够帮你解决Hive Transaction(Hive 事务管理)常见问题所遇到的程序开发问题。

如果觉得靠谱客网站的内容还不错,欢迎将靠谱客网站推荐给程序员好友。

本图文内容来源于网友提供,作为学习参考使用,或来自网络收集整理,版权属于原作者所有。
点赞(61)

评论列表共有 0 条评论

立即
投稿
返回
顶部