spark-sql and Hive return inconsistent results

Posted by 过时冬天 on 靠谱客. This post covers a problem I ran into during recent development: spark-sql and Hive returning different results for the same query.

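Overview

The same SQL statement produces a correctly populated table in Hive, but in Spark it produces an empty table, or a table with missing rows or NULL values, so the result does not match Hive's.

Settings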

spark.sql.hive.convertMetastoreOrc=false
spark.sql.hive.convertMetastoreParquet=false

Setting these two properties to false is enough to fix the problem.
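
As a minimal sketch, assuming the query is run from a spark-sql session, both properties can be set with SET statements before the query. The same key=value pairs can also be passed at launch with --conf. Re-running the query afterwards should show whether Spark's built-in readers were the cause.

```sql
-- Make Spark go through the Hive SerDe instead of its built-in ORC/Parquet readers
SET spark.sql.hive.convertMetastoreOrc=false;
SET spark.sql.hive.convertMetastoreParquet=false;

-- Print the current values to confirm the settings took effect
SET spark.sql.hive.convertMetastoreOrc;
SET spark.sql.hive.convertMetastoreParquet;
```

Disabling the conversion gives up the performance benefit of the built-in readers, so it may be worth applying only to the sessions or jobs that hit the mismatch.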
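
Cause:
By default, Spark reads the Hive table files with its own built-in Parquet/ORC support and converts the data automatically, instead of going through the Hive SerDe, so it can interpret the files differently from Hive.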

Official documentation

spark.sql.hive.convertMetastoreParquet: When reading from and writing
to Hive metastore Parquet tables, Spark SQL will try to use its own
Parquet support instead of Hive SerDe for better performance. This
behavior is controlled by the spark.sql.hive.convertMetastoreParquet
configuration, and is turned on by default.

spark.sql.hive.convertMetastoreOrc: enables new ORC format to
read/write Hive Tables.