Pitfalls of Accessing a Lower-Version Hive from Spark 3

Contents

1. Background
2. Code
3. Exception
4. Root-Cause Analysis
5. Fixing the Problem
   1) Debugging in the development environment
   2) Production deployment


1. Background

Versions on the existing big-data platform:

Component    Version
Hadoop       2.7.3
Hive         1.2.1

To meet business needs, Spark was brought onto the platform, at what was then the latest release, Spark 3.0.1.

As a business-side test, we write DataFrame data into Hive.

2. Code

The Scala code:

package com.shenyun.scala.exchange

import org.apache.spark.sql.types.{StringType, StructField, StructType}
import org.apache.spark.sql.{Row, SparkSession}

object DfToHive {
  def main(args: Array[String]): Unit = {
    System.setProperty("HADOOP_USER_NAME", "root")
    // Initialize the session builder
    val builder = SparkSession.builder()
    // Local mode needs the line below; alternatively set it as a VM option:
    // -Dspark.master=local
    if (args != null && args.size > 0 && args(0).equals("dev")) {
      builder.master("local[2]")
    }
    builder.appName(this.getClass.getName)
    // Enable Hive support
    builder.enableHiveSupport()
    // Set the warehouse directory; no need to set this in production
    // (backslashes must be escaped in the Scala string)
    builder.config("spark.sql.warehouse.dir", "D:\\tmp\\spark")
    val spark = builder.getOrCreate()
    // For local debugging, read a local file instead:
    // val studentRDD = spark.sparkContext.textFile(
    //   "file:///D:/workplace/scala/scala-spark/src/main/resources/exchange/student.data")
    val studentRDD = spark.sparkContext.textFile("/user/luoyong/data/student.data")
    // Skip the header row, then split on "|" (escaped, since split takes a regex)
    val head = studentRDD.first()
    val studentRDDData = studentRDD
      .filter(row => row != head)
      .map(x => x.split("\\|"))
      .map(x => Row(x(0), x(1), x(2), x(3)))
    val schema = StructType(Array(
      StructField("id", StringType, nullable = true),
      StructField("name", StringType, nullable = true),
      StructField("phone", StringType, nullable = true),
      StructField("email", StringType, nullable = true)
    ))
    val studentDf = spark.createDataFrame(studentRDDData, schema)
    studentDf.createOrReplaceTempView("student_tmp")
    spark.sql("insert into luoyong.student_ly select id,name,phone,email from student_tmp")
    // Shut down
    spark.stop()
  }
}
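For reference, the code assumes student.data is a pipe-delimited file with a header row and four columns. The concrete rows below are made-up examples of the assumed format:

id|name|phone|email
1|zhangsan|13800000001|zhangsan@example.com
2|lisi|13800000002|lisi@example.com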

pom.xml

Note that with the dependencies below, accessing a higher-version Hive should work fine (not verified here, but in theory it should):

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.12</artifactId>
    <version>3.0.1</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.12</artifactId>
    <version>3.0.1</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive-thriftserver_2.12</artifactId>
    <version>3.0.1</version>
    <scope>provided</scope>
</dependency>
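Since all three artifacts are provided, at runtime they come from the cluster's Spark installation. A typical submission might look like the following, where the application jar path is just a placeholder: spark-submit --class com.shenyun.scala.exchange.DfToHive --master yarn /path/to/scala-spark.jar. For local debugging, pass "dev" as the first program argument instead, which switches the session to local[2] mode as shown in the code above.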

3. Exception

20/12/17 11:19:56 INFO hive.metastore: Connected to metastore.
20/12/17 11:19:56 WARN metadata.Hive: Failed to register all functions.
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.TApplicationException: Invalid method name: 'get_all_functions'
at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3897)
at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:248)
at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:231)
at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:388)
at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:332)
at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:312)
at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:288)
at org.apache.spark.sql.hive.client.HiveClientImpl.client(HiveClientImpl.scala:260)
at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:286)
at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:227)
at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:226)
at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:276)
at org.apache.spark.sql.hive.client.HiveClientImpl.databaseExists(HiveClientImpl.scala:389)
at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$databaseExists$1(HiveExternalCatalog.scala:225)
at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:103)
at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:225)
at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:137)
at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:127)
at org.apache.spark.sql.internal.SharedState.globalTempViewManager$lzycompute(SharedState.scala:157)
at org.apache.spark.sql.internal.SharedState.globalTempViewManager(SharedState.scala:155)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$2(HiveSessionStateBuilder.scala:60)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.globalTempViewManager$lzycompute(SessionCatalog.scala:93)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.globalTempViewManager(SessionCatalog.scala:93)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupGlobalTempView(SessionCatalog.scala:789)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveTempViews$.lookupTempView(Analyzer.scala:858)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveTempViews$$anonfun$apply$7.applyOrElse(Analyzer.scala:840)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveTempViews$$anonfun$apply$7.applyOrElse(Analyzer.scala:836)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUp$5(AnalysisHelper.scala:94)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUp$1(AnalysisHelper.scala:94)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUp(AnalysisHelper.scala:86)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUp$(AnalysisHelper.scala:84)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:29)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveTempViews$.apply(Analyzer.scala:836)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:962)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:934)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:149)
at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
at scala.collection.immutable.List.foldLeft(List.scala:89)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:146)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:138)
at scala.collection.immutable.List.foreach(List.scala:392)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:138)
at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:176)
at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:170)
at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:130)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:116)
at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:116)
at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:154)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:201)
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:153)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:68)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:133)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:133)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:68)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:66)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:58)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:607)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:602)
at com.shenyun.scala.exchange.DfToHive$.main(DfToHive.scala:43)
at com.shenyun.scala.exchange.DfToHiveTest$.main(DfToHiveTest.scala:9)
at com.shenyun.scala.exchange.DfToHiveTest.main(DfToHiveTest.scala)
Caused by: org.apache.thrift.TApplicationException: Invalid method name: 'get_all_functions'
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_all_functions(ThriftHiveMetastore.java:3845)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_all_functions(ThriftHiveMetastore.java:3833)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getAllFunctions(HiveMetaStoreClient.java:2399)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173)
at com.sun.proxy.$Proxy33.getAllFunctions(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2336)
at com.sun.proxy.$Proxy33.getAllFunctions(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3894)
... 71 more
20/12/17 11:19:56 WARN client.HiveClientImpl: HiveClient got thrift exception, destroying client and retrying (23 tries remaining)

4. Root-Cause Analysis

The root cause is a compatibility gap between the Spark and Hive versions.

Spark 3.0.1 is compiled against Hive 2.3.7 by default, so its bundled Hive client speaks the 2.3.7 metastore protocol.

The cluster, however, runs Hive 1.2.1, whose metastore does not implement the get_all_functions Thrift method that the newer client calls on startup; hence the "Invalid method name: 'get_all_functions'" error above.
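As an aside, Spark also documents a configuration-based route to an older metastore, via the settings spark.sql.hive.metastore.version and spark.sql.hive.metastore.jars (Spark 3.0.1 supports metastore versions 0.12.0 through 3.1.2). A minimal sketch, assuming a complete set of Hive 1.2.1 client jars at the placeholder path below; the jar-replacement approach in the next section is what was actually validated here:

// These settings are normally placed in spark-defaults.conf or passed via --conf;
// they are shown on the builder only for brevity.
val spark = SparkSession.builder()
  .appName("Hive121MetastoreSketch")
  .enableHiveSupport()
  // Tell Spark the metastore is Hive 1.2.1 rather than the default 2.3.7
  .config("spark.sql.hive.metastore.version", "1.2.1")
  // Classpath containing the Hive 1.2.1 jars and their dependencies (placeholder path)
  .config("spark.sql.hive.metastore.jars", "/opt/hive-1.2.1/lib/*")
  .getOrCreate()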

5. Fixing the Problem

1) Debugging in the development environment

Change the pom to depend on the lower-version Hive jars.

The final pom dependencies are shown below. Note that the scopes are provided or test: the production cluster keeps these jars in Spark 3's jars directory (or they get placed there, see part 2), so they must not be bundled with the application.

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.12</artifactId>
    <version>3.0.1</version>
    <scope>provided</scope>
</dependency>
<!-- The original spark-hive_2.12 and spark-hive-thriftserver_2.12 dependencies
     are removed, and re-declared below with their transitive Hive dependencies
     excluded -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.12</artifactId>
    <version>3.0.1</version>
    <scope>test</scope>
    <exclusions>
        <exclusion>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-common</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-metastore</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-serde</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-shims</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive-thriftserver_2.12</artifactId>
    <version>3.0.1</version>
    <scope>test</scope>
    <exclusions>
        <exclusion>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-cli</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-jdbc</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-beeline</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<!-- Downgrade Hive to 1.2.1; Spark 3 depends on 2.3.7 by default -->
<!-- Transitive dependencies of spark-hive-thriftserver_2.12 -->
<dependency>
    <groupId>org.spark-project.hive</groupId>
    <artifactId>hive-cli</artifactId>
    <version>1.2.1.spark2</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.spark-project.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>1.2.1.spark2</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.spark-project.hive</groupId>
    <artifactId>hive-beeline</artifactId>
    <version>1.2.1.spark2</version>
    <scope>test</scope>
</dependency>
<!-- Transitive dependencies of spark-hive_2.12 -->
<dependency>
    <groupId>org.spark-project.hive</groupId>
    <artifactId>hive-common</artifactId>
    <version>1.2.1.spark2</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.spark-project.hive</groupId>
    <artifactId>hive-metastore</artifactId>
    <version>1.2.1.spark2</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.spark-project.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>1.2.1.spark2</version>
    <scope>test</scope>
    <exclusions>
        <exclusion>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-lang3</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<!-- hive-exec transitively pulls in an older commons-lang3; pin it back up to 3.9 -->
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-lang3</artifactId>
    <version>3.9</version>
    <scope>test</scope>
</dependency>

2) Production deployment

Replace jars in the ${SPARK3_HOME}/jars directory.

Swap the bundled Hive jars for the following versions:

Jar name                          groupId                  artifactId       version
hive-cli-1.2.1.spark2.jar         org.spark-project.hive   hive-cli         1.2.1.spark2
hive-jdbc-1.2.1.spark2.jar        org.spark-project.hive   hive-jdbc        1.2.1.spark2
hive-beeline-1.2.1.spark2.jar     org.spark-project.hive   hive-beeline     1.2.1.spark2
hive-common-1.2.1.spark2.jar      org.spark-project.hive   hive-common      1.2.1.spark2
hive-metastore-1.2.1.spark2.jar   org.spark-project.hive   hive-metastore   1.2.1.spark2
hive-exec-1.2.1.spark2.jar        org.spark-project.hive   hive-exec        1.2.1.spark2
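In practice this means moving the bundled Hive jars out of ${SPARK3_HOME}/jars (keep a backup for rollback) and copying in the six 1.2.1.spark2 jars above. Listing the directory first with ls hive-*.jar confirms the exact bundled names, which follow Maven's artifactId-version.jar convention.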
    

 
