spark SQL学习（spark连接 mysql）spark连接mysql（打jar包方式）提交集群运行结果常见报错1常见报错2spark连接mysql（spark shell方式）

153 阅读 0 评论 101 点赞

我是靠谱客的博主忐忑毛豆，这篇文章主要介绍spark SQL学习（spark连接 mysql）spark连接mysql（打jar包方式）提交集群运行结果常见报错1常见报错2spark连接mysql（spark shell方式），现在分享给大家，希望可以做个参考。

spark连接mysql（打jar包方式）

package wujiadong_sparkSQL
import java.util.Properties
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}
/**
* Created by Administrator on 2017/2/14.
*/
object JdbcOperation {
def main(args: Array[String]): Unit = {
val conf = new SparkConf().setAppName("JdbcOperation")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
val properties = new Properties()
properties.put("user","feigu")
properties.put("password","feigu")
val url = "jdbc:mysql://slave02:3306/testdb?useUnicode=true&characterEncoding=gbk&zeroDateTimeBehavior=convertToNull"
val stud_scoreDF = sqlContext.read.jdbc(url,"stud_score",properties)
stud_scoreDF.show()
}
}

提交集群

hadoop@master:~/wujiadong$ spark-submit --driver-class-path /home/hadoop/bigdata/hive/lib/mysql-connector-java-5.1.10-2.jar
--class wujiadong_sparkSQL.JdbcOperation
--executor-memory 500m --total-executor-cores 2 /home/hadoop/wujiadong/wujiadong.spark.jar
或者
hadoop@master:~/wujiadong$ spark-submit --jars /home/hadoop/bigdata/hive/lib/mysql-connector-java-5.1.10-2.jar
--class wujiadong_sparkSQL.JdbcOperation
--executor-memory 500m --total-executor-cores 2 /home/hadoop/wujiadong/wujiadong.spark.jar

运行结果

hadoop@master:~/wujiadong$ spark-submit --driver-class-path /home/hadoop/bigdata/hive/lib/mysql-connector-java-5.1.10-2.jar
--class wujiadong_sparkSQL.JdbcOperation
--executor-memory 500m --total-executor-cores 2 /home/hadoop/wujiadong/wujiadong.spark.jar
17/02/15 13:21:06 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/02/15 13:21:09 INFO Slf4jLogger: Slf4jLogger started
17/02/15 13:21:09 INFO Remoting: Starting remoting
17/02/15 13:21:09 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.1.131:40654]
17/02/15 13:21:13 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
+----------+--------+--------+--------+---------+---------+
| stud_code|sub_code|sub_name|sub_tech|sub_score|stat_date|
+----------+--------+--------+--------+---------+---------+
|2015101000|
10101|
数学分析|
|
90|
null|
|2015101000|
10102|
高等代数|
|
88|
null|
|2015101000|
10103|
大学物理|
|
67|
null|
|2015101000|
10104|
计算机原理|
|
78|
null|
|2015101000|
10105|
电磁学|
|
89|
null|
|2015101001|
10101|
数学分析|
|
87|
null|
|2015101001|
10102|
高等代数|
|
78|
null|
|2015101001|
10103|
大学物理|
|
88|
null|
|2015101001|
10104|
计算机原理|
|
86|
null|
|2015101001|
10105|
电磁学|
|
91|
null|
|2015101002|
10101|
数学分析|
|
98|
null|
|2015101002|
10102|
高等代数|
|
97|
null|
|2015101002|
10103|
大学物理|
|
95|
null|
|2015101002|
10104|
计算机原理|
|
96|
null|
|2015101002|
10105|
电磁学|
|
90|
null|
|2015101003|
10101|
数学分析|
|
70|
null|
|2015101003|
10102|
高等代数|
|
87|
null|
|2015101003|
10103|
大学物理|
|
65|
null|
|2015101003|
10104|
计算机原理|
|
98|
null|
|2015101003|
10105|
电磁学|
|
76|
null|
+----------+--------+--------+--------+---------+---------+
only showing top 20 rows
17/02/15 13:21:24 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
17/02/15 13:21:24 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.

常见报错1

Exception in thread "main" java.sql.SQLException: No suitable driver found for jdbc:mysql://slave02:3306/testdb
报错原因是没有jdbc驱动
解决办法
--driver-class-path xxx.jar 或者
--jars xxx.jar

如果添加了命令和jar运行也不行,则用以下办法

在%JAVA_HOME%jrelibext下添加mysql-connector-java-5.1.12-bin.jar 问题解决

常见报错2

java.sql.SQLException: Value '0000-00-00' can not be represented as java.sql.Date
0000-00-00 ”在MySQL中是作为一个特殊值存在的，但是在Java中， java.sql.Date 会被视为 不合法的值，被JVM认为格式不正确。
解决办法：在jdbc的url加上
zeroDateTimeBehavior参数
url = "jdbc:mysql://slave02:3306/testdb?useUnicode=true&characterEncoding=gbk&zeroDateTimeBehavior=convertToNull"

spark连接mysql（spark shell方式）

方式1


//import sqlContext.implicits._
//有时需要用到，需要时导入
scala> import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SQLContext
scala> val sqlContext = new SQLContext(sc)
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@6cd1ee
scala> val url ="jdbc:mysql://slave02:3306/testdb?useUnicode=true&characterEncoding=gbk&zeroDateTimeBehavior=convertToNull"
url: String = jdbc:mysql://slave02:3306/testdb?useUnicode=true&characterEncoding=gbk&zeroDateTimeBehavior=convertToNull
scala> val prop = new java.util.Properties
prop: java.util.Properties = {}
scala> prop.setProperty("user","feigu")
res3: Object = null
scala> prop.setProperty("password","feigu")
res4: Object = null
scala> val stud_scoreDF = sqlContext.read.jdbc(url,"stud_score",prop)
stud_scoreDF: org.apache.spark.sql.DataFrame = [stud_code: string, sub_code: string, sub_name: string, sub_tech: string, sub_score: int, stat_date: date]
scala> stud_scoreDF.show()
+----------+--------+--------+--------+---------+---------+
| stud_code|sub_code|sub_name|sub_tech|sub_score|stat_date|
+----------+--------+--------+--------+---------+---------+
|2015101000|
10101|
数学分析|
|
90|
null|
|2015101000|
10102|
高等代数|
|
88|
null|
|2015101000|
10103|
大学物理|
|
67|
null|
|2015101000|
10104|
计算机原理|
|
78|
null|
|2015101000|
10105|
电磁学|
|
89|
null|
|2015101001|
10101|
数学分析|
|
87|
null|
|2015101001|
10102|
高等代数|
|
78|
null|
|2015101001|
10103|
大学物理|
|
88|
null|
|2015101001|
10104|
计算机原理|
|
86|
null|
|2015101001|
10105|
电磁学|
|
91|
null|
|2015101002|
10101|
数学分析|
|
98|
null|
|2015101002|
10102|
高等代数|
|
97|
null|
|2015101002|
10103|
大学物理|
|
95|
null|
|2015101002|
10104|
计算机原理|
|
96|
null|
|2015101002|
10105|
电磁学|
|
90|
null|
|2015101003|
10101|
数学分析|
|
70|
null|
|2015101003|
10102|
高等代数|
|
87|
null|
|2015101003|
10103|
大学物理|
|
65|
null|
|2015101003|
10104|
计算机原理|
|
98|
null|
|2015101003|
10105|
电磁学|
|
76|
null|
+----------+--------+--------+--------+---------+---------+
only showing top 20 rows

方式2

scala> import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SQLContext
scala> val sqlContext = new SQLContext(sc)
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@351d726c
scala> import sqlContext.implicits._
import sqlContext.implicits._
scala> val url ="jdbc:mysql://slave02:3306/testdb?useUnicode=true&characterEncoding=gbk&zeroDateTimeBehavior=convertToNull"
url: String = jdbc:mysql://slave02:3306/testdb?useUnicode=true&characterEncoding=gbk&zeroDateTimeBehavior=convertToNull
scala> val table = "stud_score"
table: String = stud_score
scala> val reader = sqlContext.read.format("jdbc")
reader: org.apache.spark.sql.DataFrameReader = org.apache.spark.sql.DataFrameReader@49c37918
scala> val reader = sqlContext.read.format("jdbc")
reader: org.apache.spark.sql.DataFrameReader = org.apache.spark.sql.DataFrameReader@49c37918
scala> reader.option("url",url)
res0: org.apache.spark.sql.DataFrameReader = org.apache.spark.sql.DataFrameReader@49c37918
scala> reader.option("dbtable",table)
res4: org.apache.spark.sql.DataFrameReader = org.apache.spark.sql.DataFrameReader@49c37918
scala> reader.option("driver","com.mysql.jdbc.Driver")
res6: org.apache.spark.sql.DataFrameReader = org.apache.spark.sql.DataFrameReader@49c37918
scala> reader.option("user","feigu")
res7: org.apache.spark.sql.DataFrameReader = org.apache.spark.sql.DataFrameReader@49c37918
scala> reader.option("password","feigu")
res8: org.apache.spark.sql.DataFrameReader = org.apache.spark.sql.DataFrameReader@49c37918
scala> val DF = reader.load()
DF: org.apache.spark.sql.DataFrame = [stud_code: string, sub_code: string, sub_name: string, sub_tech: string, sub_score: int, stat_date: date]
scala> DF.show()
+----------+--------+--------+--------+---------+---------+
| stud_code|sub_code|sub_name|sub_tech|sub_score|stat_date|
+----------+--------+--------+--------+---------+---------+
|2015101000|
10101|
数学分析|
|
90|
null|
|2015101000|
10102|
高等代数|
|
88|
null|
|2015101000|
10103|
大学物理|
|
67|
null|
|2015101000|
10104|
计算机原理|
|
78|
null|
|2015101000|
10105|
电磁学|
|
89|
null|
|2015101001|
10101|
数学分析|
|
87|
null|
|2015101001|
10102|
高等代数|
|
78|
null|
|2015101001|
10103|
大学物理|
|
88|
null|
|2015101001|
10104|
计算机原理|
|
86|
null|
|2015101001|
10105|
电磁学|
|
91|
null|
|2015101002|
10101|
数学分析|
|
98|
null|
|2015101002|
10102|
高等代数|
|
97|
null|
|2015101002|
10103|
大学物理|
|
95|
null|
|2015101002|
10104|
计算机原理|
|
96|
null|
|2015101002|
10105|
电磁学|
|
90|
null|
|2015101003|
10101|
数学分析|
|
70|
null|
|2015101003|
10102|
高等代数|
|
87|
null|
|2015101003|
10103|
大学物理|
|
65|
null|
|2015101003|
10104|
计算机原理|
|
98|
null|
|2015101003|
10105|
电磁学|
|
76|
null|
+----------+--------+--------+--------+---------+---------+
only showing top 20 rows

方式3

scala> import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SQLContext
scala> val sqlContext = new SQLContext(sc)
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@fdf029a
scala> val jdbcDF = sqlContext.read.format("jdbc").options(Map("url" -> "jdbc:mysql://slave02:3306/testdb?useUnicode=true&characterEncoding=gbk&zeroDateTimeBehavior=convertToNull","driver" -> "com.mysql.jdbc.Driver", "dbtable" -> "testdb.stud_score","user" -> "feigu","password" -> "feigu")).load()
jdbcDF: org.apache.spark.sql.DataFrame = [stud_code: string, sub_code: string, sub_name: string, sub_tech: string, sub_score: int, stat_date: date]
scala> jdbcDF.show()
+----------+--------+--------+--------+---------+---------+
| stud_code|sub_code|sub_name|sub_tech|sub_score|stat_date|
+----------+--------+--------+--------+---------+---------+
|2015101000|
10101|
数学分析|
|
90|
null|
|2015101000|
10102|
高等代数|
|
88|
null|
|2015101000|
10103|
大学物理|
|
67|
null|
|2015101000|
10104|
计算机原理|
|
78|
null|
|2015101000|
10105|
电磁学|
|
89|
null|
|2015101001|
10101|
数学分析|
|
87|
null|
|2015101001|
10102|
高等代数|
|
78|
null|
|2015101001|
10103|
大学物理|
|
88|
null|
|2015101001|
10104|
计算机原理|
|
86|
null|
|2015101001|
10105|
电磁学|
|
91|
null|
|2015101002|
10101|
数学分析|
|
98|
null|
|2015101002|
10102|
高等代数|
|
97|
null|
|2015101002|
10103|
大学物理|
|
95|
null|
|2015101002|
10104|
计算机原理|
|
96|
null|
|2015101002|
10105|
电磁学|
|
90|
null|
|2015101003|
10101|
数学分析|
|
70|
null|
|2015101003|
10102|
高等代数|
|
87|
null|
|2015101003|
10103|
大学物理|
|
65|
null|
|2015101003|
10104|
计算机原理|
|
98|
null|
|2015101003|
10105|
电磁学|
|
76|
null|
+----------+--------+--------+--------+---------+---------+
only showing top 20 rows
//注册为一个表。这就可以直接进行select等操作样
scala> jdbcDF.registerTempTable("wu_stud_info")
scala> jdbcDF.sqlContext.sql("select sub_name from wu_stud_info").collect.foreach(println)
[数学分析]
[高等代数]
[大学物理]
[计算机原理]
[电磁学]
[数学分析]
[高等代数]
[大学物理]
[计算机原理]
[电磁学]
[数学分析]
[高等代数]
[大学物理]
[计算机原理]
[电磁学]
[数学分析]
[高等代数]
[大学物理]
[计算机原理]
[电磁学]
[数学分析]
[高等代数]
[大学物理]
[计算机原理]
[电磁学]
[数学分析]
[高等代数]
[大学物理]
[计算机原理]
[电磁学]
[数学分析]
[高等代数]
[大学物理]
[计算机原理]
[电磁学]
[数学分析]
[高等代数]
[大学物理]
[计算机原理]
[电磁学]
[数学分析]
[高等代数]
[大学物理]
[计算机原理]
[电磁学]
[数学分析]
[高等代数]
[大学物理]
[计算机原理]
[电磁学]
[数学分析]
[高等代数]
[大学物理]
[计算机原理]
[电磁学]
[计算机软件与理论]
[计算机系统结构]
[操作系统]
[概率统计]
[汇编语言]
[数据结构]
[计算机软件与理论]
[计算机系统结构]
[操作系统]
[概率统计]
[汇编语言]
[数据结构]
[计算机软件与理论]
[计算机系统结构]
[操作系统]
[概率统计]
[汇编语言]
[数据结构]
[计算机软件与理论]
[计算机系统结构]
[操作系统]
[概率统计]
[汇编语言]
[数据结构]
[计算机软件与理论]
[计算机系统结构]
[操作系统]
[概率统计]
[汇编语言]
[数据结构]
[计算机软件与理论]
[计算机系统结构]
[操作系统]
[概率统计]
[汇编语言]
[数据结构]
[计算机软件与理论]
[计算机系统结构]
[操作系统]
[概率统计]
[汇编语言]
[数据结构]
[计算机软件与理论]
[计算机系统结构]
[操作系统]
[概率统计]
[汇编语言]
[数据结构]
[计算机软件与理论]
[计算机系统结构]
[操作系统]
[概率统计]
[汇编语言]
[数据结构]
[计算机软件与理论]
[计算机系统结构]
[操作系统]
[概率统计]
[汇编语言]
[数据结构]
[计算机软件与理论]
[计算机系统结构]
[操作系统]
[概率统计]
[汇编语言]
[数据结构]

转载于:https://www.cnblogs.com/wujiadong2014/p/6516598.html