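Overview
I. Spark to Hive
1. Start the virtual machine and make Hive's configuration file (hive-site.xml under conf) available in Spark's conf directory. Either a symbolic link or a plain copy works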
ln -s /opt/software/hadoop/hive110/conf/hive-site.xml /opt/software/hadoop/spark/conf/hive-site.xml
2. Copy the MySQL driver JAR (mysql-connector-java-5.1.32.jar) into Spark's jars directory
cp /opt/software/hadoop/hive110/lib/mysql-connector-java-5.1.32.jar /opt/software/hadoop/spark/jars/
3. Start spark-shell
spark-shell --jars /opt/software/hadoop/spark/jars/mysql-connector-java-5.1.32.jar
4. Create any table in Hive
5. Insert data from Spark SQL. For this demonstration, the databases are simply listed instead
scala> spark.sql("show databases").show()
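6. Query the data in Hive. If the operations performed in Spark are visible there, the two are connected
7. Integration in IDEA
Search the Maven repository for Spark-Hive, pick the first result, Spark Project Hive » [2.4.4], find the artifact that matches your Scala version, and add the corresponding dependency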
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.11</artifactId>
<version>2.4.4</version>
</dependency>
<!-- mysql-connector-java -->
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>5.1.31</version>
</dependency>
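8. Copy hive110/conf/hive-site.xml into a resources directory that you create in the project. A logging configuration file can be added there as well (optional)
In the first property, prefix the Hive warehouse path with the HDFS address hdfs://192.168.29.130:9000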
<property>
<name>hive.metastore.warehouse.dir</name>
<value>hdfs://192.168.29.130:9000/opt/software/hadoop/hive110/warehouse</value>
</property>
9. Create the Hive account in MySQL and grant it privileges
Run the following commands in MySQL:
grant all on *.* to 'root'@'%' identified by 'kb10';
grant all on *.* to 'root'@'localhost' identified by 'kb10';
flush privileges;
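10. With the following code in IDEA, the connection should succeed
import org.apache.spark.sql.SparkSession

object SparkToHive {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport() makes the session use the metastore described in hive-site.xml
    val spark = SparkSession.builder()
      .master("local[4]")
      .appName(this.getClass.getSimpleName)
      .enableHiveSupport()
      .getOrCreate()
    spark.sql("show databases").show()
  }
}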
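After completing the steps above, running beeline -u jdbc:hive2://192.168.29.130:10000 back on the virtual machine launches the beeline bundled with Spark, so it fails to start. In that case, go to the hive/bin directory and start Hive's own beeline from there with bash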
II. Spark to MySQL
1. Add the dependencies
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>5.1.32</version>
</dependency>
2. Create the class
import org.apache.spark.sql.SparkSession

object SparkToSql {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[4]")
      .appName(this.getClass.getSimpleName)
      .getOrCreate()
    // The last path segment is the database name
    val url = "jdbc:mysql://192.168.29.130:3306/exam"
    val tableName = "cron_test" // table name
    // Set the connection user, password, and JDBC driver class
    val prop = new java.util.Properties
    prop.setProperty("user", "root")
    prop.setProperty("password", "kb10")
    prop.setProperty("driver", "com.mysql.jdbc.Driver")
    // Load the table into a DataFrame
    val jdbcDF = spark.read.jdbc(url, tableName, prop)
    jdbcDF.show
    // Save the DataFrame to another table (rows are appended if t2 already exists)
    jdbcDF.write.mode("append").jdbc(url, "t2", prop)
  }
}
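The URL and the connection properties in the class above follow a fixed pattern, so they can be factored out. The sketch below is one possible way to do that, using only the JDK. `MysqlConn`, `jdbcUrl`, and `connProps` are hypothetical names introduced here for illustration; they are not part of Spark or of the MySQL driver.

```scala
import java.util.Properties

// Hypothetical helper (not from the original post): builds the JDBC URL
// and the connection properties that spark.read.jdbc / write.jdbc expect.
object MysqlConn {
  // The last path segment of the URL is the database name.
  def jdbcUrl(host: String, port: Int, database: String): String =
    s"jdbc:mysql://$host:$port/$database"

  // User, password, and driver class, as set in the class above.
  def connProps(user: String, password: String): Properties = {
    val prop = new Properties
    prop.setProperty("user", user)
    prop.setProperty("password", password)
    // Driver class name used with the 5.1.x connector in this walkthrough
    prop.setProperty("driver", "com.mysql.jdbc.Driver")
    prop
  }
}
```

With this helper, the read in the class above could be written as spark.read.jdbc(MysqlConn.jdbcUrl("192.168.29.130", 3306, "exam"), "cron_test", MysqlConn.connProps("root", "kb10")).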
III. Spark to HBase
1. Add the dependencies
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>${hbase.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-server</artifactId>
<version>${hbase.version}</version>
</dependency>
2. Create the classes
Read operation
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.sql.SparkSession

object SparkToHbase {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName(this.getClass.getName)
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext
    // Replace NAMESPACE:TABLE_NAME with the actual namespace and table name
    val tablename = "NAMESPACE:TABLE_NAME"
    val conf = HBaseConfiguration.create()
    // Replace IP_ADDRESS with the IP address of the virtual machine
    conf.set("hbase.zookeeper.quorum", "IP_ADDRESS")
    conf.set("hbase.zookeeper.property.clientPort", "2181")
    conf.set(TableInputFormat.INPUT_TABLE, tablename)
    val rdd1 = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
      classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
      classOf[org.apache.hadoop.hbase.client.Result]
    ).cache()
    println("count=" + rdd1.count())
    // Iterate over the rows and print them
    rdd1.foreach({ case (_, result) =>
      // result.getRow returns the row key
      val key = Bytes.toString(result.getRow)
      // result.getValue(column family, column name) returns the cell value;
      // getBytes converts each string to a byte array.
      // Replace COLUMN_FAMILY and COLUMN_NAME with the real names for each field
      val buynum = Bytes.toString(result.getValue("COLUMN_FAMILY".getBytes, "COLUMN_NAME".getBytes))
      val cust_id = Bytes.toString(result.getValue("COLUMN_FAMILY".getBytes, "COLUMN_NAME".getBytes))
      val dt = Bytes.toString(result.getValue("COLUMN_FAMILY".getBytes, "COLUMN_NAME".getBytes))
      val good_id = Bytes.toString(result.getValue("COLUMN_FAMILY".getBytes, "COLUMN_NAME".getBytes))
      // Print one example line per row
      println("Row key:" + key + " buynum:" + buynum + " cust_id:" + cust_id + " dt:" + dt + " good_id:" + good_id)
    })
  }
}
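In the read example, result.getValue returns null when the requested column is absent from a row, so a missing cell would typically show up as null in the printed line. A minimal sketch of a null-safe decoder follows. `CellDecoder.decode` is a hypothetical helper introduced here, and it deliberately uses only the JDK so that it can be tried without an HBase cluster.

```scala
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical helper (not from the original post): wraps the byte array
// returned by result.getValue in an Option, so that a missing cell
// becomes None instead of null.
object CellDecoder {
  def decode(bytes: Array[Byte]): Option[String] =
    Option(bytes).map(b => new String(b, UTF_8))
}
```

Inside the foreach, a call such as CellDecoder.decode(result.getValue(family, qualifier)).getOrElse("") would then print an empty string for an absent column.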
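Write operation
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapred.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapred.JobConf
import org.apache.spark.sql.SparkSession

object SparkToHBase {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName(this.getClass.getName)
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext
    // Replace NAMESPACE:TABLE_NAME with the actual namespace and table name
    val tablename = "NAMESPACE:TABLE_NAME"
    val conf = HBaseConfiguration.create()
    // Replace IP_ADDRESS with the IP address of the virtual machine
    conf.set("hbase.zookeeper.quorum", "IP_ADDRESS")
    conf.set("hbase.zookeeper.property.clientPort", "2181")
    // The output table key belongs to TableOutputFormat, not TableInputFormat
    conf.set(TableOutputFormat.OUTPUT_TABLE, tablename)
    val job = new JobConf(conf)
    job.setOutputFormat(classOf[TableOutputFormat])
    val indataRDD = sc.makeRDD(Array("11,1,6,20200807,7"))
    val rdd = indataRDD.map(_.split(",")).map { arr =>
      // The first field is the row key
      val put = new Put(Bytes.toBytes(arr(0)))
      // addColumn(column family, column name, value) adds one cell to the row
      put.addColumn(Bytes.toBytes("one"), Bytes.toBytes("buynum"), Bytes.toBytes(arr(1)))
      put.addColumn(Bytes.toBytes("one"), Bytes.toBytes("cust_id"), Bytes.toBytes(arr(2)))
      put.addColumn(Bytes.toBytes("one"), Bytes.toBytes("dt"), Bytes.toBytes(arr(3)))
      put.addColumn(Bytes.toBytes("one"), Bytes.toBytes("good_id"), Bytes.toBytes(arr(4)))
      (new ImmutableBytesWritable, put)
    }
    rdd.saveAsHadoopDataset(job)
  }
}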
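The map step in the write example turns one comma-separated record into a row key plus four cells in column family one. The sketch below isolates that mapping in plain Scala, with no HBase dependency, to make the field order explicit. `RecordLayout` and its members are hypothetical names introduced here; the field order is taken from the sample record and the addColumn calls.

```scala
// Hypothetical helper (not from the original post): splits a record such as
// "11,1,6,20200807,7" into its row key and (qualifier, value) pairs.
object RecordLayout {
  // Qualifiers in the order in which their values follow the row key.
  val qualifiers: Seq[String] = Seq("buynum", "cust_id", "dt", "good_id")

  def parse(line: String): (String, Seq[(String, String)]) = {
    val arr = line.split(",")
    require(arr.length == qualifiers.length + 1,
      s"expected ${qualifiers.length + 1} fields, got ${arr.length}")
    // arr(0) is the row key; the remaining fields pair up with the qualifiers.
    (arr(0), qualifiers.zip(arr.tail.toSeq))
  }
}
```

Each returned pair corresponds to one put.addColumn(Bytes.toBytes("one"), Bytes.toBytes(qualifier), Bytes.toBytes(value)) call, and the require check rejects records with a missing or extra field before anything is written.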