Data Lake with Hudi (11): Updating Data in Hudi with Spark

Contents

0. Related Links

1. Environment Setup

1.1. Setting Up the Server Environment

1.2. Creating the Maven Project and Writing Data

2. Maven Dependencies

3. Core Code


0. Related Links

Data Lake: Article Index

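1. Environment Setup

1.1. Setting Up the Server Environment

For the server environment needed to write data from Spark into Hudi, see another of my posts; installing HDFS on CentOS 7 is sufficient. Post link: Data Lake with Hudi (6): Integrating, Installing, and Using Hudi with Spark and HDFS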

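1.2. Creating the Maven Project and Writing Data

This post demonstrates how to update data in a Hudi table with Spark code. You first need to create a Maven project and insert some mock data into Hudi; for both steps, see another of my posts. Post link: Data Lake with Hudi (9): Inserting Data into Hudi with Spark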

2. Maven Dependencies

The Maven dependencies are already listed in the other post, but they are repeated here for completeness:

    <repositories>
        <repository>
            <id>aliyun</id>
            <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
        </repository>
        <repository>
            <id>cloudera</id>
            <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
        </repository>
        <repository>
            <id>jboss</id>
            <url>http://repository.jboss.com/nexus/content/groups/public</url>
        </repository>
    </repositories>

    <properties>
        <scala.version>2.12.10</scala.version>
        <scala.binary.version>2.12</scala.binary.version>
        <spark.version>3.0.0</spark.version>
        <hadoop.version>3.0.0</hadoop.version>
        <hudi.version>0.9.0</hudi.version>
    </properties>

    <dependencies>

        <!-- Scala language dependency -->
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>

        <!-- Spark Core dependency -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <!-- Spark SQL dependency -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>

        <!-- Hadoop Client dependencies -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>${hadoop.version}</version>
        </dependency>

        <!-- hudi-spark3 -->
        <dependency>
            <groupId>org.apache.hudi</groupId>
            <artifactId>hudi-spark3-bundle_2.12</artifactId>
            <version>${hudi.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-avro_2.12</artifactId>
            <version>${spark.version}</version>
        </dependency>

    </dependencies>

    <build>
        <outputDirectory>target/classes</outputDirectory>
        <testOutputDirectory>target/test-classes</testOutputDirectory>
        <resources>
            <resource>
                <directory>${project.basedir}/src/main/resources</directory>
            </resource>
        </resources>
        <!-- Maven compiler plugins -->
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.0</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                    <encoding>UTF-8</encoding>
                </configuration>
            </plugin>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.2.0</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

3. Core Code

Step 1: Generate mock insert data
Step 2: Write the insert data to Hudi
Step 3: Generate mock update data
Step 4: Write the update data to Hudi in Append mode

package com.ouyang.hudi.crud

import scala.collection.JavaConverters._
import org.apache.hudi.QuickstartUtils._
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.config.HoodieWriteConfig._
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

/**
 * @ date: 2022/2/23
 * @ author: yangshibiao
 * @ desc: Update data
 * Step 1: generate mock insert data
 * Step 2: write the insert data to Hudi
 * Step 3: generate mock update data
 * Step 4: write the update data to Hudi in Append mode
 */
object Demo03_Update {

    def main(args: Array[String]): Unit = {

        System.setProperty("HADOOP_USER_NAME", "root")

        // Create the SparkSession instance and set its properties
        val spark: SparkSession = {
            SparkSession.builder()
                .appName(this.getClass.getSimpleName.stripSuffix("$"))
                .master("local[4]")
                // Use Kryo serialization
                .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                .getOrCreate()
        }

        // Define the table name and the storage path
        val tableName: String = "tbl_trips_cow"
        val tablePath: String = "/hudi-warehouse/tbl_trips_cow"

        // Import the implicit conversions and related methods
        import spark.implicits._


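        // Create the data generator; the same instance is reused because the updates must target the inserted keys
        val dataGen: DataGenerator = new DataGenerator()

        // Step 1: generate mock trip data and convert it to a DataFrame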
        val inserts = convertToStringList(dataGen.generateInserts(20))
        val insertDF: DataFrame = spark.read.json(
            spark.sparkContext.parallelize(inserts.asScala, 2).toDS()
        )

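        // Step 2: write the inserts to the Hudi table in Overwrite mode, which recreates the table whether or not the path already holds data
        println("插入数据中:" + System.currentTimeMillis()) // label: "inserting data"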
        insertDF.write
            .mode(SaveMode.Overwrite)
            .format("hudi")
            .option("hoodie.insert.shuffle.parallelism", "2")
            .option("hoodie.upsert.shuffle.parallelism", "2")
            // Hudi table properties
            .option(PRECOMBINE_FIELD.key(), "ts")
            .option(RECORDKEY_FIELD.key(), "uuid")
            .option(PARTITIONPATH_FIELD.key(), "partitionpath")
            .option(TBL_NAME.key(), tableName)
            .save(tablePath)

        // Read the data in Hudi before the update and print it
        println("获取更新前Hudi中的数据中:" + System.currentTimeMillis()) // label: "reading data before the update"
        val updateBeforeDF: DataFrame = spark.read.format("hudi").load(tablePath)
        updateBeforeDF.printSchema()
        updateBeforeDF.show(100, truncate = false)

        println("==================== 分割线 ====================") // "分割线" means "divider"

        // Step 3: generate the update data with the same generator and convert it to a DataFrame as well
        val updates = convertToStringList(dataGen.generateUpdates(20))
        val updateDF: DataFrame = spark.read.json(
            spark.sparkContext.parallelize(updates.asScala, 2).toDS()
        )

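        // Step 4: write the update data to the Hudi table in Append mode, so the default upsert operation updates existing rows instead of recreating the table
        println("更新数据中:" + System.currentTimeMillis()) // label: "updating data"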
        updateDF.write
            .mode(SaveMode.Append)
            .format("hudi")
            .option("hoodie.insert.shuffle.parallelism", "2")
            .option("hoodie.upsert.shuffle.parallelism", "2")
            // Hudi table properties
            .option(PRECOMBINE_FIELD.key(), "ts")
            .option(RECORDKEY_FIELD.key(), "uuid")
            .option(PARTITIONPATH_FIELD.key(), "partitionpath")
            .option(TBL_NAME.key(), tableName)
            .save(tablePath)

        // Read the data in Hudi after the update and print it
        println("获取更新后Hudi中的数据中:" + System.currentTimeMillis()) // label: "reading data after the update"
        val updateAfterDF: DataFrame = spark.read.format("hudi").load(tablePath)
        updateAfterDF.printSchema()
        updateAfterDF.show(100, truncate = false)
    }

}
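
Both writes in the listing repeat the same block of options. A minimal sketch of factoring them into one helper is given here. HudiOptions and writeOptions are hypothetical names introduced for illustration, and the string keys are the ones that PRECOMBINE_FIELD, RECORDKEY_FIELD, PARTITIONPATH_FIELD and TBL_NAME are assumed to resolve to in Hudi 0.9.0. The sketch also sets hoodie.datasource.write.operation to upsert explicitly, which the listing relies on as the default.

```scala
object HudiOptions {
  // Hypothetical helper: builds the options shared by the insert and the update write.
  // The keys are written out as plain strings, so no Hudi import is needed here.
  def writeOptions(tableName: String, operation: String = "upsert"): Map[String, String] = Map(
    "hoodie.insert.shuffle.parallelism" -> "2",
    "hoodie.upsert.shuffle.parallelism" -> "2",
    // Stated explicitly; assumed to be the default when the option is omitted
    "hoodie.datasource.write.operation" -> operation,
    "hoodie.datasource.write.precombine.field" -> "ts",
    "hoodie.datasource.write.recordkey.field" -> "uuid",
    "hoodie.datasource.write.partitionpath.field" -> "partitionpath",
    "hoodie.table.name" -> tableName
  )
}
```

With such a helper, step 4 could be written as updateDF.write.mode(SaveMode.Append).format("hudi").options(HudiOptions.writeOptions(tableName)).save(tablePath).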

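After the first write, the code reads the table back and prints it; after the update, it reads and prints the table again. The output below shows that the update took effect: 13 of the 20 rows now carry the new commit time 20220224015547 and the values driver-192/rider-192, while the 7 rows that the update batch did not touch keep commit time 20220224015538. Fewer than 20 rows change because generateUpdates(20) appears to pick keys at random from the inserted records, so some keys repeat in the batch and others are never selected.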

插入数据中:1645638938444
获取更新前Hudi中的数据中:1645638946581
root
 |-- _hoodie_commit_time: string (nullable = true)
 |-- _hoodie_commit_seqno: string (nullable = true)
 |-- _hoodie_record_key: string (nullable = true)
 |-- _hoodie_partition_path: string (nullable = true)
 |-- _hoodie_file_name: string (nullable = true)
 |-- begin_lat: double (nullable = true)
 |-- begin_lon: double (nullable = true)
 |-- driver: string (nullable = true)
 |-- end_lat: double (nullable = true)
 |-- end_lon: double (nullable = true)
 |-- fare: double (nullable = true)
 |-- rider: string (nullable = true)
 |-- ts: long (nullable = true)
 |-- uuid: string (nullable = true)
 |-- partitionpath: string (nullable = true)

+-------------------+--------------------+------------------------------------+------------------------------------+---------------------------------------------------------------------+--------------------+-------------------+----------+-------------------+--------------------+------------------+---------+-------------+------------------------------------+------------------------------------+
|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key                  |_hoodie_partition_path              |_hoodie_file_name                                                    |begin_lat           |begin_lon          |driver    |end_lat            |end_lon             |fare              |rider    |ts           |uuid                                |partitionpath                       |
+-------------------+--------------------+------------------------------------+------------------------------------+---------------------------------------------------------------------+--------------------+-------------------+----------+-------------------+--------------------+------------------+---------+-------------+------------------------------------+------------------------------------+
|20220224015538     |20220224015538_1_7  |c4c672c4-bc22-4954-94ec-8ad80aa3664a|americas/united_states/san_francisco|9dfc33ef-edc7-463c-8a4a-fd78c6f2372b-0_1-27-28_20220224015538.parquet|0.5731835407930634  |0.4923479652912024 |driver-213|0.08988581780930216|0.42520899698713666 |64.27696295884016 |rider-213|1645569108280|c4c672c4-bc22-4954-94ec-8ad80aa3664a|americas/united_states/san_francisco|
|20220224015538     |20220224015538_1_8  |0299897c-a852-4129-ae52-0dfc3d76b5c2|americas/united_states/san_francisco|9dfc33ef-edc7-463c-8a4a-fd78c6f2372b-0_1-27-28_20220224015538.parquet|0.023755167724156978|0.6322099740212305 |driver-213|0.2171902015800108 |0.2132173852420407  |15.330847537835645|rider-213|1645125971122|0299897c-a852-4129-ae52-0dfc3d76b5c2|americas/united_states/san_francisco|
|20220224015538     |20220224015538_1_9  |6b3794f7-b26d-4c6e-8ea6-2bd8ed6992df|americas/united_states/san_francisco|9dfc33ef-edc7-463c-8a4a-fd78c6f2372b-0_1-27-28_20220224015538.parquet|0.8742041526408587  |0.7528268153249502 |driver-213|0.9197827128888302 |0.362464770874404   |19.179139106643607|rider-213|1645603972111|6b3794f7-b26d-4c6e-8ea6-2bd8ed6992df|americas/united_states/san_francisco|
|20220224015538     |20220224015538_1_10 |902ecdf8-640e-4847-834b-7e483f5adcf4|americas/united_states/san_francisco|9dfc33ef-edc7-463c-8a4a-fd78c6f2372b-0_1-27-28_20220224015538.parquet|0.8675932789048282  |0.9563153782052657 |driver-213|0.8534087075068594 |0.4153669760172203  |64.12151064878266 |rider-213|1645222316086|902ecdf8-640e-4847-834b-7e483f5adcf4|americas/united_states/san_francisco|
|20220224015538     |20220224015538_1_11 |599d5efd-9a84-4232-b871-225258cb8520|americas/united_states/san_francisco|9dfc33ef-edc7-463c-8a4a-fd78c6f2372b-0_1-27-28_20220224015538.parquet|0.11488393157088261 |0.6273212202489661 |driver-213|0.7454678537511295 |0.3954939864908973  |27.79478688582596 |rider-213|1645100687224|599d5efd-9a84-4232-b871-225258cb8520|americas/united_states/san_francisco|
|20220224015538     |20220224015538_1_12 |6ef20b48-3403-496e-81c5-6964f0c170bd|americas/united_states/san_francisco|9dfc33ef-edc7-463c-8a4a-fd78c6f2372b-0_1-27-28_20220224015538.parquet|0.21624150367601136 |0.14285051259466197|driver-213|0.5890949624813784 |0.0966823831927115  |93.56018115236618 |rider-213|1645207425020|6ef20b48-3403-496e-81c5-6964f0c170bd|americas/united_states/san_francisco|
|20220224015538     |20220224015538_1_13 |873407d3-8824-49b4-98aa-a597a0240d45|americas/united_states/san_francisco|9dfc33ef-edc7-463c-8a4a-fd78c6f2372b-0_1-27-28_20220224015538.parquet|0.2947661370147079  |0.8039197581711358 |driver-213|0.8248244842522374 |0.3873920783955822  |84.9600214569341  |rider-213|1645435050727|873407d3-8824-49b4-98aa-a597a0240d45|americas/united_states/san_francisco|
|20220224015538     |20220224015538_1_14 |cb5d57f3-e44e-42b7-b35f-7de3047acfb0|americas/united_states/san_francisco|9dfc33ef-edc7-463c-8a4a-fd78c6f2372b-0_1-27-28_20220224015538.parquet|0.1856488085068272  |0.9694586417848392 |driver-213|0.38186367037201974|0.25252652214479043 |33.92216483948643 |rider-213|1645473662083|cb5d57f3-e44e-42b7-b35f-7de3047acfb0|americas/united_states/san_francisco|
|20220224015538     |20220224015538_0_1  |43cb0114-c7a2-4b56-aecf-d73a49c0345e|americas/brazil/sao_paulo           |f5a4fc01-eeb7-4129-a898-b892a8ec27ab-0_0-21-27_20220224015538.parquet|0.0750588760043035  |0.03844104444445928|driver-213|0.04376353354538354|0.6346040067610669  |66.62084366450246 |rider-213|1645059243518|43cb0114-c7a2-4b56-aecf-d73a49c0345e|americas/brazil/sao_paulo           |
|20220224015538     |20220224015538_0_2  |cc224f56-5b5c-4d85-b56a-87c74c1a7b2e|americas/brazil/sao_paulo           |f5a4fc01-eeb7-4129-a898-b892a8ec27ab-0_0-21-27_20220224015538.parquet|0.983428192817987   |0.3961523475372767 |driver-213|0.20548299593469077|0.9836743920572577  |60.047501243947934|rider-213|1645597679831|cc224f56-5b5c-4d85-b56a-87c74c1a7b2e|americas/brazil/sao_paulo           |
|20220224015538     |20220224015538_0_3  |97391e4d-350d-4d67-93c7-9e1a2ac60fc0|americas/brazil/sao_paulo           |f5a4fc01-eeb7-4129-a898-b892a8ec27ab-0_0-21-27_20220224015538.parquet|0.6372504913279929  |0.04241635032425073|driver-213|0.36284275950041867|0.6591829686989255  |44.839244944180244|rider-213|1645629063846|97391e4d-350d-4d67-93c7-9e1a2ac60fc0|americas/brazil/sao_paulo           |
|20220224015538     |20220224015538_0_4  |2ec62676-7a1c-4f02-8b11-791f7847eabd|americas/brazil/sao_paulo           |f5a4fc01-eeb7-4129-a898-b892a8ec27ab-0_0-21-27_20220224015538.parquet|0.4726905879569653  |0.46157858450465483|driver-213|0.754803407008858  |0.9671159942018241  |34.158284716382845|rider-213|1645203465242|2ec62676-7a1c-4f02-8b11-791f7847eabd|americas/brazil/sao_paulo           |
|20220224015538     |20220224015538_0_5  |3552981d-38c3-4a2f-8f89-7d2d3be1d341|americas/brazil/sao_paulo           |f5a4fc01-eeb7-4129-a898-b892a8ec27ab-0_0-21-27_20220224015538.parquet|0.9025710109008239  |0.2693250504574297 |driver-213|0.6357677757664507 |0.25770004462445395 |87.08158608552242 |rider-213|1645597722203|3552981d-38c3-4a2f-8f89-7d2d3be1d341|americas/brazil/sao_paulo           |
|20220224015538     |20220224015538_0_6  |7eb0cf33-b411-40a1-9066-c0a67738b4af|americas/brazil/sao_paulo           |f5a4fc01-eeb7-4129-a898-b892a8ec27ab-0_0-21-27_20220224015538.parquet|0.6100070562136587  |0.8779402295427752 |driver-213|0.3407870505929602 |0.5030798142293655  |43.4923811219014  |rider-213|1645155397506|7eb0cf33-b411-40a1-9066-c0a67738b4af|americas/brazil/sao_paulo           |
|20220224015538     |20220224015538_2_15 |9fb42eaa-8b5c-4b70-bc90-ff3e298b659b|asia/india/chennai                  |81ee8ecf-1087-401c-ba32-e939b3c23050-0_2-27-29_20220224015538.parquet|0.09384124531808036 |0.9623582692596406 |driver-213|0.44485904691083133|0.5550300795070142  |53.69977335639399 |rider-213|1645255841245|9fb42eaa-8b5c-4b70-bc90-ff3e298b659b|asia/india/chennai                  |
|20220224015538     |20220224015538_2_16 |c68d347f-5e21-4e4b-8f8c-382c57100f3f|asia/india/chennai                  |81ee8ecf-1087-401c-ba32-e939b3c23050-0_2-27-29_20220224015538.parquet|0.8679173655153939  |0.17992665967365185|driver-213|0.7721097247931136 |0.9662606385568611  |70.59591659793207 |rider-213|1645253155280|c68d347f-5e21-4e4b-8f8c-382c57100f3f|asia/india/chennai                  |
|20220224015538     |20220224015538_2_17 |7c6d6dd1-fe81-481b-8e74-6284cff7f3d2|asia/india/chennai                  |81ee8ecf-1087-401c-ba32-e939b3c23050-0_2-27-29_20220224015538.parquet|0.40613510977307    |0.5644092139040959 |driver-213|0.798706304941517  |0.02698359227182834 |17.851135255091155|rider-213|1645469686991|7c6d6dd1-fe81-481b-8e74-6284cff7f3d2|asia/india/chennai                  |
|20220224015538     |20220224015538_2_18 |61e28fd7-f0ac-4637-a64e-5285ab83538f|asia/india/chennai                  |81ee8ecf-1087-401c-ba32-e939b3c23050-0_2-27-29_20220224015538.parquet|0.49527694252432053 |0.28072552620450797|driver-213|0.44848221556652057|0.565791994047955   |93.00604432281203 |rider-213|1645481507252|61e28fd7-f0ac-4637-a64e-5285ab83538f|asia/india/chennai                  |
|20220224015538     |20220224015538_2_19 |5e432421-b019-4354-9929-61895cdaa213|asia/india/chennai                  |81ee8ecf-1087-401c-ba32-e939b3c23050-0_2-27-29_20220224015538.parquet|0.9090538095331541  |0.8801105093619153 |driver-213|0.5873040159790485 |0.028263672792464445|40.211140833035394|rider-213|1645638754639|5e432421-b019-4354-9929-61895cdaa213|asia/india/chennai                  |
|20220224015538     |20220224015538_2_20 |02ca3ce3-4925-480a-8b70-73111e35afff|asia/india/chennai                  |81ee8ecf-1087-401c-ba32-e939b3c23050-0_2-27-29_20220224015538.parquet|0.651058505660742   |0.8192868687714224 |driver-213|0.20714896002914462|0.06224031095826987 |41.06290929046368 |rider-213|1645485296066|02ca3ce3-4925-480a-8b70-73111e35afff|asia/india/chennai                  |
+-------------------+--------------------+------------------------------------+------------------------------------+---------------------------------------------------------------------+--------------------+-------------------+----------+-------------------+--------------------+------------------+---------+-------------+------------------------------------+------------------------------------+

==================== 分割线 ====================
更新数据中:1645638947885
获取更新后Hudi中的数据中:1645638951488
root
 |-- _hoodie_commit_time: string (nullable = true)
 |-- _hoodie_commit_seqno: string (nullable = true)
 |-- _hoodie_record_key: string (nullable = true)
 |-- _hoodie_partition_path: string (nullable = true)
 |-- _hoodie_file_name: string (nullable = true)
 |-- begin_lat: double (nullable = true)
 |-- begin_lon: double (nullable = true)
 |-- driver: string (nullable = true)
 |-- end_lat: double (nullable = true)
 |-- end_lon: double (nullable = true)
 |-- fare: double (nullable = true)
 |-- rider: string (nullable = true)
 |-- ts: long (nullable = true)
 |-- uuid: string (nullable = true)
 |-- partitionpath: string (nullable = true)

+-------------------+--------------------+------------------------------------+------------------------------------+---------------------------------------------------------------------+--------------------+--------------------+----------+-------------------+-------------------+------------------+---------+-------------+------------------------------------+------------------------------------+
|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key                  |_hoodie_partition_path              |_hoodie_file_name                                                    |begin_lat           |begin_lon           |driver    |end_lat            |end_lon            |fare              |rider    |ts           |uuid                                |partitionpath                       |
+-------------------+--------------------+------------------------------------+------------------------------------+---------------------------------------------------------------------+--------------------+--------------------+----------+-------------------+-------------------+------------------+---------+-------------+------------------------------------+------------------------------------+
|20220224015547     |20220224015547_1_24 |c4c672c4-bc22-4954-94ec-8ad80aa3664a|americas/united_states/san_francisco|9dfc33ef-edc7-463c-8a4a-fd78c6f2372b-0_1-70-86_20220224015547.parquet|0.47255932910824583 |0.09835174451313866 |driver-192|0.8768271062363665 |0.391583018565109  |82.6183030502974  |rider-192|1645611981331|c4c672c4-bc22-4954-94ec-8ad80aa3664a|americas/united_states/san_francisco|
|20220224015547     |20220224015547_1_25 |0299897c-a852-4129-ae52-0dfc3d76b5c2|americas/united_states/san_francisco|9dfc33ef-edc7-463c-8a4a-fd78c6f2372b-0_1-70-86_20220224015547.parquet|0.7885334532337877  |0.8573824804430561  |driver-192|0.47332186591003045|0.9927159674996295 |50.45582154226707 |rider-192|1645344506822|0299897c-a852-4129-ae52-0dfc3d76b5c2|americas/united_states/san_francisco|
|20220224015547     |20220224015547_1_26 |6b3794f7-b26d-4c6e-8ea6-2bd8ed6992df|americas/united_states/san_francisco|9dfc33ef-edc7-463c-8a4a-fd78c6f2372b-0_1-70-86_20220224015547.parquet|0.584204225520771   |0.7212263680879302  |driver-192|0.5501675314928346 |0.6226833057042072 |60.704347025098535|rider-192|1645461680678|6b3794f7-b26d-4c6e-8ea6-2bd8ed6992df|americas/united_states/san_francisco|
|20220224015538     |20220224015538_1_10 |902ecdf8-640e-4847-834b-7e483f5adcf4|americas/united_states/san_francisco|9dfc33ef-edc7-463c-8a4a-fd78c6f2372b-0_1-27-28_20220224015538.parquet|0.8675932789048282  |0.9563153782052657  |driver-213|0.8534087075068594 |0.4153669760172203 |64.12151064878266 |rider-213|1645222316086|902ecdf8-640e-4847-834b-7e483f5adcf4|americas/united_states/san_francisco|
|20220224015547     |20220224015547_1_27 |599d5efd-9a84-4232-b871-225258cb8520|americas/united_states/san_francisco|9dfc33ef-edc7-463c-8a4a-fd78c6f2372b-0_1-70-86_20220224015547.parquet|0.04327820937619131 |0.8562530975462316  |driver-192|0.4539370966816483 |0.5535762898838785 |75.48086309564754 |rider-192|1645588556978|599d5efd-9a84-4232-b871-225258cb8520|americas/united_states/san_francisco|
|20220224015547     |20220224015547_1_28 |6ef20b48-3403-496e-81c5-6964f0c170bd|americas/united_states/san_francisco|9dfc33ef-edc7-463c-8a4a-fd78c6f2372b-0_1-70-86_20220224015547.parquet|0.5142305232303094  |0.30495686857778403 |driver-192|0.29666655980198253|0.16768228612130764|24.070894571476064|rider-192|1645452618245|6ef20b48-3403-496e-81c5-6964f0c170bd|americas/united_states/san_francisco|
|20220224015538     |20220224015538_1_13 |873407d3-8824-49b4-98aa-a597a0240d45|americas/united_states/san_francisco|9dfc33ef-edc7-463c-8a4a-fd78c6f2372b-0_1-27-28_20220224015538.parquet|0.2947661370147079  |0.8039197581711358  |driver-213|0.8248244842522374 |0.3873920783955822 |84.9600214569341  |rider-213|1645435050727|873407d3-8824-49b4-98aa-a597a0240d45|americas/united_states/san_francisco|
|20220224015547     |20220224015547_1_29 |cb5d57f3-e44e-42b7-b35f-7de3047acfb0|americas/united_states/san_francisco|9dfc33ef-edc7-463c-8a4a-fd78c6f2372b-0_1-70-86_20220224015547.parquet|0.4878809010360382  |0.07610014905198248 |driver-192|0.9334457064050349 |0.6330100459693088 |90.84944020139248 |rider-192|1645359179566|cb5d57f3-e44e-42b7-b35f-7de3047acfb0|americas/united_states/san_francisco|
|20220224015538     |20220224015538_0_1  |43cb0114-c7a2-4b56-aecf-d73a49c0345e|americas/brazil/sao_paulo           |f5a4fc01-eeb7-4129-a898-b892a8ec27ab-0_0-21-27_20220224015538.parquet|0.0750588760043035  |0.03844104444445928 |driver-213|0.04376353354538354|0.6346040067610669 |66.62084366450246 |rider-213|1645059243518|43cb0114-c7a2-4b56-aecf-d73a49c0345e|americas/brazil/sao_paulo           |
|20220224015547     |20220224015547_0_21 |cc224f56-5b5c-4d85-b56a-87c74c1a7b2e|americas/brazil/sao_paulo           |f5a4fc01-eeb7-4129-a898-b892a8ec27ab-0_0-64-85_20220224015547.parquet|0.4925455806562906  |0.5324426130133701  |driver-192|0.964861920281932  |0.4727110150355711 |72.67793086410465 |rider-192|1645341842778|cc224f56-5b5c-4d85-b56a-87c74c1a7b2e|americas/brazil/sao_paulo           |
|20220224015538     |20220224015538_0_3  |97391e4d-350d-4d67-93c7-9e1a2ac60fc0|americas/brazil/sao_paulo           |f5a4fc01-eeb7-4129-a898-b892a8ec27ab-0_0-21-27_20220224015538.parquet|0.6372504913279929  |0.04241635032425073 |driver-213|0.36284275950041867|0.6591829686989255 |44.839244944180244|rider-213|1645629063846|97391e4d-350d-4d67-93c7-9e1a2ac60fc0|americas/brazil/sao_paulo           |
|20220224015538     |20220224015538_0_4  |2ec62676-7a1c-4f02-8b11-791f7847eabd|americas/brazil/sao_paulo           |f5a4fc01-eeb7-4129-a898-b892a8ec27ab-0_0-21-27_20220224015538.parquet|0.4726905879569653  |0.46157858450465483 |driver-213|0.754803407008858  |0.9671159942018241 |34.158284716382845|rider-213|1645203465242|2ec62676-7a1c-4f02-8b11-791f7847eabd|americas/brazil/sao_paulo           |
|20220224015547     |20220224015547_0_22 |3552981d-38c3-4a2f-8f89-7d2d3be1d341|americas/brazil/sao_paulo           |f5a4fc01-eeb7-4129-a898-b892a8ec27ab-0_0-64-85_20220224015547.parquet|0.14503019204958845 |0.5281436198246144  |driver-192|0.3291184473506418 |0.772134626462835  |85.36791718953374 |rider-192|1645104198479|3552981d-38c3-4a2f-8f89-7d2d3be1d341|americas/brazil/sao_paulo           |
|20220224015547     |20220224015547_0_23 |7eb0cf33-b411-40a1-9066-c0a67738b4af|americas/brazil/sao_paulo           |f5a4fc01-eeb7-4129-a898-b892a8ec27ab-0_0-64-85_20220224015547.parquet|0.024995362119815567|0.5120368636375937  |driver-192|0.21729959707372848|0.08151154133724581|19.873758263401708|rider-192|1645597329495|7eb0cf33-b411-40a1-9066-c0a67738b4af|americas/brazil/sao_paulo           |
|20220224015547     |20220224015547_2_30 |9fb42eaa-8b5c-4b70-bc90-ff3e298b659b|asia/india/chennai                  |81ee8ecf-1087-401c-ba32-e939b3c23050-0_2-70-87_20220224015547.parquet|0.6228854864580208  |0.8315496170667523  |driver-192|0.6281051198140281 |0.9312237784651692 |67.243450582925   |rider-192|1645075234625|9fb42eaa-8b5c-4b70-bc90-ff3e298b659b|asia/india/chennai                  |
|20220224015547     |20220224015547_2_31 |c68d347f-5e21-4e4b-8f8c-382c57100f3f|asia/india/chennai                  |81ee8ecf-1087-401c-ba32-e939b3c23050-0_2-70-87_20220224015547.parquet|0.970612666616691   |0.017082935178053815|driver-192|0.11178708874754062|0.1450793330198833 |20.404106962358203|rider-192|1645312560622|c68d347f-5e21-4e4b-8f8c-382c57100f3f|asia/india/chennai                  |
|20220224015538     |20220224015538_2_17 |7c6d6dd1-fe81-481b-8e74-6284cff7f3d2|asia/india/chennai                  |81ee8ecf-1087-401c-ba32-e939b3c23050-0_2-27-29_20220224015538.parquet|0.40613510977307    |0.5644092139040959  |driver-213|0.798706304941517  |0.02698359227182834|17.851135255091155|rider-213|1645469686991|7c6d6dd1-fe81-481b-8e74-6284cff7f3d2|asia/india/chennai                  |
|20220224015547     |20220224015547_2_32 |61e28fd7-f0ac-4637-a64e-5285ab83538f|asia/india/chennai                  |81ee8ecf-1087-401c-ba32-e939b3c23050-0_2-70-87_20220224015547.parquet|0.8945841313717807  |0.3945018779685283  |driver-192|0.8920584575412743 |0.9759079698192936 |71.07035158051175 |rider-192|1645336706237|61e28fd7-f0ac-4637-a64e-5285ab83538f|asia/india/chennai                  |
|20220224015547     |20220224015547_2_33 |5e432421-b019-4354-9929-61895cdaa213|asia/india/chennai                  |81ee8ecf-1087-401c-ba32-e939b3c23050-0_2-70-87_20220224015547.parquet|0.26636532270940916 |0.6539904550963876  |driver-192|0.27262593896775367|0.4292589705152693 |69.9025398548803  |rider-192|1645057355007|5e432421-b019-4354-9929-61895cdaa213|asia/india/chennai                  |
|20220224015538     |20220224015538_2_20 |02ca3ce3-4925-480a-8b70-73111e35afff|asia/india/chennai                  |81ee8ecf-1087-401c-ba32-e939b3c23050-0_2-27-29_20220224015538.parquet|0.651058505660742   |0.8192868687714224  |driver-213|0.20714896002914462|0.06224031095826987|41.06290929046368 |rider-213|1645485296066|02ca3ce3-4925-480a-8b70-73111e35afff|asia/india/chennai                  |
+-------------------+--------------------+------------------------------------+------------------------------------+---------------------------------------------------------------------+--------------------+--------------------+----------+-------------------+-------------------+------------------+---------+-------------+------------------------------------+------------------------------------+
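
The merge behaviour visible in the two tables can be summarised by a small model in plain Scala. It is a simplified sketch, not Hudi's implementation. Trip and UpsertModel are hypothetical names. The model assumes that duplicates within the incoming batch are resolved by the largest ts (the precombine field) and that the surviving incoming record then replaces the stored record with the same key. The second assumption matches the output above: row 6b3794f7-b26d-4c6e-8ea6-2bd8ed6992df was replaced even though its new ts (1645461680678) is smaller than the old one (1645603972111).

```scala
// Simplified model of one trip row: record key, precombine field, one payload column.
final case class Trip(uuid: String, ts: Long, driver: String)

object UpsertModel {
  // Within the incoming batch, keep one record per key: the one with the largest ts.
  def preCombine(batch: Seq[Trip]): Map[String, Trip] =
    batch.groupBy(_.uuid).map { case (key, rows) => key -> rows.maxBy(_.ts) }

  // Incoming records replace stored records with the same key; all other rows are kept.
  def upsert(table: Map[String, Trip], batch: Seq[Trip]): Map[String, Trip] =
    table ++ preCombine(batch)
}
```

In this model a key that appears twice in the update batch still produces a single row, which is consistent with a batch of 20 updates changing only 13 rows.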

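Also, because this is a copy-on-write (COW) table, every update rewrites the affected data file as a new version, so each partition directory in HDFS now holds two parquet files, one per commit. The _hoodie_file_name column in the two outputs above shows both versions.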
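
The two file versions can be told apart from their names alone. The sketch below assumes the base file naming pattern fileId_writeToken_commitTime.parquet, which is what the _hoodie_file_name values in the output look like. BaseFileName and BaseFileNames are hypothetical helpers written for this post, not part of the Hudi API.

```scala
// Hypothetical holder for the three parts of a base file name.
final case class BaseFileName(fileId: String, writeToken: String, commitTime: String)

object BaseFileNames {
  // Splits "<fileId>_<writeToken>_<commitTime>.parquet" into its three parts.
  // Returns None when the name does not have exactly three underscore-separated parts.
  def parse(name: String): Option[BaseFileName] =
    name.stripSuffix(".parquet").split("_") match {
      case Array(fileId, token, commit) => Some(BaseFileName(fileId, token, commit))
      case _                            => None
    }
}
```

Parsing the two san_francisco file names from the output gives the same fileId with commit times 20220224015538 and 20220224015547, that is, two versions of one file group.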


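Note: The posts in this Hudi series are study notes based on the official Hudi website, with some of my own understanding added. Please bear with any shortcomings ☺☺☺

Note: Links to the other related articles (posts on Hudi and the other data lake technologies) can be found here -> Data Lake: Article Index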

