
Overview

Introduction to Sqoop

1. Background

1.1 Overview


  1. In a typical big-data processing stack, the data to be processed usually comes from two sources: behavior logs and business data. Companies that also have a Python (web-crawling) team add crawled data as a third source.
  2. Because crawled data differs a lot from company to company and from page/API to page/API, it usually needs custom parsing programs written per company and per business case before it is stored in HDFS or another distributed file system. These are typically Spark jobs (they run distributed and are generally more efficient to develop than Java programs, although Java MapReduce works too).
  3. Behavior-log data, as discussed before, is usually collected with a distributed log-collection framework such as Flume; these frameworks are mature, can do some preprocessing on the way in, and then land the data in HDFS or elsewhere.
  4. Business data normally sits in a relational database such as MySQL, Oracle, or SQL Server. The latter two are commercial products, and Oracle in particular is more than most companies can afford. Moving data from a relational database into the big-data file store, or the other way around, is still mostly done with Sqoop, although DataX and other tools with the same purpose can be used as well.
  5. Both traffic-domain data (behavior logs) and business-domain data (business data stored in relational databases) are processed and stored following the classic data-warehouse layering, so that they can be computed on, analyzed, and finally presented.
  6. The classic layering consists of a DIM dimension layer, an ODS layer that mirrors the sources, then the DW layer (usually split into a DWD detail layer and a DWS service layer), and finally the ADS application layer. Data from both the traffic domain and the business domain is preprocessed, extracted, transformed, and stored into these layers (see the sketch after this list).
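As a rough illustration of that layering (these Hive database names are illustrative assumptions, not anything Sqoop or Hive mandates), each layer often maps to its own Hive database:

# Hypothetical one-database-per-layer layout; adjust names to your own conventions.
hive -e "
CREATE DATABASE IF NOT EXISTS ods;   -- raw, source-aligned copies (Sqoop imports usually land here)
CREATE DATABASE IF NOT EXISTS dim;   -- dimension tables
CREATE DATABASE IF NOT EXISTS dwd;   -- cleaned, detail-level data
CREATE DATABASE IF NOT EXISTS dws;   -- aggregated / service-level summaries
CREATE DATABASE IF NOT EXISTS ads;   -- application-facing result tables
"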

1.2 Official site

  1. https://sqoop.apache.org/
  2. Note the distinction between the regular Sqoop (Sqoop 1) and Sqoop2. Sqoop2 is still not stable, so do not use it in production; it is also not compatible with earlier versions.

2. Installation

  1. Sqoop essentially runs a MapReduce job, so start Hadoop (HDFS and YARN) before running Sqoop: the data usually lives in HDFS, and the MapReduce job usually runs on the YARN cluster.
  2. Download the official 1.4.7 release tarball and extract it.
# -x extracts an archive (-z handles gzip)
tar -zxvf ...

# -c creates a (gzip-compressed) archive
tar -zcvf
  • Note: on Linux, third-party software usually goes under /opt or /usr; here a new apps directory is created under /opt specifically for third-party programs.
  • The options work much like those of the jar command.
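A minimal sketch of those steps (the tarball name and install path are assumptions based on the 1.4.7 release; adjust to your environment):

# Start HDFS and YARN first, since Sqoop submits a MapReduce job to YARN.
start-dfs.sh
start-yarn.sh

# Install under /opt/apps (the directory is just a convention).
mkdir -p /opt/apps
tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /opt/apps   # assumed tarball name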
  3. Open the conf directory under the Sqoop installation directory and edit sqoop-env.sh:
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/opt/apps/hadoop-3.1.1

#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/opt/apps/hadoop-3.1.1/

#set the path to where bin/hbase is available
#export HBASE_HOME=

#Set the path to where bin/hive is available
export HIVE_HOME=/opt/apps/hive-3.1.2/
export HIVE_CONF_DIR=/opt/apps/hive-3.1.2/conf
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HIVE_HOME/lib/*

#Set the path for where zookeper config dir is
#export ZOOCFGDIR=

Set these parameters according to what you intend to use.

HADOOP_COMMON_HOME and HADOOP_MAPRED_HOME both point to the Hadoop home, i.e. the installation path.
If you need HBase, set the HBase installation path as well.
Using Hive requires more settings: besides the Hive installation path, also set Hive's conf directory and append Hive's lib directory to HADOOP_CLASSPATH, so that Sqoop can find third-party dependency jars under the Hadoop and Hive lib directories.
Note that Sqoop also searches its own lib directory for third-party jars. Resolving dependencies from configured installation and lib directories like this is a common lookup mechanism for frameworks on Linux.
If ZooKeeper is involved, its config directory can be set here too.

  4. Place a MySQL JDBC driver jar into Sqoop's lib directory. Given the jar lookup mechanism above, putting the MySQL driver jar into Hive's or Hadoop's lib directory works as well.
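For example (the connector version and paths below are assumptions; use whatever driver jar matches your MySQL server):

# Any of these locations ends up on Sqoop's effective classpath.
cp mysql-connector-java-5.1.47.jar /opt/apps/sqoop-1.4.7/lib/
# or: cp mysql-connector-java-5.1.47.jar /opt/apps/hive-3.1.2/lib/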

If you hit an error saying hive.HiveConf cannot be found…

  1. Copy hive-common-2.3.5.jar from the lib directory of the Hive installation into Sqoop's lib directory, then run the test again; if it passes, the problem is fixed.
  2. If that fails, continue as follows:
    <1. Install Sqoop the normal way first:
    extract it and edit sqoop-env.sh
    export HADOOP_COMMON_HOME=/opt/apps/hadoop2
    export HADOOP_MAPRED_HOME=/opt/apps/hadoop2
    export HIVE_HOME=/opt/apps/hive2
    <2. Add the following to /root/.bash_profile:
    export HIVE_HOME=/opt/apps/hive2
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HIVE_HOME/lib/*
    <3. Add the following to the JDK security policy:
    vi /opt/apps/jdk/jre/lib/security/java.policy
    append at the end:
    grant {
        permission javax.management.MBeanTrustPermission "register";
    };
    <4. Replace the jars with version conflicts:
    rename every jackson-*.jar in Sqoop's lib directory to jackson-*.jar.bak,
    then copy all jackson-*.jar files from Hive's lib directory into Sqoop's lib directory.
    <5. Copy Hive's hive-site.xml into Sqoop's conf directory.
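A condensed shell version of the jar fixes above (version numbers depend on what your Hive ships; the paths are assumptions matching the config earlier in this post):

SQOOP_LIB=/opt/apps/sqoop-1.4.7/lib
HIVE_LIB=/opt/apps/hive-3.1.2/lib

cp $HIVE_LIB/hive-common-*.jar $SQOOP_LIB/                    # fixes the missing hive.HiveConf class
for f in $SQOOP_LIB/jackson-*.jar; do mv "$f" "$f.bak"; done  # park Sqoop's own jackson jars
cp $HIVE_LIB/jackson-*.jar $SQOOP_LIB/                        # use Hive's jackson versions instead
cp /opt/apps/hive-3.1.2/conf/hive-site.xml /opt/apps/sqoop-1.4.7/conf/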
  5. Test: cd into the Sqoop installation directory.
  • List the databases
bin/sqoop list-databases \
--connect jdbc:mysql://doit01:3306 \
--username root \
--password ABC123abc.123
  • List the tables in a database
bin/sqoop list-tables \
--connect jdbc:mysql://doit01:3306/realtimedw \
--username root \
--password ABC123abc.123

3. Usage

  1. Usage notes. The import control arguments, from the official documentation:
# Table 3. Import control arguments:
# Argument	Description
# --append	Append data to an existing dataset in HDFS
# --as-avrodatafile	Imports data to Avro Data Files
# --as-sequencefile	Imports data to SequenceFiles
# --as-textfile	Imports data as plain text (default)
# --as-parquetfile	Imports data to Parquet Files
# --boundary-query <statement>	Boundary query to use for creating splits
# --columns <col,col,col…>	Columns to import from table
# --delete-target-dir	Delete the import target directory if it exists
# --direct	Use direct connector if exists for the database
# --fetch-size <n>	Number of entries to read from database at once.
# --inline-lob-limit <n>	Set the maximum size for an inline LOB
# -m,--num-mappers <n>	Use n map tasks to import in parallel
# -e,--query <statement>	Import the results of statement.
# --split-by <column-name>	Column of the table used to split work units. Cannot be used with --autoreset-to-one-mapper option.
# --split-limit <n>	Upper Limit for each split size. This only applies to Integer and Date columns. For date or timestamp fields it is calculated in seconds.
# --autoreset-to-one-mapper	Import should use one mapper if a table has no primary key and no split-by column is provided. Cannot be used with --split-by <col> option.
# --table <table-name>	Table to read
# --target-dir <dir>	HDFS destination dir
# --temporary-rootdir <dir>	HDFS directory for temporary files created during import (overrides default "_sqoop")
# --warehouse-dir <dir>	HDFS parent for table destination
# --where <where clause>	WHERE clause to use during import
# -z,--compress	Enable compression
# --compression-codec <c>	Use Hadoop codec (default gzip)
# --null-string <null-string>	The string to be written for a null value for string columns
# --null-non-string <null-string>	The string to be written for a null value for non-string columns

3.1 Full import

3.1.1 Importing data from MySQL into HDFS

  1. Notes

A note on parallelism:
A single map task pulls data out of MySQL at roughly 4-5 MB/s, while a MySQL server can typically sustain a throughput of 40-50 MB/s.
So when the table in MySQL is large, consider raising the number of map tasks to speed up the transfer.
-m is the option that sets the map task parallelism.
Once there is more than one map task, a split key has to be specified, usually the table's id column, and the rows are partitioned by it. Before splitting, Sqoop runs a SQL query for the minimum and maximum of that column so it can plan the ranges.
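Roughly, the split planning looks like this (a sketch assuming --split-by id and -m 2; the exact queries Sqoop generates may differ):

# Boundary query used to plan the splits:
#   SELECT MIN(id), MAX(id) FROM dim_pginfo;
# With min=1, max=2000 and two mappers, each map task then runs something like:
#   mapper 1: SELECT ... FROM dim_pginfo WHERE id >= 1    AND id < 1001
#   mapper 2: SELECT ... FROM dim_pginfo WHERE id >= 1001 AND id <= 2000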

  2. Preparation

To make sure Sqoop treats the target path as an HDFS path, the following setting in core-site.xml must be correct:

<property>
<name>fs.defaultFS</name>
<value>hdfs://doit01:8020/</value>
</property>

To run the MapReduce job on YARN, make sure mapred-site.xml contains the following:

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
  3. Example Sqoop import command
bin/sqoop import \
--connect jdbc:mysql://doit01:3306/realtimedw \
--username root \
--password ABC123abc.123 \
--table dim_pginfo \
--target-dir /sqoopdata/dim_pginfo2 \
--fields-terminated-by '\001' \
--delete-target-dir \
--compress \
--compression-codec gzip \
--split-by id \
-m 2

Parameter notes
import means this is an import of data into HDFS.
The backslash at the end of a line continues a multi-line shell command.
--connect jdbc:mysql://doit01:3306/realtimedw is the MySQL JDBC connection string; note that it includes the database name.
--username root
--password ABC123abc.123
are the MySQL account and password.
--table dim_pginfo is the table name in the database.
--target-dir /sqoopdata/dim_pginfo2 is the HDFS directory where the data will be stored.
--fields-terminated-by '\001' sets the field separator for the output files. A non-printable character is normally used, because printable characters can easily occur inside MySQL string fields and would later make it hard to split the structured files correctly.
--delete-target-dir: just like the familiar "output directory already exists" error in MapReduce jobs, an existing target directory makes the job fail; this flag deletes the existing target directory first. Always double-check whether deleting the old directory is really what you want.
--compress
--compression-codec gzip
enable compression and choose the codec; several codecs are available.
--split-by id specifies which column the table is split on, since the MapReduce job runs in parallel and every map task needs its own slice of the data.
-m 2 sets the number of map tasks, here 2.

The output file format can be specified:
--as-avrodatafile
--as-parquetfile
--as-sequencefile
--as-textfile

Null handling
For import, i.e. data leaving MySQL, these options decide which string is written to the HDFS files for a MySQL NULL (the default is the literal "null"):
--null-non-string '\N'
--null-string '\N'
For export, they decide which string in the HDFS files is written to MySQL as NULL:
--input-null-non-string '\N'
--input-null-string '\N'

To split the data on a key other than a numeric id, use:
-Dorg.apache.sqoop.splitter.allow_text_splitter=true
--split-by name

sqoop import -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
--connect jdbc:mysql://h3:3306/ry \
--username root \
--password root \
--table noid \
--target-dir /sqooptest3 \
--fields-terminated-by ',' \
--split-by name \
-m 2

3.1.2 Importing from MySQL into Hive

Note: for Sqoop, whether something is an import or an export is seen from the big-data side, so pulling data out of MySQL is an import.

sqoop import \
--connect jdbc:mysql://h3:3306/ry \
--username root \
--password 211123 \
--table doit_jw_stu_base \
--hive-import \
--hive-table doit_jw_stu.base \
--delete-target-dir \
--as-textfile \
--fields-terminated-by ',' \
--compress \
--compression-codec gzip \
--null-string '\N' \
--null-non-string '\N' \
--hive-overwrite \
--split-by stu_id \
-m 2

Sqoop first imports the data from MySQL into HDFS, then uses Hive's metadata jars to create the matching metadata in the Hive metastore and moves the data files into the Hive table directory.
So it is really two steps: export the data to files first, then use Hive to load those files into the Hive table, essentially a LOAD DATA.
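The same result can be reproduced by hand, which makes the mechanism clearer (a sketch only: the HDFS path is an assumption, and the target Hive table must already exist with a layout matching the files):

# Step 1: land the data in HDFS with a plain import
sqoop import \
--connect jdbc:mysql://h3:3306/ry \
--username root --password 211123 \
--table doit_jw_stu_base \
--target-dir /sqoopdata/doit_jw_stu_base \
--fields-terminated-by ',' \
-m 2

# Step 2: move the files into the matching Hive table
hive -e "LOAD DATA INPATH '/sqoopdata/doit_jw_stu_base' OVERWRITE INTO TABLE doit_jw_stu.base;"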

Parameter notes

  1. --connect jdbc:mysql://h3:3306/ry
    --username root
    --password 211123: as before, the MySQL connection string with the database name, plus the account and password.
  2. --table doit_jw_stu_base: the name of the table in MySQL.
  3. --hive-import: indicates an import into a Hive table.
  4. --hive-table doit_jw_stu.base: import into the base table of the doit_jw_stu Hive database.
  5. --delete-target-dir: as before, delete the HDFS directory if it already exists (Hive keeps its metadata in a database and its data in HDFS).
  6. --as-textfile: the storage file format.
  7. --fields-terminated-by ',': the field separator.
  8. --compress
    --compression-codec gzip: compression and the codec to use.
  9. --null-string '\N'
    --null-non-string '\N': which symbol marks a MySQL NULL in the HDFS files; without it, Hive cannot tell which values were NULL in MySQL when it parses the files.
  10. --hive-overwrite: overwrite the existing contents of the Hive table.
  11. --split-by stu_id: which column the MySQL table is split on, so the MapReduce job can read in parallel.
  12. -m 2: the number of map tasks.

--hive-database xdb can also be used to specify the Hive database name separately.

3.1.3 Conditional imports

  1. Import with a where condition
sqoop import \
--connect jdbc:mysql://h3:3306/ry \
--username root \
--password 211123 \
--table doit_jw_stu_base \
--hive-import \
--hive-table yiee_dw.doit_jw_stu_base2 \
--delete-target-dir \
--as-textfile \
--fields-terminated-by ',' \
--compress \
--compression-codec gzip \
--split-by stu_id \
--null-string '\N' \
--null-non-string '\N' \
--hive-overwrite \
--where "stu_age>25" \
-m 2

--where "stu_age>25" works just like writing a WHERE condition in SQL.

  2. Import selected columns with --columns
sqoop import \
--connect jdbc:mysql://h3:3306/ry \
--username root \
--password haitao.211123 \
--table doit_jw_stu_base \
--hive-import \
--hive-table yiee_dw.doit_jw_stu_base3 \
--delete-target-dir \
--as-textfile \
--fields-terminated-by ',' \
--compress \
--compression-codec gzip \
--split-by stu_id \
--null-string '\N' \
--null-non-string '\N' \
--hive-overwrite \
--where "stu_age>25" \
--columns "stu_id,stu_name,stu_phone" \
-m 2

This simply restricts which columns of the table are imported.

  3. Import with a --query statement
sqoop import \
--connect jdbc:mysql://h3:3306/ry \
--username root \
--password haitao.211123 \
--hive-import \
--hive-table yiee_dw.doit_jw_stu_base4 \
--as-textfile \
--fields-terminated-by ',' \
--compress \
--compression-codec gzip \
--split-by stu_id \
--null-string '\N' \
--null-non-string '\N' \
--hive-overwrite \
--query 'select stu_id,stu_name,stu_age,stu_term from doit_jw_stu_base where stu_createtime>"2019-09-24 23:59:59" and stu_sex="1" and $CONDITIONS' \
--target-dir '/user/root/tmp' \
-m 2

Once --query is used, do not also use --table, --where, or --columns.
With a free-form --query import, the SQL statement must contain $CONDITIONS: either WHERE $CONDITIONS on its own, or something like WHERE id>20 AND $CONDITIONS. Sqoop hands your SQL to several different map tasks, and each map task has to add its own range condition when it runs the SQL, so $CONDITIONS is provided as a placeholder where those conditions get spliced in later.
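For intuition, with --split-by stu_id and two mappers the placeholder ends up replaced by something like the following (a sketch; the actual generated predicates and boundaries will differ):

# original : ... where stu_createtime>"2019-09-24 23:59:59" and stu_sex="1" and $CONDITIONS
# mapper 1 : ... where stu_createtime>"2019-09-24 23:59:59" and stu_sex="1" and ( stu_id >= 1   AND stu_id < 500 )
# mapper 2 : ... where stu_createtime>"2019-09-24 23:59:59" and stu_sex="1" and ( stu_id >= 500 AND stu_id <= 1000 )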

  4. Complex queries with --query
    Note: do not run complex queries as part of the import. Either build an intermediate/temporary table in MySQL and export the result of the query, or import the relevant tables as-is and do the complex processing in Hive.

The official documentation does not recommend it either.
--query does support complex queries (joins, subqueries, group-bys), but think carefully about how your SQL's intended semantics interact with the fact that the work is split across parallel map tasks!
--query "select id,member_id,order_sn,receiver_province from doit_mall.oms_order where id>20 and \$CONDITIONS"
--query 'select id,member_id,order_sn,receiver_province from doit_mall.oms_order where id>20 and $CONDITIONS'
(With double quotes, the shell requires $CONDITIONS to be escaped as \$CONDITIONS; with single quotes it is passed through as-is.)

sqoop import \
--connect jdbc:mysql://h3:3306/ry \
--username root \
--password haitao.211123 \
--hive-import \
--hive-table yiee_dw.doit_jw_stu_base6 \
--as-textfile \
--fields-terminated-by ',' \
--compress \
--compression-codec gzip \
--split-by id \
--null-string '\N' \
--null-non-string '\N' \
--hive-overwrite \
--query 'select b.id,a.stu_id,a.stu_name,a.stu_phone,a.stu_sex,b.stu_zsroom from doit_jw_stu_base a join doit_jw_stu_zsgl b on a.stu_id=b.stu_id where $CONDITIONS' \
--target-dir '/user/root/tmp' \
-m 2

3.2 Incremental import

In real projects, the data in some tables hardly ever changes; one full import may then last for months, or even half a year, before it needs to be redone.
Other tables get new rows and changes to old rows every day; those need an incremental import.

3.2.1 Identifying incremental data by a monotonically increasing column

This mode suits tables whose rows never change once written; it is not a good fit otherwise. An orders table, for example, is not suitable for this mode.

sqoop import \
--connect jdbc:mysql://h3:3306/ry \
--username root \
--password 211123 \
--table doit_jw_stu_zsgl \
--hive-import \
--hive-table yiee_dw.doit_jw_stu_zsgl \
--split-by id \
--incremental append \
--check-column id \
--last-value 40 \
-m 2

--incremental append marks this as an incremental (append) import.
--check-column id is the column used to decide which rows are new.
--last-value 40 is the value after which rows are considered new and get imported.
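Instead of passing --last-value by hand on every run, the state can be kept by a saved Sqoop job: its metastore remembers the last imported value and updates it after each run (a sketch; the job name is made up, and by default the password is prompted for unless the metastore is configured to store it):

# create a reusable incremental job (note the space after the lone --)
sqoop job --create stu_zsgl_append -- import \
--connect jdbc:mysql://h3:3306/ry \
--username root --password 211123 \
--table doit_jw_stu_zsgl \
--hive-import --hive-table yiee_dw.doit_jw_stu_zsgl \
--incremental append --check-column id --last-value 0 \
-m 2

# run it; the new last-value is recorded automatically after each run
sqoop job --exec stu_zsgl_append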

3.2.2 Identifying incremental data by a modification timestamp

  • The table must have a timestamp column that is updated whenever a row is modified.
    Incremental import in lastmodified mode does not support importing directly into Hive.
  • Because this mode involves old rows whose state has changed, it cannot feed Hive directly. Export to files instead, and then merge, for example with a Spark job that sorts by the timestamp, or with a Spark DataFrame window that keeps the latest version per key.
sqoop import \
--connect jdbc:mysql://h3:3306/ry \
--username root \
--password haitao.211123 \
--table doit_jw_stu_zsgl \
--target-dir '/sqoopdata/doit_jw_stu_zsgl' \
--incremental lastmodified \
--check-column update_time \
--last-value '2020-03-18 23:59:59' \
--fields-terminated-by ',' \
-m 1

--incremental lastmodified marks this as an incremental import that uses a last-modified timestamp to decide which rows count as incremental data. Well-run teams usually keep both a create-time and a modified-time column. The two incremental modes are append and lastmodified.
--check-column update_time is the MySQL column to check.
--last-value '2020-03-18 23:59:59' is the value already covered by previous imports; rows whose check column is newer than this are picked up.
--fields-terminated-by ',' is the field separator.

If the imported increment needs to be merged with the existing data, add:
--merge-key id
The incremental data is then not simply appended to the target storage but merged with the old data.
For example:

bin/sqoop codegen \
--connect jdbc:mysql://impala01:3306/sqooptest \
--username root \
--password ABC123abc.123 \
--table stu \
--bindir /opt/apps/code/stu \
--class-name Stu \
--fields-terminated-by ","

bin/sqoop merge \
--new-data /sqoopdata/stu1 \
--onto /sqoopdata/stu0 \
--target-dir /sqoopdata/stu_all \
--jar-file /opt/apps/code/stu/Stu.jar \
--class-name Stu \
--merge-key id

Note: in practice Sqoop's file merge is rarely used; teams usually write their own merge program, which gives more freedom and fits the business requirements better.

3.3 Exporting data

3.3.1 Exporting data to MySQL

sqoop export \
--connect jdbc:mysql://h3:3306/dicts \
--username root \
--password haitao.211123 \
--table dau_t \
--input-fields-terminated-by ',' \
--export-dir '/user/hive/warehouse/dau_t' \
--batch

export means data is exported, here from HDFS into MySQL.
--connect jdbc:mysql://h3:3306/dicts is the MySQL connection string, again including the database name.
--username root
--password haitao.211123 are the account and password.
--table dau_t is the name of the target table.
--input-fields-terminated-by ',' is the field separator of the input files.
--export-dir '/user/hive/warehouse/dau_t' is the HDFS directory to export from.
--batch runs the underlying SQL statements in batch mode.
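Sqoop export writes into an existing MySQL table; it does not create one. So a table matching the file layout has to be created first, for example (the column list here is purely an assumption for illustration):

mysql -uroot -p -e "
CREATE TABLE IF NOT EXISTS dicts.dau_t (
  dt  VARCHAR(10),   -- hypothetical columns; they must match the fields in the exported files
  dau BIGINT
);"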

sqoop export \
--connect jdbc:mysql://h3:3306/doit_mall \
--username root \
--password root \
--table person \
--export-dir '/export3/' \
--input-null-string '\N' \
--input-null-non-string '\N' \
--update-mode allowinsert \
--update-key id \
--batch

To control how new versus existing rows are written to MySQL, choose an update mode:
--input-null-string '\N'
--input-null-non-string '\N' tell Sqoop which symbol in the HDFS files should be interpreted as NULL in MySQL.
--update-mode allowinsert is the update mode that both appends new rows and updates existing ones.
--update-key id is the key column used to match rows for updates.

With --update-mode updateonly, only rows whose id already exists in MySQL are updated; rows with new ids are not inserted.
With allowinsert, existing ids are updated and new ids are inserted as well.
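Conceptually, the two modes correspond to different SQL being issued per record (a rough sketch of the intent, not the exact statements the connector generates):

# updateonly  : UPDATE person SET ... WHERE id = ?    -- records with unknown ids are silently skipped
# allowinsert : an upsert, e.g. INSERT ... ON DUPLICATE KEY UPDATE ... on MySQL,
#               so unknown ids are inserted and known ids are updated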

4. How it works

  1. Under the hood, Sqoop simply launches a MapReduce job and runs map tasks; there are no reduce tasks (as long as you only use import/export and not the merge feature, no reduce task is involved, because importing/exporting is just a read-and-write pass over the data and map tasks are enough).
  2. Example:
bin/sqoop import \
--connect jdbc:mysql://doit01:3306/realtimedw \
--username root \
--password ABC123abc.123 \
--table dim_pginfo \
--target-dir /sqoopdata/dim_pginfo2 \
--fields-terminated-by '\001' \
--delete-target-dir \
--compress \
--compression-codec gzip \
--split-by id \
-m 2

The output:

Warning: /opt/apps/sqoop-1.4.7/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/apps/sqoop-1.4.7/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /opt/apps/sqoop-1.4.7/bin/../../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apps/hadoop-3.1.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/apps/hbase-2.0.6/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/apps/hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2020-10-17 19:46:38,184 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
2020-10-17 19:46:38,214 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
2020-10-17 19:46:38,315 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
2020-10-17 19:46:38,315 INFO tool.CodeGenTool: Beginning code generation
Sat Oct 17 19:46:38 CST 2020 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2020-10-17 19:46:38,812 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dim_pginfo` AS t LIMIT 1
2020-10-17 19:46:38,834 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dim_pginfo` AS t LIMIT 1
2020-10-17 19:46:38,839 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/apps/hadoop-3.1.1
Note: /tmp/sqoop-root/compile/5cc14149e1a2cf679ffb35e7c5b8331a/dim_pginfo.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
2020-10-17 19:46:40,037 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/5cc14149e1a2cf679ffb35e7c5b8331a/dim_pginfo.jar
2020-10-17 19:46:40,657 INFO tool.ImportTool: Destination directory /sqoopdata/dim_pginfo2 is not present, hence not deleting.
2020-10-17 19:46:40,657 WARN manager.MySQLManager: It looks like you are importing from mysql.
2020-10-17 19:46:40,657 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
2020-10-17 19:46:40,657 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
2020-10-17 19:46:40,657 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
2020-10-17 19:46:40,659 INFO mapreduce.ImportJobBase: Beginning import of dim_pginfo
2020-10-17 19:46:40,660 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2020-10-17 19:46:40,663 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
2020-10-17 19:46:40,714 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2020-10-17 19:46:41,113 INFO client.RMProxy: Connecting to ResourceManager at doitedu01/192.168.77.41:8032
2020-10-17 19:46:41,335 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1602935137978_0001
Sat Oct 17 19:46:43 CST 2020 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2020-10-17 19:46:43,650 INFO db.DBInputFormat: Using read commited transaction isolation
2020-10-17 19:46:43,650 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`id`), MAX(`id`) FROM `dim_pginfo`
2020-10-17 19:46:43,658 INFO db.IntegerSplitter: Split size: 999; Num splits: 2 from: 1 to: 2000
2020-10-17 19:46:43,725 INFO mapreduce.JobSubmitter: number of splits:2
2020-10-17 19:46:43,758 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2020-10-17 19:46:43,845 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1602935137978_0001
2020-10-17 19:46:43,846 INFO mapreduce.JobSubmitter: Executing with tokens: []
2020-10-17 19:46:44,015 INFO conf.Configuration: resource-types.xml not found
2020-10-17 19:46:44,016 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2020-10-17 19:46:44,364 INFO impl.YarnClientImpl: Submitted application application_1602935137978_0001
2020-10-17 19:46:44,749 INFO mapreduce.Job: The url to track the job: http://doitedu01:8088/proxy/application_1602935137978_0001/
2020-10-17 19:46:44,749 INFO mapreduce.Job: Running job: job_1602935137978_0001
2020-10-17 19:46:52,875 INFO mapreduce.Job: Job job_1602935137978_0001 running in uber mode : false
2020-10-17 19:46:52,876 INFO mapreduce.Job:  map 0% reduce 0%
2020-10-17 19:46:58,971 INFO mapreduce.Job:  map 50% reduce 0%
2020-10-17 19:47:00,986 INFO mapreduce.Job:  map 100% reduce 0%
2020-10-17 19:47:01,995 INFO mapreduce.Job: Job job_1602935137978_0001 completed successfully
2020-10-17 19:47:02,057 INFO mapreduce.Job: Counters: 32
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=447078
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=206
                HDFS: Number of bytes written=8602
                HDFS: Number of read operations=12
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=4
        Job Counters 
                Launched map tasks=2
                Other local map tasks=2
                Total time spent by all maps in occupied slots (ms)=10121
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=10121
                Total vcore-milliseconds taken by all map tasks=10121
                Total megabyte-milliseconds taken by all map tasks=10363904
        Map-Reduce Framework
                Map input records=2000
                Map output records=2000
                Input split bytes=206
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=450
                CPU time spent (ms)=3330
                Physical memory (bytes) snapshot=555163648
                Virtual memory (bytes) snapshot=5572423680
                Total committed heap usage (bytes)=359137280
                Peak Map Physical memory (bytes)=290725888
                Peak Map Virtual memory (bytes)=2787061760
        File Input Format Counters 
                Bytes Read=0
        File Output Format Counters 
                Bytes Written=8602
2020-10-17 19:47:02,061 INFO mapreduce.ImportJobBase: Transferred 8.4004 KB in 21.3405 seconds (403.0834 bytes/sec)
2020-10-17 19:47:02,063 INFO mapreduce.ImportJobBase: Retrieved 2000 records.

5. Command reference

  1. import
usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:
   --connect <jdbc-uri>                                       Specify JDBC
                                                              connect
                                                              string
   --connection-manager <class-name>                          Specify
                                                              connection
                                                              manager
                                                              class name
   --connection-param-file <properties-file>                  Specify
                                                              connection
                                                              parameters
                                                              file
   --driver <class-name>                                      Manually
                                                              specify JDBC
                                                              driver class
                                                              to use
   --hadoop-home <hdir>                                       Override
                                                              $HADOOP_MAPR
                                                              ED_HOME_ARG
   --hadoop-mapred-home <dir>                                 Override
                                                              $HADOOP_MAPR
                                                              ED_HOME_ARG
   --help                                                     Print usage
                                                              instructions
   --metadata-transaction-isolation-level <isolationlevel>    Defines the
                                                              transaction
                                                              isolation
                                                              level for
                                                              metadata
                                                              queries. For
                                                              more details
                                                              check
                                                              java.sql.Con
                                                              nection
                                                              javadoc or
                                                              the JDBC
                                                              specificaito
                                                              n
   --oracle-escaping-disabled <boolean>                       Disable the
                                                              escaping
                                                              mechanism of
                                                              the
                                                              Oracle/OraOo
                                                              p connection
                                                              managers
-P                                                            Read
                                                              password
                                                              from console
   --password <password>                                      Set
                                                              authenticati
                                                              on password
   --password-alias <password-alias>                          Credential
                                                              provider
                                                              password
                                                              alias
   --password-file <password-file>                            Set
                                                              authenticati
                                                              on password
                                                              file path
   --relaxed-isolation                                        Use
                                                              read-uncommi
                                                              tted
                                                              isolation
                                                              for imports
   --skip-dist-cache                                          Skip copying
                                                              jars to
                                                              distributed
                                                              cache
   --temporary-rootdir <rootdir>                              Defines the
                                                              temporary
                                                              root
                                                              directory
                                                              for the
                                                              import
   --throw-on-error                                           Rethrow a
                                                              RuntimeExcep
                                                              tion on
                                                              error
                                                              occurred
                                                              during the
                                                              job
   --username <username>                                      Set
                                                              authenticati
                                                              on username
   --verbose                                                  Print more
                                                              information
                                                              while
                                                              working

Import control arguments:
   --append                                                   Imports data
                                                              in append
                                                              mode
   --as-avrodatafile                                          Imports data
                                                              to Avro data
                                                              files
   --as-parquetfile                                           Imports data
                                                              to Parquet
                                                              files
   --as-sequencefile                                          Imports data
                                                              to
                                                              SequenceFile
                                                              s
   --as-textfile                                              Imports data
                                                              as plain
                                                              text
                                                              (default)
   --autoreset-to-one-mapper                                  Reset the
                                                              number of
                                                              mappers to
                                                              one mapper
                                                              if no split
                                                              key
                                                              available
   --boundary-query <statement>                               Set boundary
                                                              query for
                                                              retrieving
                                                              max and min
                                                              value of the
                                                              primary key
   --columns <col,col,col...>                                 Columns to
                                                              import from
                                                              table
   --compression-codec <codec>                                Compression
                                                              codec to use
                                                              for import
   --delete-target-dir                                        Imports data
                                                              in delete
                                                              mode
   --direct                                                   Use direct
                                                              import fast
                                                              path
   --direct-split-size <n>                                    Split the
                                                              input stream
                                                              every 'n'
                                                              bytes when
                                                              importing in
                                                              direct mode
-e,--query <statement>                                        Import
                                                              results of
                                                              SQL
                                                              'statement'
   --fetch-size <n>                                           Set number
                                                              'n' of rows
                                                              to fetch
                                                              from the
                                                              database
                                                              when more
                                                              rows are
                                                              needed
   --inline-lob-limit <n>                                     Set the
                                                              maximum size
                                                              for an
                                                              inline LOB
-m,--num-mappers <n>                                          Use 'n' map
                                                              tasks to
                                                              import in
                                                              parallel
   --mapreduce-job-name <name>                                Set name for
                                                              generated
                                                              mapreduce
                                                              job
   --merge-key <column>                                       Key column
                                                              to use to
                                                              join results
   --split-by <column-name>                                   Column of
                                                              the table
                                                              used to
                                                              split work
                                                              units
   --split-limit <size>                                       Upper Limit
                                                              of rows per
                                                              split for
                                                              split
                                                              columns of
                                                              Date/Time/Ti
                                                              mestamp and
                                                              integer
                                                              types. For
                                                              date or
                                                              timestamp
                                                              fields it is
                                                              calculated
                                                              in seconds.
                                                              split-limit
                                                              should be
                                                              greater than
                                                              0
   --table <table-name>                                       Table to
                                                              read
   --target-dir <dir>                                         HDFS plain
                                                              table
                                                              destination
   --validate                                                 Validate the
                                                              copy using
                                                              the
                                                              configured
                                                              validator
   --validation-failurehandler <validation-failurehandler>    Fully
                                                              qualified
                                                              class name
                                                              for
                                                              ValidationFa
                                                              ilureHandler
   --validation-threshold <validation-threshold>              Fully
                                                              qualified
                                                              class name
                                                              for
                                                              ValidationTh
                                                              reshold
   --validator <validator>                                    Fully
                                                              qualified
                                                              class name
                                                              for the
                                                              Validator
   --warehouse-dir <dir>                                      HDFS parent
                                                              for table
                                                              destination
   --where <where clause>                                     WHERE clause
                                                              to use
                                                              during
                                                              import
-z,--compress                                                 Enable
                                                              compression

Incremental import arguments:
   --check-column <column>        Source column to check for incremental
                                  change
   --incremental <import-type>    Define an incremental import of type
                                  'append' or 'lastmodified'
   --last-value <value>           Last imported value in the incremental
                                  check column

Output line formatting arguments:
   --enclosed-by <char>               Sets a required field enclosing
                                      character
   --escaped-by <char>                Sets the escape character
   --fields-terminated-by <char>      Sets the field separator character
   --lines-terminated-by <char>       Sets the end-of-line character
   --mysql-delimiters                 Uses MySQL's default delimiter set:
                                      fields: ,  lines: \n  escaped-by: \
                                      optionally-enclosed-by: '
   --optionally-enclosed-by <char>    Sets a field enclosing character

Input parsing arguments:
   --input-enclosed-by <char>               Sets a required field encloser
   --input-escaped-by <char>                Sets the input escape
                                            character
   --input-fields-terminated-by <char>      Sets the input field separator
   --input-lines-terminated-by <char>       Sets the input end-of-line
                                            char
   --input-optionally-enclosed-by <char>    Sets a field enclosing
                                            character

Hive arguments:
   --create-hive-table                         Fail if the target hive
                                               table exists
   --external-table-dir <hdfs path>            Sets where the external
                                               table is in HDFS
   --hive-database <database-name>             Sets the database name to
                                               use when importing to hive
   --hive-delims-replacement <arg>             Replace Hive record \0x01
                                               and row delimiters (\n\r)
                                               from imported string fields
                                               with user-defined string
   --hive-drop-import-delims                   Drop Hive record \0x01 and
                                               row delimiters (\n\r) from
                                               imported string fields
   --hive-home <dir>                           Override $HIVE_HOME
   --hive-import                               Import tables into Hive
                                               (Uses Hive's default
                                               delimiters if none are
                                               set.)
   --hive-overwrite                            Overwrite existing data in
                                               the Hive table
   --hive-partition-key <partition-key>        Sets the partition key to
                                               use when importing to hive
   --hive-partition-value <partition-value>    Sets the partition value to
                                               use when importing to hive
   --hive-table <table-name>                   Sets the table name to use
                                               when importing to hive
   --map-column-hive <arg>                     Override mapping for
                                               specific column to hive
                                               types.

HBase arguments:
   --column-family <family>    Sets the target column family for the
                               import
   --hbase-bulkload            Enables HBase bulk loading
   --hbase-create-table        If specified, create missing HBase tables
   --hbase-row-key <col>       Specifies which input column to use as the
                               row key
   --hbase-table <table>       Import to <table> in HBase

HCatalog arguments:
   --hcatalog-database <arg>                        HCatalog database name
   --hcatalog-home <hdir>                           Override $HCAT_HOME
   --hcatalog-partition-keys <partition-key>        Sets the partition
                                                    keys to use when
                                                    importing to hive
   --hcatalog-partition-values <partition-value>    Sets the partition
                                                    values to use when
                                                    importing to hive
   --hcatalog-table <arg>                           HCatalog table name
   --hive-home <dir>                                Override $HIVE_HOME
   --hive-partition-key <partition-key>             Sets the partition key
                                                    to use when importing
                                                    to hive
   --hive-partition-value <partition-value>         Sets the partition
                                                    value to use when
                                                    importing to hive
   --map-column-hive <arg>                          Override mapping for
                                                    specific column to
                                                    hive types.

HCatalog import specific options:
   --create-hcatalog-table             Create HCatalog before import
   --drop-and-create-hcatalog-table    Drop and Create HCatalog before
                                       import
   --hcatalog-storage-stanza <arg>     HCatalog storage stanza for table
                                       creation

Accumulo arguments:
   --accumulo-batch-size <size>          Batch size in bytes
   --accumulo-column-family <family>     Sets the target column family for
                                         the import
   --accumulo-create-table               If specified, create missing
                                         Accumulo tables
   --accumulo-instance <instance>        Accumulo instance name.
   --accumulo-max-latency <latency>      Max write latency in milliseconds
   --accumulo-password <password>        Accumulo password.
   --accumulo-row-key <col>              Specifies which input column to
                                         use as the row key
   --accumulo-table <table>              Import to <table> in Accumulo
   --accumulo-user <user>                Accumulo user name.
   --accumulo-visibility <vis>           Visibility token to be applied to
                                         all rows imported
   --accumulo-zookeepers <zookeepers>    Comma-separated list of
                                         zookeepers (host:port)

Code generation arguments:
   --bindir <dir>                             Output directory for
                                              compiled objects
   --class-name <name>                        Sets the generated class
                                              name. This overrides
                                              --package-name. When
                                              combined with --jar-file,
                                              sets the input class.
   --escape-mapping-column-names <boolean>    Disable special characters
                                              escaping in column names
   --input-null-non-string <null-str>         Input null non-string
                                              representation
   --input-null-string <null-str>             Input null string
                                              representation
   --jar-file <file>                          Disable code generation; use
                                              specified jar
   --map-column-java <arg>                    Override mapping for
                                              specific columns to java
                                              types
   --null-non-string <null-str>               Null non-string
                                              representation
   --null-string <null-str>                   Null string representation
   --outdir <dir>                             Output directory for
                                              generated code
   --package-name <name>                      Put auto-generated classes
                                              in this package

Generic Hadoop command-line arguments:
(must preceed any tool-specific arguments)
Generic options supported are:
-conf <configuration file>        specify an application configuration file
-D <property=value>               define a value for a given property
-fs <file:///|hdfs://namenode:port> specify default filesystem URL to use, overrides 'fs.defaultFS' property from configurations.
-jt <local|resourcemanager:port>  specify a ResourceManager
-files <file1,...>                specify a comma-separated list of files to be copied to the map reduce cluster
-libjars <jar1,...>               specify a comma-separated list of jar files to be included in the classpath
-archives <archive1,...>          specify a comma-separated list of archives to be unarchived on the compute machines

The general command line syntax is:
command [genericOptions] [commandOptions]


At minimum, you must specify --connect and --table
Arguments to mysqldump and other subprograms may be supplied
after a '--' on the command line.
  2. export
usage: sqoop export [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:
   --connect <jdbc-uri>                                       Specify JDBC
                                                              connect
                                                              string
   --connection-manager <class-name>                          Specify
                                                              connection
                                                              manager
                                                              class name
   --connection-param-file <properties-file>                  Specify
                                                              connection
                                                              parameters
                                                              file
   --driver <class-name>                                      Manually
                                                              specify JDBC
                                                              driver class
                                                              to use
   --hadoop-home <hdir>                                       Override
                                                              $HADOOP_MAPR
                                                              ED_HOME_ARG
   --hadoop-mapred-home <dir>                                 Override
                                                              $HADOOP_MAPR
                                                              ED_HOME_ARG
   --help                                                     Print usage
                                                              instructions
   --metadata-transaction-isolation-level <isolationlevel>    Defines the
                                                              transaction
                                                              isolation
                                                              level for
                                                              metadata
                                                              queries. For
                                                              more details
                                                              check
                                                              java.sql.Con
                                                              nection
                                                              javadoc or
                                                              the JDBC
                                                              specificaito
                                                              n
   --oracle-escaping-disabled <boolean>                       Disable the
                                                              escaping
                                                              mechanism of
                                                              the
                                                              Oracle/OraOo
                                                              p connection
                                                              managers
-P                                                            Read
                                                              password
                                                              from console
   --password <password>                                      Set
                                                              authenticati
                                                              on password
   --password-alias <password-alias>                          Credential
                                                              provider
                                                              password
                                                              alias
   --password-file <password-file>                            Set
                                                              authenticati
                                                              on password
                                                              file path
   --relaxed-isolation                                        Use
                                                              read-uncommi
                                                              tted
                                                              isolation
                                                              for imports
   --skip-dist-cache                                          Skip copying
                                                              jars to
                                                              distributed
                                                              cache
   --temporary-rootdir <rootdir>                              Defines the
                                                              temporary
                                                              root
                                                              directory
                                                              for the
                                                              import
   --throw-on-error                                           Rethrow a
                                                              RuntimeExcep
                                                              tion on
                                                              error
                                                              occurred
                                                              during the
                                                              job
   --username <username>                                      Set
                                                              authenticati
                                                              on username
   --verbose                                                  Print more
                                                              information
                                                              while
                                                              working

Export control arguments:
   --batch                                                    Indicates
                                                              underlying
                                                              statements
                                                              to be
                                                              executed in
                                                              batch mode
   --call <arg>                                               Populate the
                                                              table using
                                                              this stored
                                                              procedure
                                                              (one call
                                                              per row)
   --clear-staging-table                                      Indicates
                                                              that any
                                                              data in
                                                              staging
                                                              table can be
                                                              deleted
   --columns <col,col,col...>                                 Columns to
                                                              export to
                                                              table
   --direct                                                   Use direct
                                                              export fast
                                                              path
   --export-dir <dir>                                         HDFS source
                                                              path for the
                                                              export
-m,--num-mappers <n>                                          Use 'n' map
                                                              tasks to
                                                              export in
                                                              parallel
   --mapreduce-job-name <name>                                Set name for
                                                              generated
                                                              mapreduce
                                                              job
   --staging-table <table-name>                               Intermediate
                                                              staging
                                                              table
   --table <table-name>                                       Table to
                                                              populate
   --update-key <key>                                         Update
                                                              records by
                                                              specified
                                                              key column
   --update-mode <mode>                                       Specifies
                                                              how updates
                                                              are
                                                              performed
                                                              when new
                                                              rows are
                                                              found with
                                                              non-matching
                                                              keys in
                                                              database
   --validate                                                 Validate the
                                                              copy using
                                                              the
                                                              configured
                                                              validator
   --validation-failurehandler <validation-failurehandler>    Fully
                                                              qualified
                                                              class name
                                                              for
                                                              ValidationFa
                                                              ilureHandler
   --validation-threshold <validation-threshold>              Fully
                                                              qualified
                                                              class name
                                                              for
                                                              ValidationTh
                                                              reshold
   --validator <validator>                                    Fully
                                                              qualified
                                                              class name
                                                              for the
                                                              Validator

Input parsing arguments:
   --input-enclosed-by <char>               Sets a required field encloser
   --input-escaped-by <char>                Sets the input escape
                                            character
   --input-fields-terminated-by <char>      Sets the input field separator
   --input-lines-terminated-by <char>       Sets the input end-of-line
                                            char
   --input-optionally-enclosed-by <char>    Sets a field enclosing
                                            character

Output line formatting arguments:
   --enclosed-by <char>               Sets a required field enclosing
                                      character
   --escaped-by <char>                Sets the escape character
   --fields-terminated-by <char>      Sets the field separator character
   --lines-terminated-by <char>       Sets the end-of-line character
   --mysql-delimiters                 Uses MySQL's default delimiter set:
                                      fields: ,  lines: \n  escaped-by: \
                                      optionally-enclosed-by: '
   --optionally-enclosed-by <char>    Sets a field enclosing character

Code generation arguments:
   --bindir <dir>                             Output directory for
                                              compiled objects
   --class-name <name>                        Sets the generated class
                                              name. This overrides
                                              --package-name. When
                                              combined with --jar-file,
                                              sets the input class.
   --escape-mapping-column-names <boolean>    Disable special characters
                                              escaping in column names
   --input-null-non-string <null-str>         Input null non-string
                                              representation
   --input-null-string <null-str>             Input null string
                                              representation
   --jar-file <file>                          Disable code generation; use
                                              specified jar
   --map-column-java <arg>                    Override mapping for
                                              specific columns to java
                                              types
   --null-non-string <null-str>               Null non-string
                                              representation
   --null-string <null-str>                   Null string representation
   --outdir <dir>                             Output directory for
                                              generated code
   --package-name <name>                      Put auto-generated classes
                                              in this package

HCatalog arguments:
   --hcatalog-database <arg>                        HCatalog database name
   --hcatalog-home <hdir>                           Override $HCAT_HOME
   --hcatalog-partition-keys <partition-key>        Sets the partition
                                                    keys to use when
                                                    importing to hive
   --hcatalog-partition-values <partition-value>    Sets the partition
                                                    values to use when
                                                    importing to hive
   --hcatalog-table <arg>                           HCatalog table name
   --hive-home <dir>                                Override $HIVE_HOME
   --hive-partition-key <partition-key>             Sets the partition key
                                                    to use when importing
                                                    to hive
   --hive-partition-value <partition-value>         Sets the partition
                                                    value to use when
                                                    importing to hive
   --map-column-hive <arg>                          Override mapping for
                                                    specific column to
                                                    hive types.

Generic Hadoop command-line arguments:
(must preceed any tool-specific arguments)
Generic options supported are:
-conf <configuration file>        specify an application configuration file
-D <property=value>               define a value for a given property
-fs <file:///|hdfs://namenode:port> specify default filesystem URL to use, overrides 'fs.defaultFS' property from configurations.
-jt <local|resourcemanager:port>  specify a ResourceManager
-files <file1,...>                specify a comma-separated list of files to be copied to the map reduce cluster
-libjars <jar1,...>               specify a comma-separated list of jar files to be included in the classpath
-archives <archive1,...>          specify a comma-separated list of archives to be unarchived on the compute machines

The general command line syntax is:
command [genericOptions] [commandOptions]


At minimum, you must specify --connect, --export-dir, and --table

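Putting the export control arguments above together, here is a minimal sketch of a full export job, run from the sqoop install directory. The table name `dim_area` and the HDFS export directory are placeholders for illustration; adjust them, the connection URL, and the credentials to your own environment.

```
# Plain-insert export: push an HDFS directory into a MySQL table.
# dim_area and /user/hive/warehouse/dim_area are placeholder names.
bin/sqoop export \
--connect jdbc:mysql://doit01:3306/realtimedw \
--username root \
--password ABC123abc.123 \
--table dim_area \
--export-dir /user/hive/warehouse/dim_area \
--input-fields-terminated-by '\t' \
--num-mappers 2

# Upsert variant: rows whose id already exists are updated, the rest are inserted.
bin/sqoop export \
--connect jdbc:mysql://doit01:3306/realtimedw \
--username root \
--password ABC123abc.123 \
--table dim_area \
--export-dir /user/hive/warehouse/dim_area \
--input-fields-terminated-by '\t' \
--update-key id \
--update-mode allowinsert
```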
PS:
To fix garbled Chinese text, change the MySQL character set (for details, see my separate post on changing the MySQL character set):

alter database db_name character set utf8;

ALTER TABLE table_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
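Besides converting the database and table character set, a further sketch is to force UTF-8 on the JDBC connection itself by appending standard MySQL Connector/J parameters to the --connect URL; the database, table, and credentials below are placeholders.

```
# Force the JDBC connection to UTF-8 so Chinese text is not mangled in transit.
# useUnicode / characterEncoding are standard Connector/J URL parameters;
# realtimedw, dim_area and the credentials are placeholders.
bin/sqoop export \
--connect 'jdbc:mysql://doit01:3306/realtimedw?useUnicode=true&characterEncoding=utf-8' \
--username root \
--password ABC123abc.123 \
--table dim_area \
--export-dir /user/hive/warehouse/dim_area \
--input-fields-terminated-by '\t'
```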
