[Big Data Development] Sqoop: study notes and the pitfalls I ran into


By 内向世界 (blogger on 靠谱客). These are my notes from learning Sqoop and the pitfalls I ran into along the way, shared here as a reference.

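Conclusions (wherever an option shows up with a single -, it is really two: --. That is how CSDN renders it and there is nothing I can do about it.)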

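1. Put the --where clause in single quotes and the --query statement in single or double quotes; **the other arguments do not need single or double quotes**. The details:
--1. --query and --table cannot be used together.
--2. With --query, the select statement must have a where clause, and that clause must contain $CONDITIONS.
--3. The select statement can be wrapped in single or double quotes. With double quotes the $ must be escaped (\$CONDITIONS), otherwise the shell expands it before Sqoop sees it.
--4. With --query you must name a split column with --split-by (unless the job runs with a single mapper, -m 1).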
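2. --where is meant to be used with --table. It cannot be combined with --query: Sqoop reports no error, but the --where condition has no effect.
3. When using --table with --columns, the --columns list does not have to include the primary-key column.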
4. --null-string 'value': on import, writes value in place of null for string-type columns.
--null-non-string 'value': on import, writes value in place of null for non-string columns.
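5. On export, if the MySQL table has a primary key, the same data cannot be exported twice; the second run fails on the duplicate keys.
6. Conclusion 1: when the MySQL table has fewer columns than the file on HDFS, the export works.
Conclusion 2: when the MySQL table has more columns than the file on HDFS, the export fails, but it works if you pass --columns.
7. Conclusion 1: the field types on both sides should match.
Conclusion 2: values are assigned to the target fields following the columns of the HDFS file from left to right.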
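8. Conclusion 1: in my tests, a null in a non-string column on HDFS could not be exported to MySQL by default.
Conclusion 2: --input-null-non-string 'null' lets a non-string null on HDFS arrive in MySQL as NULL.
Conclusion 3: if the null on HDFS maps to a string-type column in MySQL, the --input-null-string option can be left out.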
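
The quoting rules in item 1 and the null options in item 4 can be sketched as follows. This is only a sketch under assumptions: the host, database and credentials are the ones used elsewhere in this post, the target directories and the '\\N' null marker are illustrative choices, and the sqoop stub on the first lines is my own addition so that the script prints the commands instead of failing on a machine without Sqoop.

```shell
# Dry-run fallback (added for illustration): when sqoop is not installed,
# define a stub that prints its arguments instead of running a job.
command -v sqoop >/dev/null 2>&1 || sqoop() { printf 'sqoop'; printf ' %s' "$@"; printf '\n'; }

# Single quotes: the shell passes $CONDITIONS through untouched.
import_single_quoted() {
  sqoop import \
    --connect jdbc:mysql://qianfeng03:3306/sz2002 \
    --username root \
    --password 123456 \
    --query 'select empno,ename,job from emp where deptno<20 and $CONDITIONS' \
    --target-dir /sz2002/sqoop/emp_query \
    --delete-target-dir \
    --split-by empno
}

# Double quotes: the dollar sign is escaped so the shell does not expand it.
import_double_quoted() {
  sqoop import \
    --connect jdbc:mysql://qianfeng03:3306/sz2002 \
    --username root \
    --password 123456 \
    --query "select empno,ename,job from emp where deptno<20 and \$CONDITIONS" \
    --target-dir /sz2002/sqoop/emp_query \
    --delete-target-dir \
    --split-by empno
}

# Table import that writes \N for nulls in both string and non-string columns.
import_with_null_markers() {
  sqoop import \
    --connect jdbc:mysql://qianfeng03:3306/sz2002 \
    --username root \
    --password 123456 \
    --table emp \
    --target-dir /sz2002/sqoop/emp \
    --delete-target-dir \
    --null-string '\\N' \
    --null-non-string '\\N'
}

import_single_quoted
import_double_quoted
import_with_null_markers
```

Both quoting styles hand Sqoop exactly the same query string, which is why the first two functions are interchangeable.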

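9. --connect jdbc:mysql://qianfeng03:3306/sz2002 must not be written with a trailing slash, as in --connect jdbc:mysql://qianfeng03:3306/sz2002/, otherwise you get
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown error 1049
(MySQL error 1049 means "unknown database".)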
10. The column passed to --split-by must be one of the columns returned by the query. For example,
sqoop import \
--connect 'jdbc:mysql://qianfeng03:3306/sz2002' \
--username root \
--password 123456 \
--query 'select ename,job,deptno from emp where deptno<20 and $CONDITIONS' \
--target-dir /sz2002/sqoop/ \
--delete-target-dir \
--split-by "ename"
runs fine, but changing it to --split-by "empno" fails with java.io.IOException: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown error 1054 (MySQL error 1054 means "unknown column"), because empno is not in the select list.
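
If you do want to split on empno, one way out is to add it to the select list. The sketch below assumes the same emp table and connection settings as above; the function name and the sqoop stub are mine, added so the script prints the command when Sqoop is not installed. As far as I understand, Sqoop computes the split boundaries by taking the min and max of the split column over the query result, which is why the column has to be selected.

```shell
# Dry-run fallback (added for illustration): print the command when sqoop is absent.
command -v sqoop >/dev/null 2>&1 || sqoop() { printf 'sqoop'; printf ' %s' "$@"; printf '\n'; }

# empno is now part of the select list, so --split-by empno can be resolved.
import_split_by_empno() {
  sqoop import \
    --connect jdbc:mysql://qianfeng03:3306/sz2002 \
    --username root \
    --password 123456 \
    --query 'select empno,ename,job,deptno from emp where deptno<20 and $CONDITIONS' \
    --target-dir /sz2002/sqoop/ \
    --delete-target-dir \
    --split-by empno
}

import_split_by_empno
```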

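11. If the column given to --split-by contains null values, those values are ignored when the splits are computed, and the rows that hold them are not written to HDFS.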
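12. For now, data can only be imported from MySQL into HBase; data in HBase cannot be exported back to MySQL.
Notes on importing into HBase:
--1. Case of the row key column name
    With --table [--where], the column name given to --hbase-row-key must be uppercase.
    With --query, if the select list names the key column, the name given to --hbase-row-key must match the case used in the query; otherwise it must also be uppercase.
--2. Specifying the row key
    -- --hbase-row-key can be omitted, in which case the MySQL primary key is used as the row key; it can also be set explicitly.
    -- If the MySQL primary key is a composite key, --hbase-row-key is required and the columns are separated by commas.
    -- If the MySQL table has no primary key, --hbase-row-key should be specified.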
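13. When importing from MySQL into Hive, a plain Sqoop import statement creates the Hive table automatically, and that table is a managed (internal) table. To load an external table instead, create the Hive external table first and then import into it. The import no longer needs --target-dir, because the statement that created the external table already set its location.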
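
A minimal HBase import matching the row key notes in item 12 might look like this. The HBase table name emp and the column family info are assumptions for illustration, as is the sqoop stub that makes the script print the command when Sqoop is not installed. The row key column is written in uppercase because this is a --table import.

```shell
# Dry-run fallback (added for illustration): print the command when sqoop is absent.
command -v sqoop >/dev/null 2>&1 || sqoop() { printf 'sqoop'; printf ' %s' "$@"; printf '\n'; }

# Import the MySQL table emp into an HBase table, using EMPNO as the row key.
# --hbase-create-table asks Sqoop to create the HBase table if it is missing.
import_emp_into_hbase() {
  sqoop import \
    --connect jdbc:mysql://qianfeng03:3306/sz2002 \
    --username root \
    --password 123456 \
    --table emp \
    --hbase-table emp \
    --column-family info \
    --hbase-row-key EMPNO \
    --hbase-create-table
}

import_emp_into_hbase
```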

Error 1:
ERROR manager.SqlManager: Error executing statement: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown error 1146
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Unknown error 1146

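Fix:
MySQL error 1146 means the table does not exist. In my case the database name was written wrong. Any insert, delete, select or update against a table that is not in the database you connected to raises this error, so check first whether the command points at the right database and table.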
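
A quick way to check the names before rerunning the job is to let Sqoop list what the server actually has. The sketch below reuses the connection settings from this post; the function names and the sqoop stub (which prints the command when Sqoop is not installed) are my own additions.

```shell
# Dry-run fallback (added for illustration): print the command when sqoop is absent.
command -v sqoop >/dev/null 2>&1 || sqoop() { printf 'sqoop'; printf ' %s' "$@"; printf '\n'; }

# List the databases on the server: is sz2002 really there?
check_databases() {
  sqoop list-databases \
    --connect jdbc:mysql://qianfeng03:3306/ \
    --username root \
    --password 123456
}

# List the tables in sz2002: is the table named in the job really there?
check_tables() {
  sqoop list-tables \
    --connect jdbc:mysql://qianfeng03:3306/sz2002 \
    --username root \
    --password 123456
}

check_databases
check_tables
```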

Error 2:
20/09/12 23:06:23 ERROR mapreduce.TextExportMapper: Dumping data is not allowed by default, please run the job with -Dorg.apache.sqoop.export.text.dump_data_on_error=true to get corrupted line.
20/09/12 23:06:23 ERROR mapreduce.TextExportMapper: On input file: hdfs://qianfeng01:8020/sz2002/sqoop/emp/part-m-00002
20/09/12 23:06:23 ERROR mapreduce.TextExportMapper: At position 0
20/09/12 23:06:23 ERROR mapreduce.TextExportMapper:
20/09/12 23:06:23 ERROR mapreduce.TextExportMapper: Currently processing split:
20/09/12 23:06:23 ERROR mapreduce.TextExportMapper: Paths:/sz2002/sqoop/emp/part-m-00002:0+94,/sz2002/sqoop/emp/part-m-00002:94+94
20/09/12 23:06:23 ERROR mapreduce.TextExportMapper:
20/09/12 23:06:23 ERROR mapreduce.TextExportMapper: This issue might not necessarily be caused by current input
20/09/12 23:06:23 ERROR mapreduce.TextExportMapper: due to the batching nature of export.
20/09/12 23:06:23 ERROR mapreduce.TextExportMapper:
20/09/12 23:06:23 INFO mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false
20/09/12 23:06:23 INFO mapred.LocalJobRunner: Starting task: attempt_local51738395_0001_m_000002_0
20/09/12 23:06:23 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
20/09/12 23:06:23 INFO mapred.MapTask: Processing split: Paths:/sz2002/sqoop/emp/part-m-00000:0+91,/sz2002/sqoop/emp/part-m-00001:0+93
20/09/12 23:06:23 ERROR mapreduce.TextExportMapper:
20/09/12 23:06:23 ERROR mapreduce.TextExportMapper: Exception raised during data export
20/09/12 23:06:23 ERROR mapreduce.TextExportMapper:
20/09/12 23:06:23 ERROR mapreduce.TextExportMapper: Exception:
java.lang.RuntimeException: Can't parse input data: '20'

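Fix:
Focus on the exception that is thrown: find out which field the value '20' belongs to and work from there.
Conclusion 1: when the MySQL table has fewer columns than the file on HDFS, the export works.
Conclusion 2: when the MySQL table has more columns than the file on HDFS, the export fails, but it works if you pass --columns.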
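
For the case in conclusion 2, an export with --columns could be sketched like this. The target table emp_export and the column list are hypothetical (I am assuming the HDFS file holds ename, job and deptno in that order), and the sqoop stub is only there to print the command when Sqoop is not installed. The MySQL columns left out of --columns need a default value or must allow NULL.

```shell
# Dry-run fallback (added for illustration): print the command when sqoop is absent.
command -v sqoop >/dev/null 2>&1 || sqoop() { printf 'sqoop'; printf ' %s' "$@"; printf '\n'; }

# emp_export is a hypothetical MySQL table with more columns than the HDFS file.
# --columns names the table columns to fill, in the file's left-to-right order.
export_selected_columns() {
  sqoop export \
    --connect jdbc:mysql://qianfeng03:3306/sz2002 \
    --username root \
    --password 123456 \
    --table emp_export \
    --export-dir /sz2002/sqoop/emp \
    --columns "ename,job,deptno" \
    --input-null-string 'null' \
    --input-null-non-string 'null'
}

export_selected_columns
```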


Error 3:
20/09/12 23:12:46 ERROR mapreduce.TextExportMapper: Dumping data is not allowed by default, please run the job with -Dorg.apache.sqoop.export.text.dump_data_on_error=true to get corrupted line.
20/09/12 23:12:46 ERROR mapreduce.TextExportMapper: On input file: hdfs://qianfeng01:8020/sz2002/sqoop/emp/part-m-00003
20/09/12 23:12:46 ERROR mapreduce.TextExportMapper: At position 0
20/09/12 23:12:46 ERROR mapreduce.TextExportMapper:
20/09/12 23:12:46 ERROR mapreduce.TextExportMapper: Currently processing split:
20/09/12 23:12:46 ERROR mapreduce.TextExportMapper: Paths:/sz2002/sqoop/emp/part-m-00003:0+134,/sz2002/sqoop/emp/part-m-00003:134+135
20/09/12 23:12:46 ERROR mapreduce.TextExportMapper:
20/09/12 23:12:46 ERROR mapreduce.TextExportMapper: This issue might not necessarily be caused by current input
20/09/12 23:12:46 ERROR mapreduce.TextExportMapper: due to the batching nature of export.
20/09/12 23:12:46 ERROR mapreduce.TextExportMapper:
20/09/12 23:12:46 INFO mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false
20/09/12 23:12:46 INFO mapred.LocalJobRunner: Starting task: attempt_local2102975756_0001_m_000001_0
20/09/12 23:12:46 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
20/09/12 23:12:46 INFO mapred.MapTask: Processing split: Paths:/sz2002/sqoop/emp/part-m-00002:0+94,/sz2002/sqoop/emp/part-m-00002:94+94
20/09/12 23:12:46 ERROR mapreduce.TextExportMapper:
20/09/12 23:12:46 ERROR mapreduce.TextExportMapper: Exception raised during data export
20/09/12 23:12:46 ERROR mapreduce.TextExportMapper:
20/09/12 23:12:46 ERROR mapreduce.TextExportMapper: Exception:
java.lang.RuntimeException: Can't parse input data: 'SALESMAN'

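Fix:
Focus on the last line, the thrown exception. It is caused by a mismatch between the data and the type of the target field ("parse" here means converting the text into the field's type), so when exporting into MySQL make sure the order and the types of the fields line up.
Conclusion 1: the field types on both sides should match.
Conclusion 2: values are assigned to the target fields following the columns of the HDFS file from left to right.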
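
The log itself suggests rerunning the job with -Dorg.apache.sqoop.export.text.dump_data_on_error=true to see the offending line. A sketch of that rerun is below; the -D option goes right after the tool name, ahead of the other arguments. The table emp_export and the column list are hypothetical, and the sqoop stub only prints the command when Sqoop is not installed.

```shell
# Dry-run fallback (added for illustration): print the command when sqoop is absent.
command -v sqoop >/dev/null 2>&1 || sqoop() { printf 'sqoop'; printf ' %s' "$@"; printf '\n'; }

# Rerun the export so the mapper logs the line it could not parse.
# The column list follows the HDFS file's left-to-right order (assumed here).
export_with_dump_on_error() {
  sqoop export \
    -Dorg.apache.sqoop.export.text.dump_data_on_error=true \
    --connect jdbc:mysql://qianfeng03:3306/sz2002 \
    --username root \
    --password 123456 \
    --table emp_export \
    --export-dir /sz2002/sqoop/emp \
    --columns "ename,job,deptno"
}

export_with_dump_on_error
```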