This post, shared by 怡然橘子 on 靠谱客, covers adding Sqoop to a Linux environment: installing and configuring Sqoop.

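Overview

Sqoop is an open-source tool used mainly to transfer data between Hadoop and traditional relational databases. Here is a passage quoted from the Sqoop User Guide: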

Sqoop is a tool designed to transfer data between Hadoop and relational databases. You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS.

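Here I mainly describe the installation process.

1. Download the required software

The Hadoop version I use is the official Apache release 0.20.2, but I ran into errors when using it with Sqoop. After reading a few articles I found that Sqoop does not support this version, and the usual recommendation is to use CDH3 instead. That said, after copying the relevant jars into sqoop-1.2.0-CDH3B4/lib, it still worked. Of course, you can also simply use CDH3 directly.

Download links for CDH3 and Sqoop 1.2.0: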

http://archive.cloudera.com/cdh/3/hadoop-0.20.2-CDH3B4.tar.gz

http://archive.cloudera.com/cdh/3/sqoop-1.2.0-CDH3B4.tar.gz

sqoop-1.2.0-CDH3B4 depends on hadoop-core-0.20.2-CDH3B4.jar, so you need to download hadoop-0.20.2-CDH3B4.tar.gz, unpack it, and copy hadoop-0.20.2-CDH3B4/hadoop-core-0.20.2-CDH3B4.jar into sqoop-1.2.0-CDH3B4/lib.

In addition, importing MySQL data with Sqoop depends on mysql-connector-java-*.jar at run time, so you need to download mysql-connector-java-*.jar and copy it into sqoop-1.2.0-CDH3B4/lib as well.
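The two copy steps above could be scripted roughly as follows. This is only a sketch: the function name copy_sqoop_deps and its arguments are hypothetical, and it assumes both tarballs are already unpacked and a MySQL connector jar has already been downloaded.

```shell
# Hypothetical helper: copy the jars Sqoop needs into its lib directory.
#   $1 = unpacked hadoop-0.20.2-CDH3B4 directory
#   $2 = unpacked sqoop-1.2.0-CDH3B4 directory
#   $3 = path to a downloaded mysql-connector-java-*.jar
copy_sqoop_deps() {
  hadoop_dir="$1"
  sqoop_dir="$2"
  connector_jar="$3"

  # Fail early if an expected file is missing.
  [ -f "$hadoop_dir/hadoop-core-0.20.2-CDH3B4.jar" ] || {
    echo "hadoop-core jar not found in $hadoop_dir" >&2
    return 1
  }
  [ -f "$connector_jar" ] || {
    echo "MySQL connector jar not found: $connector_jar" >&2
    return 1
  }

  # Copy both jars into Sqoop's lib directory.
  mkdir -p "$sqoop_dir/lib"
  cp "$hadoop_dir/hadoop-core-0.20.2-CDH3B4.jar" "$sqoop_dir/lib/" &&
    cp "$connector_jar" "$sqoop_dir/lib/"
}
```

It would be called with the two unpacked directories and the path of whichever connector jar you downloaded; the function returns non-zero without copying anything if either jar is missing.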

2. Edit Sqoop's configure-sqoop file and comment out the HBase and ZooKeeper checks (unless you plan to use HBase or other components that run on top of Hadoop)

#if [ ! -d "${HBASE_HOME}" ]; then
#  echo "Error: $HBASE_HOME does not exist!"
#  echo 'Please set $HBASE_HOME to the root of your HBase installation.'
#  exit 1
#fi

#if [ ! -d "${ZOOKEEPER_HOME}" ]; then
#  echo "Error: $ZOOKEEPER_HOME does not exist!"
#  echo 'Please set $ZOOKEEPER_HOME to the root of your ZooKeeper installation.'
#  exit 1
#fi
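If you would rather not edit the file by hand, the same change can be sketched with sed. The function name disable_optional_checks is hypothetical, and the patterns assume the two if blocks start at the beginning of a line and are spelled exactly as shown above; check your copy of configure-sqoop before relying on it. A backup is written next to the file with a .bak suffix.

```shell
# Hypothetical helper: comment out the HBase and ZooKeeper checks.
#   $1 = path to the configure-sqoop script
disable_optional_checks() {
  file="$1"
  # Each expression selects the lines from the matching "if" through the
  # next line starting with "fi" and prefixes every one of them with '#'.
  sed -i.bak \
    -e '/^if \[ ! -d "\${HBASE_HOME}" \]; then/,/^fi/ s/^/#/' \
    -e '/^if \[ ! -d "\${ZOOKEEPER_HOME}" \]; then/,/^fi/ s/^/#/' \
    "$file"
}
```

Lines outside the two blocks are left untouched, and running it a second time changes nothing because the commented lines no longer match the start patterns.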

3. Start Hadoop and set the relevant environment variables (for example $HADOOP_HOME); Sqoop is then ready to use.
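A minimal sketch of the environment setup, for example in ~/.bashrc. The directory layout is an assumption based on the paths that appear in the log output in this post, so adjust it to your own install; SQOOP_HOME is a variable name I am assuming here for convenience, while $HADOOP_HOME is the one the step above calls for.

```shell
# Assumed install locations; change these to match your machine.
export HADOOP_HOME="$HOME/hadoop/hadoop-0.20.2"
export SQOOP_HOME="$HOME/cloudera/sqoop-1.2.0-CDH3B4"

# Put the sqoop and hadoop launchers on the PATH so they can be run
# from any directory.
export PATH="$SQOOP_HOME/bin:$HADOOP_HOME/bin:$PATH"
```

With this in place the commands can be invoked as sqoop and hadoop instead of by relative path.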

Below is an example of importing a table's data from the database into a file on HDFS:

[wanghai01@tc-crm-rd01.tc sqoop-1.2.0-CDH3B4]$ bin/sqoop import --connect jdbc:mysql://XXXX:XX/crm --username crm --password 123456 --table company -m 1

11/09/21 15:45:25 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

11/09/21 15:45:26 INFO tool.CodeGenTool: Beginning code generation

11/09/21 15:45:26 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `company` AS t LIMIT 1

11/09/21 15:45:26 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `company` AS t LIMIT 1

11/09/21 15:45:26 INFO orm.CompilationManager: HADOOP_HOME is /home/wanghai01/hadoop/hadoop-0.20.2/bin/..

11/09/21 15:45:26 INFO orm.CompilationManager: Found hadoop core jar at: /home/wanghai01/hadoop/hadoop-0.20.2/bin/../hadoop-0.20.2-core.jar

11/09/21 15:45:26 ERROR orm.CompilationManager: Could not rename /tmp/sqoop-wanghai01/compile/2bd70cf2b712a9c7cdb0860722ea7c18/company.java to /home/wanghai01/cloudera/sqoop-1.2.0-CDH3B4/./company.java

11/09/21 15:45:26 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-wanghai01/compile/2bd70cf2b712a9c7cdb0860722ea7c18/company.jar

11/09/21 15:45:26 WARN manager.MySQLManager: It looks like you are importing from mysql.

11/09/21 15:45:26 WARN manager.MySQLManager: This transfer can be faster! Use the --direct

11/09/21 15:45:26 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.

11/09/21 15:45:26 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)

11/09/21 15:45:26 INFO mapreduce.ImportJobBase: Beginning import of company

11/09/21 15:45:27 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `company` AS t LIMIT 1

11/09/21 15:45:28 INFO mapred.JobClient: Running job: job_201109211521_0001

11/09/21 15:45:29 INFO mapred.JobClient:  map 0% reduce 0%

11/09/21 15:45:40 INFO mapred.JobClient:  map 100% reduce 0%

11/09/21 15:45:42 INFO mapred.JobClient: Job complete: job_201109211521_0001

11/09/21 15:45:42 INFO mapred.JobClient: Counters: 5

11/09/21 15:45:42 INFO mapred.JobClient:   Job Counters

11/09/21 15:45:42 INFO mapred.JobClient:     Launched map tasks=1

11/09/21 15:45:42 INFO mapred.JobClient:   FileSystemCounters

11/09/21 15:45:42 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=44

11/09/21 15:45:42 INFO mapred.JobClient:   Map-Reduce Framework

11/09/21 15:45:42 INFO mapred.JobClient:     Map input records=8

11/09/21 15:45:42 INFO mapred.JobClient:     Spilled Records=0

11/09/21 15:45:42 INFO mapred.JobClient:     Map output records=8

11/09/21 15:45:42 INFO mapreduce.ImportJobBase: Transferred 44 bytes in 15.0061 seconds (2.9321 bytes/sec)

11/09/21 15:45:42 INFO mapreduce.ImportJobBase: Retrieved 8 records.
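The first WARN line in the log suggests -P in place of --password. A sketch of the same import wrapped in a small function that follows that advice is shown here; the function name import_table and the SQOOP_BIN override are hypothetical, and the remaining options are the ones from the command above.

```shell
# Hypothetical wrapper around the import command shown above.
# SQOOP_BIN is an assumed override for the launcher location; it
# defaults to bin/sqoop relative to the current directory.
#   $1 = JDBC URL, e.g. jdbc:mysql://<host>:<port>/crm
#   $2 = database user
#   $3 = table to import
import_table() {
  jdbc_url="$1"
  db_user="$2"
  table="$3"

  # -P makes Sqoop prompt for the password on the console instead of
  # reading it from the command line.
  "${SQOOP_BIN:-bin/sqoop}" import \
    --connect "$jdbc_url" \
    --username "$db_user" \
    -P \
    --table "$table" \
    -m 1
}
```

Because of -P this is meant for interactive use: Sqoop asks for the password when the function runs.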

Check the data:

[wanghai01@tc-crm-rd01.tc sqoop-1.2.0-CDH3B4]$ hadoop fs -cat /user/wanghai01/company/part-m-00000

1,xx

2,eee

1,xx

2,eee

1,xx

2,eee

1,xx

2,eee

Query the database to verify:

mysql> select * from company;

+------+------+

| id   | name |

+------+------+

|    1 | xx   |

|    2 | eee  |

|    1 | xx   |

|    2 | eee  |

|    1 | xx   |

|    2 | eee  |

|    1 | xx   |

|    2 | eee  |

+------+------+

8 rows in set (0.00 sec)
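For larger tables, comparing row counts is easier than reading the output by eye. A rough sketch for the HDFS side: the function name hdfs_record_count and the HADOOP_BIN override are hypothetical, and it assumes the import wrote plain-text part-m-* files with one record per line, as in the output above.

```shell
# Hypothetical helper: count the records under an HDFS import directory.
# HADOOP_BIN is an assumed override; it defaults to hadoop on the PATH.
#   $1 = HDFS directory written by the import
hdfs_record_count() {
  # The glob is quoted so that hadoop, not the local shell, expands it.
  # tr strips the padding some wc implementations add around the number.
  "${HADOOP_BIN:-hadoop}" fs -cat "$1/part-m-*" | wc -l | tr -d ' '
}
```

For the import above, hdfs_record_count /user/wanghai01/company should agree with the row count MySQL reports for the company table.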

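OK, the data matches. If you look closely at the messages printed when the command ran, you will notice one ERROR: Sqoop could not move the generated company.java into the current directory. This is because I had run the same command earlier and it failed, and the temporary files from that run had not been cleaned up before I ran it again.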
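Assuming the ERROR comes from a company.java left in the directory where the command was run, as the rename target in the log suggests, clearing it before a rerun could look like the sketch below. The function name clean_generated_source is hypothetical.

```shell
# Hypothetical helper: remove a generated <table>.java left behind in the
# working directory by an earlier run.
#   $1 = table name, e.g. company
#   $2 = directory to clean (defaults to the current directory)
clean_generated_source() {
  table="$1"
  dir="${2:-.}"
  # -f keeps this quiet when there is nothing to remove.
  rm -f "$dir/$table.java"
}
```

Running clean_generated_source company in the Sqoop directory before repeating the import removes the stale file. Sqoop's code generation options also include --outdir for choosing where generated sources are written, though I have not verified how it behaves on 1.2.0.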
