Overview
1. Importing data from MySQL into HDFS on a Hadoop cluster:
(1) Preparation: install MySQL and create a database named hadooptest;
then, in hadooptest, create a table named employees with the following contents (the data comes from the book Hadoop Beginner's Guide):
Alice Engineering 50000 2009-03-12
Bob Sales 35000 2011-10-01
Camille Marketing 40000 2003-04-20
David Executive 75000 2001-03-20
Erica Support 34000 2011-07-07
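The exact table definition is not given above; the following is a minimal sketch (not from the original article) of how hadooptest and employees could be created and populated. The column names first_name, dept, salary and start_date match what Sqoop and Hive report later in this article; the column types are assumptions, with start_date assumed to be a DATE, which is consistent with the later Sqoop warning that it has to be cast to a less precise type in Hive.
[hadoop@Master ~]$ mysql -u root -p
mysql> CREATE DATABASE hadooptest;
mysql> USE hadooptest;
mysql> -- assumed column types; adjust as needed
mysql> CREATE TABLE employees (
    ->     first_name VARCHAR(20),
    ->     dept       VARCHAR(20),
    ->     salary     INT,
    ->     start_date DATE
    -> );
mysql> INSERT INTO employees VALUES
    ->     ('Alice',   'Engineering', 50000, '2009-03-12'),
    ->     ('Bob',     'Sales',       35000, '2011-10-01'),
    ->     ('Camille', 'Marketing',   40000, '2003-04-20'),
    ->     ('David',   'Executive',   75000, '2001-03-20'),
    ->     ('Erica',   'Support',     34000, '2011-07-07');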
(2) Download and install sqoop-1.4.5.bin__hadoop-1.0.0.tar.gz; remember to set the corresponding environment variables in /etc/profile.
Then download mysql-connector-java-5.1.34-bin.jar and copy it into Sqoop's lib directory; again set the corresponding environment variables with vim /etc/profile:
#set sqoop path
export SQOOP_HOME=/usr/sqoop
export PATH=${SQOOP_HOME}/bin:${PATH}
export CLASSPATH=${SQOOP_HOME}/lib/mysql-connector-java-5.1.34-bin.jar:${CLASSPATH}
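After editing /etc/profile, a quick sanity check (a sketch, not part of the original steps) is to reload the profile and confirm that Sqoop and the JDBC driver are visible:
[hadoop@Master ~]$ source /etc/profile
[hadoop@Master ~]$ sqoop version        # should report Sqoop 1.4.5
[hadoop@Master ~]$ ls ${SQOOP_HOME}/lib/mysql-connector-java-5.1.34-bin.jar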
(3) Running the command sqoop import --connect jdbc:mysql://localhost/hadooptest --username hadoop --password 20082009 --table employees -m 1 produced the following error:
![](https://file2.kaopuke.com:8081/files_image/2023061017/202306101755406855863.png)
Checking the logs showed that the error occurred because the other slave nodes could not connect to the MySQL database on the Master.
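One way to confirm this diagnosis (a sketch; 172.16.2.17 is the Master's IP taken from the commands used later in this article) is to try connecting to the Master's MySQL from one of the slave nodes:
# run on a slave node; without the proper grants this fails with a
# "Host '...' is not allowed to connect to this MySQL server" error
mysql -h 172.16.2.17 -u hadoop -p hadooptest -e "SELECT COUNT(*) FROM employees;"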
To fix this, log in to MySQL on the Master as root and grant the slave nodes access, with no password required.
After that, run the command again with localhost replaced by the Master's IP address; this time it reported that the host Master.hadoop was not allowed to access the database.
So modify the access privileges once more, granting Master.hadoop access with an empty password:
![](https://file2.kaopuke.com:8081/files_image/2023061017/202306101755424772187.png)
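The exact statements are shown in the screenshot above; a minimal sketch of grants of this kind, run as the MySQL root user on the Master (the host patterns are assumptions, adjust them to your cluster), is:
[hadoop@Master ~]$ mysql -u root -p
mysql> GRANT ALL PRIVILEGES ON hadooptest.* TO 'hadoop'@'Master.hadoop' IDENTIFIED BY '';
mysql> GRANT ALL PRIVILEGES ON hadooptest.* TO 'hadoop'@'%' IDENTIFIED BY '';
mysql> FLUSH PRIVILEGES;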
With the grants in place, execute the command again:
sqoop import --connect jdbc:mysql://172.16.2.17/hadooptest --username hadoop --table employees -m 1
(By default, Sqoop launches 4 mappers to read the data, even though this data set is tiny; -m sets the number of mappers.) The job now runs successfully:
![](https://file2.kaopuke.com:8081/files_image/2023061017/202306101755422685379.png)
Checking the output on HDFS shows that the MySQL data was imported into HDFS successfully:
![](https://file2.kaopuke.com:8081/files_image/2023061017/202306101755422933573.png)
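Besides the screenshot above, the result can also be inspected from the command line (a sketch; with -m 1 the single output file is normally part-m-00000, and Sqoop's default text field delimiter is a comma):
[hadoop@Master ~]$ hadoop fs -ls employees
[hadoop@Master ~]$ hadoop fs -cat employees/part-m-00000
# expect comma-separated rows such as: Alice,Engineering,50000,2009-03-12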
2. Importing MySQL data into Hive via Sqoop on the Hadoop cluster:
(1) First delete the previous output directory on HDFS: hadoop fs -rmr em*
(2) Run the import with Sqoop:
[hadoop@Master ~]$ sqoop import --connect jdbc:mysql://172.16.2.17/hadooptest --username hadoop --table employees --hive-import --hive-table employees
Warning: /usr/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /usr/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
14/12/29 10:29:56 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5
14/12/29 10:29:56 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
14/12/29 10:29:56 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
14/12/29 10:29:56 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
14/12/29 10:29:56 INFO tool.CodeGenTool: Beginning code generation
14/12/29 10:29:56 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employees` AS t LIMIT 1
14/12/29 10:29:56 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employees` AS t LIMIT 1
14/12/29 10:29:56 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hadoop
Note: /tmp/sqoop-hadoop/compile/934eea95d6f0f4b0d86d229dbc48686c/employees.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/12/29 10:29:57 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/934eea95d6f0f4b0d86d229dbc48686c/employees.jar
14/12/29 10:29:57 WARN manager.MySQLManager: It looks like you are importing from mysql.
14/12/29 10:29:57 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
14/12/29 10:29:57 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
14/12/29 10:29:57 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
14/12/29 10:29:57 INFO mapreduce.ImportJobBase: Beginning import of employees
14/12/29 10:29:57 INFO db.DBInputFormat: Using read commited transaction isolation
14/12/29 10:29:57 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`first_name`), MAX(`first_name`) FROM `employees`
14/12/29 10:29:57 WARN db.TextSplitter: Generating splits for a textual index column.
14/12/29 10:29:57 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.
14/12/29 10:29:57 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.
14/12/29 10:29:57 INFO mapred.JobClient: Running job: job_201412282237_0008
14/12/29 10:29:58 INFO mapred.JobClient: map 0% reduce 0%
14/12/29 10:30:12 INFO mapred.JobClient: map 50% reduce 0%
14/12/29 10:30:15 INFO mapred.JobClient: map 100% reduce 0%
14/12/29 10:30:20 INFO mapred.JobClient: Job complete: job_201412282237_0008
14/12/29 10:30:20 INFO mapred.JobClient: Counters: 18
14/12/29 10:30:20 INFO mapred.JobClient: Map-Reduce Framework
14/12/29 10:30:20 INFO mapred.JobClient: Spilled Records=0
14/12/29 10:30:20 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2141048832
14/12/29 10:30:20 INFO mapred.JobClient: Map input records=5
14/12/29 10:30:20 INFO mapred.JobClient: SPLIT_RAW_BYTES=513
14/12/29 10:30:20 INFO mapred.JobClient: Map output records=5
14/12/29 10:30:20 INFO mapred.JobClient: Physical memory (bytes) snapshot=280956928
14/12/29 10:30:20 INFO mapred.JobClient: CPU time spent (ms)=2510
14/12/29 10:30:20 INFO mapred.JobClient: Total committed heap usage (bytes)=216793088
14/12/29 10:30:20 INFO mapred.JobClient: File Input Format Counters
14/12/29 10:30:20 INFO mapred.JobClient: Bytes Read=0
14/12/29 10:30:20 INFO mapred.JobClient: FileSystemCounters
14/12/29 10:30:20 INFO mapred.JobClient: HDFS_BYTES_READ=513
14/12/29 10:30:20 INFO mapred.JobClient: FILE_BYTES_WRITTEN=121353
14/12/29 10:30:20 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=161
14/12/29 10:30:20 INFO mapred.JobClient: Job Counters
14/12/29 10:30:20 INFO mapred.JobClient: Launched map tasks=4
14/12/29 10:30:20 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
14/12/29 10:30:20 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/12/29 10:30:20 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=24231
14/12/29 10:30:20 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/12/29 10:30:20 INFO mapred.JobClient: File Output Format Counters
14/12/29 10:30:20 INFO mapred.JobClient: Bytes Written=161
14/12/29 10:30:20 INFO mapreduce.ImportJobBase: Transferred 161 bytes in 23.6772 seconds (6.7998 bytes/sec)
14/12/29 10:30:20 INFO mapreduce.ImportJobBase: Retrieved 5 records.
14/12/29 10:30:20 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employees` AS t LIMIT 1
14/12/29 10:30:20 WARN hive.TableDefWriter: Column start_date had to be cast to a less precise type in Hive
14/12/29 10:30:20 INFO hive.HiveImport: Removing temporary files from import process: hdfs://172.16.2.17:9000/user/hadoop/employees/_logs
14/12/29 10:30:20 INFO hive.HiveImport: Loading uploaded data into Hive
14/12/29 10:30:21 INFO hive.HiveImport: Logging initialized using configuration in file:/usr/hive/hive/conf/hive-log4j.properties
14/12/29 10:30:21 INFO hive.HiveImport: Hive history file=/tmp/hadoop/hive_job_log_hadoop_201412291030_581129438.txt
14/12/29 10:30:25 INFO hive.HiveImport: OK
14/12/29 10:30:25 INFO hive.HiveImport: Time taken: 3.157 seconds
14/12/29 10:30:25 INFO hive.HiveImport: Loading data to table default.employees
14/12/29 10:30:25 INFO hive.HiveImport: OK
14/12/29 10:30:25 INFO hive.HiveImport: Time taken: 0.185 seconds
14/12/29 10:30:25 INFO hive.HiveImport: Hive import complete.
14/12/29 10:30:25 INFO hive.HiveImport: Export directory is empty, removing it.
(3) Check the results:
[hadoop@Master ~]$ hive -e "select * from employees"
Logging initialized using configuration in file:/usr/hive/hive/conf/hive-log4j.properties
Hive history file=/tmp/hadoop/hive_job_log_hadoop_201412291031_1993004175.txt
OK
Alice Engineering 50000 2009-03-12
Bob Sales 35000 2011-10-01
Camille Marketing 40000 2003-04-20
David Executive 75000 2001-03-20
Erica Support 34000 2011-07-07
Time taken: 2.586 seconds
[hadoop@Master ~]$ hive -e "describe employees"
Logging initialized using configuration in file:/usr/hive/hive/conf/hive-log4j.properties
Hive history file=/tmp/hadoop/hive_job_log_hadoop_201412291041_685213829.txt
OK
first_name string
dept string
salary int
start_date string
Time taken: 2.297 seconds
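The imported data now lives under Hive's warehouse directory on HDFS; assuming the default hive.metastore.warehouse.dir (an assumption, the original does not show this step), it can be listed with:
[hadoop@Master ~]$ hadoop fs -ls /user/hive/warehouse/employees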