Overview
This post records the pitfalls the author worked through while using Sqoop to import data from MySQL into Hive.
1. Choosing a Sqoop version
The author first tried Sqoop 1.99 (Sqoop2), which is split into a server and a client (the official site also mentions a web UI). The server is a long-running process that manages connectors and jobs and executes them; the client connects to the server remotely, submits requests, and queries the server's status. That release line does not support importing directly into Hive, so version 1.4.6 was chosen instead. Under the hood both versions do the same thing: the data is first imported into a temporary HDFS directory and then moved under the Hive warehouse directory, but with 1.99 the load from HDFS into Hive has to be done by hand.
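For reference, the manual step that Sqoop 1.99 leaves to the user is an ordinary Hive LOAD; a minimal sketch, in which the HDFS path and table name are placeholders rather than values from an actual run:

-- Hypothetical example: load Sqoop 1.99 staging output from HDFS into Hive.
-- /tmp/sqoop_staging and test.users are placeholders.
LOAD DATA INPATH '/tmp/sqoop_staging' INTO TABLE test.users;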
2. Installing and configuring Sqoop
Download the binary tarball from the official site, extract it locally, and then edit conf/sqoop-env.sh as follows:
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# included in all the hadoop scripts with source command
# should not be executable directly
# also should not be passed any arguments, since we need original $*
# Set Hadoop-specific environment variables here.
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/home/work/hadoop/hadoop-2.7.3
export ZOOKEEPER_HOME=/home/work/hadoop/zookeeper-3.4.9
export HIVE_CONF_DIR=/home/work/hadoop/hive-2.1.1/conf
#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/home/work/hadoop/hadoop-2.7.3/share/hadoop/mapreduce
#set the path to where bin/hbase is available
export HBASE_HOME=/home/work/hadoop/hbase-1.2.4
#Set the path to where bin/hive is available
#export HIVE_HOME=
#Set the path for where zookeeper config dir is
export ZOOCFGDIR=/home/work/hadoop/zookeeper-3.4.9/conf
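One thing the template does not cover: Sqoop also needs the MySQL JDBC driver, so copy mysql-connector-java-*.jar into Sqoop's lib/ directory. After that the setup can be smoke-tested; the sketch below reuses the host, port, and username from the import in step 4, and -P prompts for the password:

# Print the Sqoop version, then list databases over the JDBC connection.
./sqoop version
./sqoop list-databases --connect jdbc:mysql://10.33.240.93:3308/test --username hadoop -P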
3. Creating the Hive table
CREATE TABLE test (id INT, username STRING, passwd STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
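The import in step 4 targets the test_odbc database, which Sqoop will not create by itself (with --hive-import it can create the table, but not the database); a minimal preparatory statement:

-- Assumed setup: the import below expects this database to exist already.
CREATE DATABASE IF NOT EXISTS test_odbc;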
4. Importing the data
Two details matter in the command below: $CONDITIONS must reach Sqoop unexpanded, so it is escaped inside the double-quoted query (single quotes around the whole query also work), and a free-form --query import needs an explicit --target-dir for the staging data (the path here is only an example):
./sqoop import --connect jdbc:mysql://10.33.240.93:3308/test --username hadoop --password 7897 --query "select user_name, user_id, identype from users where \$CONDITIONS" --target-dir /tmp/sqoop_users --hive-import --hive-database test_odbc --hive-table users --split-by user_id --fields-terminated-by '\t'
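Once the job finishes, a quick query from the Hive side confirms the rows landed; a minimal check, assuming the hive CLI is on the PATH:

# Count the imported rows and show a small sample.
hive -e "SELECT COUNT(*) FROM test_odbc.users; SELECT * FROM test_odbc.users LIMIT 5;"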