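Overview
Business requirement:
Write the data produced each day into a Hive partitioned table (partitioned by date).
Analysis:
Writing data into a Hive table with MapReduce really just means writing files into the table's HDFS directory. The catch is that the files must land in the current day's partition, so the problem becomes: how to create the current day's partition of the Hive table ahead of time.
Solution: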
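The commands in this post rely on the shell variables $date and $yesterday without showing how they are set. A minimal sketch follows, assuming a yyyy-MM-dd partition value (the original format is not stated) and GNU `date`; `partition_dir` is a hypothetical helper that only illustrates how a `ds` partition maps to a sub-directory of the table directory.

```shell
#!/usr/bin/env bash
# Assumption: partition values look like 2015-01-31; change the format string if yours differ.
# `date -d "1 day ago"` is GNU date syntax.
date=$(date +%Y-%m-%d)
yesterday=$(date -d "1 day ago" +%Y-%m-%d)

# Hypothetical helper: Hive keeps each partition in a sub-directory named
# <partition column>=<value> under the table directory.
partition_dir() {
  local table_dir="$1" ds="$2"
  printf '%s/ds=%s\n' "$table_dir" "$ds"
}

# Prints the directory that the job's output path has to point at.
partition_dir /user/hive/pms/test_rcmd_valid_path "$date"
```

The printed path has the same shape as the --job2OutputPath argument used in step 3.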
1. Create the Hive table
# First create the partitioned table rcmd_valid_path (here the test copy, pms.test_rcmd_valid_path)
hive -e "set mapred.job.queue.name=pms;
drop table if exists pms.test_rcmd_valid_path;
create table if not exists pms.test_rcmd_valid_path
(
track_id string,
track_time string,
session_id string,
gu_id string,
end_user_id string,
page_category_id bigint,
algorithm_id int,
is_add_cart int,
rcmd_product_id bigint,
product_id bigint,
path_id string,
path_type string,
path_length int,
path_list string,
order_code string,
groupon_id bigint
)
partitioned by (ds string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n';"
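2. Create the table's partition for the current date (created only if it does not exist yet)
# Create the partition directory for $date in the rcmd_valid_path table.
# The statement overwrites the partition with a select from that same partition,
# so existing rows are kept, and a missing partition is created empty.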
hive -e "set mapred.job.queue.name=pms;
insert overwrite table pms.test_rcmd_valid_path partition(ds='$date')
select track_id,
track_time,
session_id,
gu_id,
end_user_id,
page_category_id,
algorithm_id,
is_add_cart,
rcmd_product_id,
product_id,
path_id,
path_type,
path_length,
path_list,
order_code,
groupon_id
from pms.test_rcmd_valid_path where ds = '$date';"
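An alternative to the insert overwrite trick is Hive's `alter table ... add if not exists partition` DDL, which registers the partition in the metastore without launching a query. This is not the method used in this post, only a sketch for comparison; `add_partition_sql` is a hypothetical helper and the date in the last line is a placeholder value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the HiveQL that registers one ds partition
# and does nothing if the partition is already present.
add_partition_sql() {
  local table="$1" ds="$2"
  printf "alter table %s add if not exists partition (ds='%s');\n" "$table" "$ds"
}

# Usage with the hive CLI (left as a comment so the snippet runs without Hive installed):
#   hive -e "$(add_partition_sql pms.test_rcmd_valid_path "$date")"

# Print the statement for a placeholder date to inspect it.
add_partition_sql pms.test_rcmd_valid_path 2015-01-01
```

Because the DDL does not read or rewrite any data, it leaves an existing partition untouched.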
3. Let the job write directly into the partition (note job2OutputPath)
hadoop jar lib/bigdata-datamining-1.1-user-trace-jar-with-dependencies.jar com.yhd.datamining.data.usertrack.offline.job.mapred.TrackPathJob \
--similarBrandPath /user/pms/recsys/algorithm/schedule/warehouse/relation/brand/$yesterday \
--similarCategoryPath /user/pms/recsys/algorithm/schedule/warehouse/relation/category/$yesterday \
--mcSiteCategoryPath /user/hive/warehouse/mc_site_category \
--extractPreprocess /user/hive/warehouse/test_extract_preprocess \
--engineMatchRule /user/pms/recsys/algorithm/schedule/warehouse/mix/artificial/product/$yesterday \
--artificialMatchRule /user/pms/recsys/algorithm/schedule/warehouse/ruleengine/artificial/product/$yesterday \
--category /user/hive/warehouse/category \
--keywordCategoryTopN 3 \
--termCategory /user/hive/pms/temp_term_category \
--extractGrouponInfo /user/hive/pms/extract_groupon_info \
--extractProductSerial /user/hive/pms/product_serial_id \
--job1OutputPath /user/pms/workspace/ouyangyewei/testUsertrack/job1Output \
--job2OutputPath /user/hive/pms/test_rcmd_valid_path/ds=$date
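Files written straight into HDFS are only visible to Hive queries if the partition is registered in the metastore, which is why step 2 runs before the job. After the job finishes, a quick check along the following lines can confirm both halves. `check_partition` is a hypothetical helper; it assumes the `hadoop` and `hive` CLIs are on the PATH and that `show partitions` prints one `ds=<value>` line per partition.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the partition directory exists in HDFS
# and the partition is registered in the Hive metastore.
check_partition() {
  local table="$1" table_dir="$2" ds="$3"

  # `hadoop fs -test -d` returns 0 when the path exists and is a directory.
  if ! hadoop fs -test -d "$table_dir/ds=$ds"; then
    echo "missing directory: $table_dir/ds=$ds" >&2
    return 1
  fi

  # Look for an exact ds=<value> line in the partition listing.
  if ! hive -e "show partitions $table;" | grep -qx "ds=$ds"; then
    echo "partition not registered: ds=$ds" >&2
    return 1
  fi
}

# Usage (left as a comment so the snippet can be sourced without a cluster):
#   check_partition pms.test_rcmd_valid_path /user/hive/pms/test_rcmd_valid_path "$date"
```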
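Method 2 (using load data):
# Run the valid-path extraction job
# NOTE: similar-category and similar-brand data is only available for the most recent two to three days.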
hadoop jar $userTrackLibHome/bigdata-datamining-$userTrackVersion-jar-with-dependencies.jar com.yhd.datamining.data.usertrack.offline.job.mapred.TrackPathJob \
--similarBrandPath /user/pms/recsys/algorithm/schedule/warehouse/relation/brand/$yesterday \
--similarCategoryPath /user/pms/recsys/algorithm/schedule/warehouse/relation/category/$yesterday \
--mcSiteCategoryPath /user/hive/warehouse/mc_site_category \
--extractPreprocess /user/hive/warehouse/extract_preprocess \
--engineMatchRule /user/pms/recsys/algorithm/schedule/warehouse/mix/artificial/product/$yesterday \
--artificialMatchRule /user/pms/recsys/algorithm/schedule/warehouse/ruleengine/artificial/product/$yesterday \
--category /user/hive/warehouse/category \
--keywordCategoryTopN 3 \
--termCategory /user/hive/pms/temp_term_category \
--extractGrouponInfo /user/hive/pms/extract_groupon_info \
--extractProductSerial /user/hive/pms/product_serial_id \
--job1OutputPath /user/pms/workspace/chenwu/usertrack/job1OutputPath \
--job2OutputPath /user/hive/pms/rcmd_valid_path_test/ds=$date
# Load Job1's cross-sale data into the matching partition of the cross-sale valid-path table
hive -e "load data inpath '/user/pms/workspace/chenwu/usertrack/job1OutputPath/crossSale' overwrite into table pms.cross_sale_path partition(ds='$date');"
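`load data inpath` (without `local`) moves the files from the source HDFS path into the table's partition directory, so the source path has to exist when the statement runs. A small guard such as the one below can make a missing job output fail loudly instead of surfacing as a Hive error. `load_if_present` is a hypothetical helper, and the snippet assumes the `hadoop` and `hive` CLIs are on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the load only when the job output is actually there.
load_if_present() {
  local src="$1" table="$2" ds="$3"

  # `hadoop fs -test -e` returns 0 when the path exists.
  if hadoop fs -test -e "$src"; then
    hive -e "load data inpath '$src' overwrite into table $table partition(ds='$ds');"
  else
    echo "skip load: $src does not exist" >&2
    return 1
  fi
}

# Usage (left as a comment so the snippet can be sourced without a cluster):
#   load_if_present /user/pms/workspace/chenwu/usertrack/job1OutputPath/crossSale pms.cross_sale_path "$date"
```

With this method the partition does not need to be created beforehand, since the load statement names the target partition itself.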