Classic US weather data: finding each year's maximum temperature with Scala commands on a Spark cluster

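This post, by the 靠谱客 blogger 优秀鸵鸟, shows how to compute each year's maximum temperature from the classic US weather (NCDC) dataset using Scala commands on a Spark cluster. It is shared here as a reference.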

Step 1: read the weather data stored on HDFS

scala> val rddall = sc.textFile("hdfs://hadoop01:9000/ncdc/197*/*")
rddall: org.apache.spark.rdd.RDD[String] = hdfs://hadoop01:9000/ncdc/* MapPartitionsRDD[93] at textFile at <console>:24

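Step 2: use map on the RDD to pair each record's year with its temperature, leaving the value blank when the reading is missing (9999) or the quality flag is not one of 0, 1, 4, 5, 9

scala> val result = rddall.map(x => (x.substring(15,19), if (x.substring(92,93).matches("[01459]")) { if (x.substring(87,88) == "+") { if (x.substring(88,92) != "9999") x.substring(88,92) else "" } else x.substring(87,92) } else " "))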
result: org.apache.spark.rdd.RDD[(String, String)] = MapPartitionsRDD[94] at map at <console>:26
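The nested if/else inside the map above can be hard to follow. As a minimal sketch, the same field offsets can be wrapped in a small function that returns None for missing or low-quality readings. The helper name `parseReading`, the use of Option, the length check, and the synthetic test line are all assumptions made for illustration; they are not part of the original commands. Unlike the original, this sketch treats a 9999 magnitude as missing regardless of sign and converts the value to Int.

```scala
// Hypothetical helper: extracts (year, temperature in tenths of a degree C)
// from one fixed-width record, using the same offsets as the map above.
def parseReading(line: String): Option[(String, Int)] = {
  if (line.length < 93) None                 // too short to hold the quality flag
  else {
    val year    = line.substring(15, 19)
    val sign    = line.substring(87, 88)
    val digits  = line.substring(88, 92)
    val quality = line.substring(92, 93)
    if (digits == "9999" || !quality.matches("[01459]")) None
    else {
      val magnitude = digits.toInt
      Some((year, if (sign == "-") -magnitude else magnitude))
    }
  }
}

// Synthetic record for illustration only: 15 filler characters, the year,
// 68 filler characters, the signed temperature, then the quality flag.
val sampleLine = ("0" * 15) + "1975" + ("0" * 68) + "+0022" + "1"
val parsed = parseReading(sampleLine)
```

In spark-shell, something like `rddall.flatMap(line => parseReading(line))` should then give an RDD of (year, Int) pairs with the invalid records already dropped.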

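Step 3: run reduceByKey on the result to get the maximum temperature per year. The values are still strings at this point, so the comparison is lexicographic.

scala> val resultAll = result.reduceByKey((x, y) => if (x > y) x else y).collect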
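To illustrate what a string comparison does with signed readings, here is a small sketch in plain Scala (no Spark needed). The sample values are made up for the example. It suggests the string-based reduce is only safe when a year's maximum is a positive, zero-padded reading.

```scala
// String comparison is lexicographic: zero-padded positive readings order
// correctly, but negative readings do not.
val positiveMax = Seq("0022", "0111").max                  // "0111", same as numeric order
val negativeMax = Seq("-0016", "-0060").max                // "-0060", although -16 > -60
val numericMax  = Seq("-0016", "-0060").map(_.toInt).max   // -16
```

Converting the field to Int before reducing, as the filter-then-map variant does, avoids this problem.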

Of course, these three steps can also be combined into a single expression:

val resultAll = sc.textFile("hdfs://hadoop01:9000/ncdc/197*/*").map(x => (x.substring(15,19), if (x.substring(92,93).matches("[01459]")) { if (x.substring(87,88) == "+") { if (x.substring(88,92) != "9999") x.substring(88,92) else "" } else x.substring(87,92) } else " ")).reduceByKey((x, y) => if (x > y) x else y).collect

Another approach is to filter first and then map.

Step 1: read the files from HDFS

val rddall = sc.textFile("hdfs://hadoop01:9000/ncdc/197*/*")

Step 2: use filter to drop the lines that do not meet the conditions
val rdd1 = rddall.filter(line => (if (line.substring(87,88) == "+") line.substring(88,92).toInt else line.substring(87,92).toInt) != 9999 && line.substring(92,93).matches("[01459]"))

Step 3: use map to extract the (year, temperature) pairs
val rdd2 = rdd1.map(x => (x.substring(15,19), if (x.substring(87,88) == "+") x.substring(88,92).toInt else x.substring(87,92).toInt))

Step 4: use reduceByKey to keep the largest value for each year
rdd2.reduceByKey((x, y) => if (x > y) x else y).collect
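The filter, map, and reduce steps above can be tried without a cluster. The following is a sketch on a local Seq, assuming synthetic lines built by a hypothetical helper `mkLine`; the years and readings are made up for illustration. `groupBy` followed by `max` stands in for reduceByKey, which exists only on Spark pair RDDs.

```scala
// Hypothetical helper: builds a synthetic fixed-width line with the year at
// offsets 15-18, the signed temperature at 87-91 and the quality flag at 92.
def mkLine(year: String, temp: String, quality: String): String =
  ("0" * 15) + year + ("0" * 68) + temp + quality

val lines = Seq(
  mkLine("1975", "+0022", "1"),
  mkLine("1975", "+0111", "1"),
  mkLine("1975", "+9999", "1"),   // missing reading, dropped by the filter
  mkLine("1976", "-0016", "1"),
  mkLine("1976", "-0060", "1"),
  mkLine("1976", "+0300", "2")    // quality flag outside [01459], dropped
)

// Filter step: keep only valid readings.
// toInt accepts a leading '+' or '-' (Integer.parseInt on Java 7 and later).
val kept = lines.filter(l =>
  l.substring(87, 92).toInt != 9999 && l.substring(92, 93).matches("[01459]"))

// Map step: (year, temperature in tenths of a degree C) pairs.
val pairs = kept.map(l => (l.substring(15, 19), l.substring(87, 92).toInt))

// Reduce step: local stand-in for reduceByKey, keeping the larger value per year.
val maxByYear: Map[String, Int] =
  pairs.groupBy(_._1).map { case (year, ps) => year -> ps.map(_._2).max }
```

Because the values are Ints here, the year with only negative readings still gets the numerically largest one.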

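In measured runs, the two implementations took roughly the same time. The second is easier to follow logically, while the first folds the filtering into the map; each has its trade-offs.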
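The post does not say how the timings were taken. One possible way to compare the two variants, sketched here with a hypothetical helper `timed`, is to wrap each full expression in a wall-clock timer. The call must include `.collect`, since Spark transformations are lazy and only run when an action is invoked.

```scala
// Hypothetical helper: runs a block once and returns its result together
// with the elapsed wall-clock time in milliseconds.
def timed[A](body: => A): (A, Double) = {
  val start = System.nanoTime()
  val result = body
  (result, (System.nanoTime() - start) / 1e6)
}

// Stand-in workload; in spark-shell the block would be the full expression
// ending in .collect.
val (sum, elapsedMs) = timed((1 to 1000).sum)
```

A single run is noisy, so repeating each measurement a few times gives a fairer comparison.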

Appendix: weather data format (one sample record):

0228010010999992018010100004+70933-008667FM-12+000999999V0200501N012012200019N015000199-00161-00601100251ADDAA106000131AY181021AY231021GA1041+012501081GE19MSL +99999+99999GF104991041999012501999999MA1999999100131MD1310151+9999MW1021OD139902501999REMSYN07601001 11665 40512 11016 21060 30013 40025 53015 69911 70283 84800 333 91125=
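As a quick check of the offsets used in the commands, the sketch below applies them to the first 105 characters of the sample record above. The variable names are assumptions for illustration. The temperature field is stored in tenths of a degree Celsius, so it is divided by 10.

```scala
// First 105 characters of the sample record; the rest is not needed here.
val recordPrefix =
  "0228010010999992018010100004+70933-008667FM-12+000999999V0200501N012012200019N015000199-00161-00601100251"

val year      = recordPrefix.substring(15, 19)   // "2018"
val tempField = recordPrefix.substring(87, 92)   // "-0016", sign plus four digits
val quality   = recordPrefix.substring(92, 93)   // "1"
val celsius   = tempField.toInt / 10.0           // -1.6
```

This record therefore reports -1.6 degrees Celsius for 2018 with quality flag 1, which passes the [01459] check.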