1. keys
Function:
Returns the key of every key-value pair in a pair RDD.
Example
val list = List("hadoop","spark","hive","spark")
val rdd = sc.parallelize(list)
val pairRdd = rdd.map(x => (x,1))
pairRdd.keys.collect.foreach(println)
Result
hadoop
spark
hive
spark
list: List[String] = List(hadoop, spark, hive, spark)
rdd: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[142] at parallelize at command-3434610298353610:2
pairRdd: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[143] at map at command-3434610298353610:3
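As a side note, keys simply extracts the first element of each tuple; a minimal equivalent sketch (the name keysViaMap is introduced here only for illustration):

// Equivalent to pairRdd.keys: map out the first element of each tuple
val keysViaMap = pairRdd.map(_._1)
keysViaMap.collect.foreach(println) // prints the same four keys as above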
2. values
Function:
Returns the value of every key-value pair in a pair RDD.
Example
val list = List("hadoop","spark","hive","spark")
val rdd = sc.parallelize(list)
val pairRdd = rdd.map(x => (x,1))
pairRdd.values.collect.foreach(println)
Result
1
1
1
1
list: List[String] = List(hadoop, spark, hive, spark)
rdd: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[145] at parallelize at command-3434610298353610:2
pairRdd: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[146] at map at command-3434610298353610:3
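Since every value here is 1, the values can also be aggregated directly, for example to count the number of pairs; a minimal sketch (the name total is introduced here only for illustration):

// Reduce all values (each is 1) to get the total number of pairs
val total = pairRdd.values.reduce(_ + _)
println(total) // 4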
3. mapValues(func)
Function:
Applies a function to the value of every key-value pair; the keys are left unchanged.
Example
val list = List("hadoop","spark","hive","spark")
val rdd = sc.parallelize(list)
val pairRdd = rdd.map(x => (x,1))
pairRdd.mapValues(_+1).collect.foreach(println) // add 1 to every value
Result
(hadoop,2)
(spark,2)
(hive,2)
(spark,2)
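In practice mapValues is often chained after reduceByKey, for example to post-process word counts; a minimal sketch over the same pairRdd (the name counts is introduced here only for illustration). Because mapValues cannot change the keys, Spark keeps the parent RDD's partitioner, so later key-based operations avoid an extra shuffle.

// Count occurrences per word, then use mapValues to format each count
val counts = pairRdd.reduceByKey(_ + _) // (hadoop,1), (spark,2), (hive,1)
counts.mapValues(n => s"count=$n").collect.foreach(println)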
Finally
That covers the common Spark transformation operations keys, values, and mapValues.