This post (by 英勇面包) walks through reading a file into a Spark RDD with `sc.textFile`, the `java.net.URISyntaxException` you hit when the HDFS address is malformed, and how to fix it.

Note: when reading a local path during local testing, run with a single worker only; with more than one worker the read will fail.

scala> val rdd1=sc.textFile("/opt/spark_test_data/word.txt")
rdd1: org.apache.spark.rdd.RDD[String] = /opt/spark_test_data/word.txt MapPartitionsRDD[39] at textFile at <console>:24

scala> val rdd1=sc.textFile("hdfs:bigdata121:9000/tmp/spark/word.txt")
rdd1: org.apache.spark.rdd.RDD[String] = hdfs:bigdata121:9000/tmp/spark/word.txt MapPartitionsRDD[41] at textFile at <console>:24

scala> rdd1.collect
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hdfs:bigdata121:9000/tmp/spark/word.txt
  at org.apache.hadoop.fs.Path.initialize(Path.java:205)
  at org.apache.hadoop.fs.Path.<init>(Path.java:171)
  at org.apache.hadoop.util.StringUtils.stringToPath(StringUtils.java:245)
  at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:411)
  at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$30.apply(SparkContext.scala:1014)
  at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$30.apply(SparkContext.scala:1014)
  at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:179)
  at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:179)
  at scala.Option.foreach(Option.scala:257)
  at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:179)
  at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:198)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1958)
  at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:935)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
  at org.apache.spark.rdd.RDD.collect(RDD.scala:934)
  ... 48 elided
Caused by: java.net.URISyntaxException: Relative path in absolute URI: hdfs:bigdata121:9000/tmp/spark/word.txt
  at java.net.URI.checkPath(URI.java:1823)
  at java.net.URI.<init>(URI.java:745)
  at org.apache.hadoop.fs.Path.initialize(Path.java:202)
  ... 73 more
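The stack trace bottoms out in `java.net.URI`, and the failure can be reproduced without Spark at all. A minimal sketch (plain Scala, no Hadoop needed): without `//` after the scheme, `hdfs:bigdata121:9000/...` parses as an *opaque* URI with no authority (host:port) component, which is why Hadoop's `Path` ends up complaining about a "Relative path in absolute URI".

```scala
import java.net.URI

// The failing form: no "//" after the scheme, so everything after "hdfs:"
// is an opaque scheme-specific part -- no authority, no hierarchical path.
val bad = new URI("hdfs:bigdata121:9000/tmp/spark/word.txt")
println(bad.isOpaque)      // true
println(bad.getAuthority)  // null: "bigdata121:9000" was never parsed as host:port

// The correct form: "//" introduces the authority, and the path is absolute.
val good = new URI("hdfs://bigdata121:9000/tmp/spark/word.txt")
println(good.getAuthority) // bigdata121:9000
println(good.getPath)      // /tmp/spark/word.txt
```

So the fix is purely in the URI shape: `hdfs://host:port/absolute/path`.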
 

 

HDFS address: the fix

Use the full `hdfs://host:port/path` form. The `//` introduces the authority (host and port); without it, Hadoop treats everything after `hdfs:` as a relative path inside an absolute URI and throws the exception above. (The doubled slash before `/tmp` in the session below is harmless; the path is normalized.)

scala> val rdd1 = sc.textFile("hdfs://bigdata121:9000//tmp/spark/word.txt")
rdd1: org.apache.spark.rdd.RDD[String] = hdfs://bigdata121:9000//tmp/spark/word.txt MapPartitionsRDD[43] at textFile at <console>:24

scala> rdd1.collect
res14: Array[String] = Array(Word Andy, mantou itstar, Andy ajie, globe root, zixu password, root xiaoayong, Word xiaoayong, mantou, kluter, kluter ajie, globe root, zixu password, Word, mantou, kluter Word Andy, mantou itstart, shanshan shalajiang, fengyun mengshao, kluter ajie, globe root, shanshan shalajiang, fengyun mengshao, zixu password, aurora, Word xiaoayong, mantou xiaoayong, kluter xiaoayong, kluter ajie, globe root, zixu password, Word, mantou xiaoayong, kluter, shanshan shalajiang, fengyun mengshao, hehe hanbing, jk, wenrou, yunduo, Right, xiaohe, jiangzi, wolf, liuheng, dingdingding)
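As an aside: whether a *bare* path such as the first example's `/opt/spark_test_data/word.txt` resolves against the local filesystem or against HDFS depends on `fs.defaultFS` in Hadoop's `core-site.xml`. A sketch of the setting that would make bare paths resolve against this cluster's HDFS (assuming the NameNode address `bigdata121:9000` from the session above):

```xml
<!-- core-site.xml: hypothetical fragment; NameNode address assumed from the session above -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://bigdata121:9000</value>
</property>
```

With that default in place, `sc.textFile("/tmp/spark/word.txt")` would read from HDFS, and a `file://` prefix would be needed to force a local read.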
 

 
