Note: when testing locally, only one worker is allowed; otherwise an error is thrown.
scala> val rdd1=sc.textFile("/opt/spark_test_data/word.txt")
rdd1: org.apache.spark.rdd.RDD[String] = /opt/spark_test_data/word.txt MapPartitionsRDD[39] at textFile at <console>:24
scala> val rdd1=sc.textFile("hdfs:bigdata121:9000/tmp/spark/word.txt")
rdd1: org.apache.spark.rdd.RDD[String] = hdfs:bigdata121:9000/tmp/spark/word.txt MapPartitionsRDD[41] at textFile at <console>:24
scala> rdd1.collect
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hdfs:bigdata121:9000/tmp/spark/word.txt
at org.apache.hadoop.fs.Path.initialize(Path.java:205)
at org.apache.hadoop.fs.Path.<init>(Path.java:171)
at org.apache.hadoop.util.StringUtils.stringToPath(StringUtils.java:245)
at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:411)
at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$30.apply(SparkContext.scala:1014)
at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$30.apply(SparkContext.scala:1014)
at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:179)
at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:179)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:179)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:198)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1958)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:935)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.collect(RDD.scala:934)
... 48 elided
Caused by: java.net.URISyntaxException: Relative path in absolute URI: hdfs:bigdata121:9000/tmp/spark/word.txt
at java.net.URI.checkPath(URI.java:1823)
at java.net.URI.<init>(URI.java:745)
at org.apache.hadoop.fs.Path.initialize(Path.java:202)
... 73 more
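The root cause is visible in the URI itself: `hdfs:bigdata121:9000/tmp/spark/word.txt` is missing the `//` after the scheme, so `java.net.URI` parses it as an opaque URI with no authority (no host/port), and Hadoop's `Path` then rejects it. Also note that the error only surfaces at `rdd1.collect`, not at `textFile`, because RDDs are evaluated lazily. A minimal sketch with plain `java.net.URI` (no Spark or HDFS needed) shows the difference between the two forms:

```scala
import java.net.URI

// The malformed address: no "//" after the scheme.
val bad = new URI("hdfs:bigdata121:9000/tmp/spark/word.txt")
println(bad.getScheme)    // "hdfs"
println(bad.getAuthority) // null -- no host:port was recognized

// The correct hdfs://host:port/path form.
val good = new URI("hdfs://bigdata121:9000/tmp/spark/word.txt")
println(good.getAuthority) // "bigdata121:9000"
println(good.getPath)      // "/tmp/spark/word.txt"
```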
Using the full HDFS address (`hdfs://host:port/path`) resolves the problem:
scala> val rdd1 = sc.textFile("hdfs://bigdata121:9000//tmp/spark/word.txt")
rdd1: org.apache.spark.rdd.RDD[String] = hdfs://bigdata121:9000//tmp/spark/word.txt MapPartitionsRDD[43] at textFile at <console>:24
scala> rdd1.collect
res14: Array[String] = Array(Word Andy, mantou itstar, Andy ajie, globe root, zixu password, root xiaoayong, Word xiaoayong, mantou, kluter, kluter ajie, globe root, zixu password, Word, mantou, kluter Word Andy, mantou itstart, shanshan shalajiang, fengyun mengshao, kluter ajie, globe root, shanshan shalajiang, fengyun mengshao, zixu password, aurora, Word xiaoayong, mantou xiaoayong, kluter xiaoayong, kluter ajie, globe root, zixu password, Word, mantou xiaoayong, kluter, shanshan shalajiang, fengyun mengshao, hehe hanbing, jk, wenrou, yunduo, Right, xiaohe, jiangzi, wolf, liuheng, dingdingding)
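With the file loaded, a typical next step on data like this is a word count. On the RDD the standard transformation chain would be `rdd1.flatMap(_.split("\\s+")).map((_, 1)).reduceByKey(_ + _)` (hedged: ordinary Spark RDD API, not shown in the transcript above). The same logic can be sketched on plain Scala collections, which runs without a cluster and uses sample words from the output above:

```scala
// Word count over a small sample of the lines shown above.
// The RDD equivalent would be:
//   rdd1.flatMap(_.split("\\s+")).map((_, 1)).reduceByKey(_ + _).collect
val lines = Seq("Word Andy", "mantou itstar", "Andy ajie")

val counts: Map[String, Int] =
  lines
    .flatMap(_.split("\\s+"))  // split each line into words
    .map((_, 1))               // pair each word with a count of 1
    .groupBy(_._1)             // group the pairs by word
    .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum per word

println(counts) // "Andy" appears twice in the sample, the rest once
```

The collection operators (`flatMap`, `map`, `groupBy`) mirror the RDD transformations, which is why the Spark API feels natural from the Scala shell.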