Spark Startup

Overview

1: Starting the Master

Command:

./sbin/start-master.sh

[jifeng@feng03 spark-1.4.0-bin-hadoop2.6]$ ./sbin/start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /home/jifeng/spark-1.4.0-bin-hadoop2.6/sbin/../logs/spark-jifeng-org.apache.spark.deploy.master.Master-1-feng03.out
View the startup log:

[jifeng@feng03 logs]$ cat spark-jifeng-org.apache.spark.deploy.master.Master-1-feng03.out 
Spark Command: /home/jifeng/jdk1.7.0_79/bin/java -cp /home/jifeng/spark-1.4.0-bin-hadoop2.6/sbin/../conf/:/home/jifeng/spark-1.4.0-bin-hadoop2.6/lib/spark-assembly-1.4.0-hadoop2.6.0.jar:/home/jifeng/spark-1.4.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/home/jifeng/spark-1.4.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/home/jifeng/spark-1.4.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/home/jifeng/hadoop-2.6.0/etc/hadoop/ -Xms512m -Xmx512m -XX:MaxPermSize=128m org.apache.spark.deploy.master.Master --ip feng03 --port 7077 --webui-port 8080
========================================
15/07/11 22:12:29 INFO master.Master: Registered signal handlers for [TERM, HUP, INT]
15/07/11 22:12:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/07/11 22:12:30 INFO spark.SecurityManager: Changing view acls to: jifeng
15/07/11 22:12:30 INFO spark.SecurityManager: Changing modify acls to: jifeng
15/07/11 22:12:30 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jifeng); users with modify permissions: Set(jifeng)
15/07/11 22:12:30 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/07/11 22:12:30 INFO Remoting: Starting remoting
15/07/11 22:12:31 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkMaster@feng03:7077]
15/07/11 22:12:31 INFO util.Utils: Successfully started service 'sparkMaster' on port 7077.
15/07/11 22:12:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/07/11 22:12:31 INFO server.AbstractConnector: Started SelectChannelConnector@feng03:6066
15/07/11 22:12:31 INFO util.Utils: Successfully started service on port 6066.
15/07/11 22:12:31 INFO rest.StandaloneRestServer: Started REST server for submitting applications on port 6066
15/07/11 22:12:31 INFO master.Master: Starting Spark master at spark://feng03:7077
15/07/11 22:12:31 INFO master.Master: Running Spark version 1.4.0
15/07/11 22:12:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/07/11 22:12:31 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:8080
15/07/11 22:12:31 INFO util.Utils: Successfully started service 'MasterUI' on port 8080.
15/07/11 22:12:31 INFO ui.MasterWebUI: Started MasterWebUI at http://192.168.0.110:8080
15/07/11 22:12:32 INFO master.Master: I have been elected leader! New state: ALIVE
15/07/11 22:13:43 INFO master.Master: Registering worker 192.168.0.110:35655 with 1 cores, 2.0 GB RAM


2: Starting the slave (Worker)

Command:

./sbin/start-slave.sh <master-spark-URL>

(The transcript below uses ./sbin/start-slaves.sh instead, which starts a Worker on every host listed in conf/slaves.)

[jifeng@feng03 spark-1.4.0-bin-hadoop2.6]$ sbin/start-slaves.sh spark://feng03:7077
feng03: starting org.apache.spark.deploy.worker.Worker, logging to /home/jifeng/spark-1.4.0-bin-hadoop2.6/sbin/../logs/spark-jifeng-org.apache.spark.deploy.worker.Worker-1-feng03.out
View the startup log:


[jifeng@feng03 logs]$ cat spark-jifeng-org.apache.spark.deploy.worker.Worker-1-feng03.out 
Spark Command: /home/jifeng/jdk1.7.0_79/bin/java -cp /home/jifeng/spark-1.4.0-bin-hadoop2.6/sbin/../conf/:/home/jifeng/spark-1.4.0-bin-hadoop2.6/lib/spark-assembly-1.4.0-hadoop2.6.0.jar:/home/jifeng/spark-1.4.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/home/jifeng/spark-1.4.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/home/jifeng/spark-1.4.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar -Xms512m -Xmx512m -XX:MaxPermSize=128m org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://feng03:7077
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/07/11 22:13:39 INFO Worker: Registered signal handlers for [TERM, HUP, INT]
15/07/11 22:13:40 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/07/11 22:13:41 INFO SecurityManager: Changing view acls to: jifeng
15/07/11 22:13:41 INFO SecurityManager: Changing modify acls to: jifeng
15/07/11 22:13:41 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jifeng); users with modify permissions: Set(jifeng)
15/07/11 22:13:41 INFO Slf4jLogger: Slf4jLogger started
15/07/11 22:13:41 INFO Remoting: Starting remoting
15/07/11 22:13:41 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkWorker@192.168.0.110:35655]
15/07/11 22:13:41 INFO Utils: Successfully started service 'sparkWorker' on port 35655.
15/07/11 22:13:42 INFO Worker: Starting Spark worker 192.168.0.110:35655 with 1 cores, 2.0 GB RAM
15/07/11 22:13:42 INFO Worker: Running Spark version 1.4.0
15/07/11 22:13:42 INFO Worker: Spark home: /home/jifeng/spark-1.4.0-bin-hadoop2.6
15/07/11 22:13:42 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
15/07/11 22:13:42 INFO WorkerWebUI: Started WorkerWebUI at http://192.168.0.110:8081
15/07/11 22:13:42 INFO Worker: Connecting to master akka.tcp://sparkMaster@feng03:7077/user/Master...
15/07/11 22:13:43 INFO Worker: Successfully registered with master spark://feng03:7077
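
At this point the standalone cluster is up: one Master at spark://feng03:7077 and one registered Worker. Besides spark-shell (next section), an application can also connect to it programmatically; a minimal sketch, where the object and app name are made up for illustration:

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical standalone app that connects to the Master started above.
object ConnectivityCheck {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("connectivity-check")    // illustrative app name
      .setMaster("spark://feng03:7077")    // master URL from the log above
    val sc = new SparkContext(conf)
    println(s"connected to ${sc.master}")  // should print spark://feng03:7077
    println(s"default parallelism: ${sc.defaultParallelism}")
    sc.stop()
  }
}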

3: Starting the Shell

./bin/spark-shell --master spark://IP:PORT

[jifeng@feng03 spark-1.4.0-bin-hadoop2.6]$ ./bin/spark-shell master=spark://feng03:7077
15/07/11 23:09:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/07/11 23:09:26 INFO spark.SecurityManager: Changing view acls to: jifeng
15/07/11 23:09:26 INFO spark.SecurityManager: Changing modify acls to: jifeng
15/07/11 23:09:26 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jifeng); users with modify permissions: Set(jifeng)
15/07/11 23:09:26 INFO spark.HttpServer: Starting HTTP Server
15/07/11 23:09:27 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/07/11 23:09:27 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:34613
15/07/11 23:09:27 INFO util.Utils: Successfully started service 'HTTP class server' on port 34613.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.4.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_79)
Type in expressions to have them evaluated.
Type :help for more information.
15/07/11 23:09:36 INFO spark.SparkContext: Running Spark version 1.4.0
15/07/11 23:09:36 INFO spark.SecurityManager: Changing view acls to: jifeng
15/07/11 23:09:36 INFO spark.SecurityManager: Changing modify acls to: jifeng
15/07/11 23:09:36 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jifeng); users with modify permissions: Set(jifeng)
15/07/11 23:09:37 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/07/11 23:09:37 INFO Remoting: Starting remoting
15/07/11 23:09:37 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.0.110:34690]
15/07/11 23:09:37 INFO util.Utils: Successfully started service 'sparkDriver' on port 34690.
15/07/11 23:09:37 INFO spark.SparkEnv: Registering MapOutputTracker
15/07/11 23:09:37 INFO spark.SparkEnv: Registering BlockManagerMaster
15/07/11 23:09:37 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-d8ab9b2d-bf0e-498a-9c7a-93fe904611e0/blockmgr-0531d884-7f97-46a0-8533-3b8c1abee2ee
15/07/11 23:09:37 INFO storage.MemoryStore: MemoryStore started with capacity 267.3 MB
15/07/11 23:09:38 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-d8ab9b2d-bf0e-498a-9c7a-93fe904611e0/httpd-e41d1cc4-870d-4882-9b66-8dbbc79645a3
15/07/11 23:09:38 INFO spark.HttpServer: Starting HTTP Server
15/07/11 23:09:38 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/07/11 23:09:38 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:45995
15/07/11 23:09:38 INFO util.Utils: Successfully started service 'HTTP file server' on port 45995.
15/07/11 23:09:38 INFO spark.SparkEnv: Registering OutputCommitCoordinator
15/07/11 23:09:40 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/07/11 23:09:40 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/07/11 23:09:40 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
15/07/11 23:09:40 INFO ui.SparkUI: Started SparkUI at http://192.168.0.110:4040
15/07/11 23:09:40 INFO executor.Executor: Starting executor ID driver on host localhost
15/07/11 23:09:40 INFO executor.Executor: Using REPL class URI: http://192.168.0.110:34613
15/07/11 23:09:40 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 41799.
15/07/11 23:09:40 INFO netty.NettyBlockTransferService: Server created on 41799
15/07/11 23:09:40 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/07/11 23:09:40 INFO storage.BlockManagerMasterEndpoint: Registering block manager localhost:41799 with 267.3 MB RAM, BlockManagerId(driver, localhost, 41799)
15/07/11 23:09:40 INFO storage.BlockManagerMaster: Registered BlockManager
15/07/11 23:09:41 INFO repl.SparkILoop: Created spark context..
Spark context available as sc.
15/07/11 23:09:42 INFO hive.HiveContext: Initializing execution hive, version 0.13.1
15/07/11 23:09:42 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
15/07/11 23:09:43 INFO metastore.ObjectStore: ObjectStore, initialize called
15/07/11 23:09:43 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
15/07/11 23:09:43 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
15/07/11 23:09:44 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/07/11 23:09:44 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/07/11 23:09:48 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
15/07/11 23:09:49 INFO metastore.MetaStoreDirectSql: MySQL check failed, assuming we are not on mysql: Lexical error at line 1, column 5.  Encountered: "@" (64), after : "".
15/07/11 23:09:50 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
15/07/11 23:09:50 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
15/07/11 23:09:53 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
15/07/11 23:09:53 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
15/07/11 23:09:54 INFO metastore.ObjectStore: Initialized ObjectStore
15/07/11 23:09:54 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 0.13.1aa
15/07/11 23:09:55 INFO metastore.HiveMetaStore: Added admin role in metastore
15/07/11 23:09:55 INFO metastore.HiveMetaStore: Added public role in metastore
15/07/11 23:09:56 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
15/07/11 23:09:56 INFO session.SessionState: No Tez session required at this point. hive.execution.engine=mr.
15/07/11 23:09:56 INFO repl.SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.

scala> 

After modifying spark-class, the actual command used to launch the shell can be seen:

[jifeng@feng03 spark-1.4.0-bin-hadoop2.6]$ ./bin/spark-shell master=spark://feng03:7077
/home/jifeng/jdk1.7.0_79/bin/java -cp /home/jifeng/spark-1.4.0-bin-hadoop2.6/conf/:/home/jifeng/spark-1.4.0-bin-hadoop2.6/lib/spark-assembly-1.4.0-hadoop2.6.0.jar:/home/jifeng/spark-1.4.0-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/home/jifeng/spark-1.4.0-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/home/jifeng/spark-1.4.0-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/home/jifeng/hadoop-2.6.0/etc/hadoop/ -Dscala.usejavacp=true -Xms512m -Xmx512m -XX:MaxPermSize=128m org.apache.spark.deploy.SparkSubmit --class org.apache.spark.repl.Main spark-shell master=spark://feng03:7077
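
Note that the documented form is ./bin/spark-shell --master spark://feng03:7077. Launched as above with master=spark://feng03:7077 (no --master flag), spark-submit does not pick the URL up and the shell falls back to the default local master, which is consistent with the "Starting executor ID driver on host localhost" line in the log. A quick way to check which master a running shell is actually using (an extra check, not part of the original session):

sc.master     // the master URL in effect; e.g. "local[*]" here, "spark://feng03:7077" when --master is passed
sc.version    // Spark version of the running shell, 1.4.0 here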

4: Testing WordCount

Read the file:

val textFile = sc.textFile("file:///home/jifeng/spark-1.4.0-bin-hadoop2.6/README.md")

textFile.count()
count() returns the number of lines:

scala> val textFile = sc.textFile("file:///home/jifeng/spark-1.4.0-bin-hadoop2.6/README.md")
15/07/11 23:29:21 INFO storage.MemoryStore: ensureFreeSpace(233640) called with curMem=109214, maxMem=280248975
15/07/11 23:29:21 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 228.2 KB, free 266.9 MB)
15/07/11 23:29:21 INFO storage.MemoryStore: ensureFreeSpace(20038) called with curMem=342854, maxMem=280248975
15/07/11 23:29:21 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 19.6 KB, free 266.9 MB)
15/07/11 23:29:21 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:41799 (size: 19.6 KB, free: 267.2 MB)
15/07/11 23:29:21 INFO spark.SparkContext: Created broadcast 1 from textFile at <console>:21
textFile: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[3] at textFile at <console>:21

scala> textFile.count()
15/07/11 23:29:24 INFO mapred.FileInputFormat: Total input paths to process : 1
15/07/11 23:29:24 INFO spark.SparkContext: Starting job: count at <console>:24
15/07/11 23:29:24 INFO scheduler.DAGScheduler: Got job 0 (count at <console>:24) with 1 output partitions (allowLocal=false)
15/07/11 23:29:24 INFO scheduler.DAGScheduler: Final stage: ResultStage 0(count at <console>:24)
15/07/11 23:29:24 INFO scheduler.DAGScheduler: Parents of final stage: List()
15/07/11 23:29:24 INFO scheduler.DAGScheduler: Missing parents: List()
15/07/11 23:29:24 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[3] at textFile at <console>:21), which has no missing parents
15/07/11 23:29:24 INFO storage.MemoryStore: ensureFreeSpace(3008) called with curMem=362892, maxMem=280248975
15/07/11 23:29:24 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.9 KB, free 266.9 MB)
15/07/11 23:29:24 INFO storage.MemoryStore: ensureFreeSpace(1791) called with curMem=365900, maxMem=280248975
15/07/11 23:29:24 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1791.0 B, free 266.9 MB)
15/07/11 23:29:24 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:41799 (size: 1791.0 B, free: 267.2 MB)
15/07/11 23:29:24 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:874
15/07/11 23:29:24 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[3] at textFile at <console>:21)
15/07/11 23:29:24 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
15/07/11 23:29:24 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1426 bytes)
15/07/11 23:29:24 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
15/07/11 23:29:24 INFO rdd.HadoopRDD: Input split: file:/home/jifeng/spark-1.4.0-bin-hadoop2.6/README.md:0+3624
15/07/11 23:29:24 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
15/07/11 23:29:24 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
15/07/11 23:29:24 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
15/07/11 23:29:24 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
15/07/11 23:29:24 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
15/07/11 23:29:25 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 1830 bytes result sent to driver
15/07/11 23:29:25 INFO scheduler.DAGScheduler: ResultStage 0 (count at <console>:24) finished in 0.308 s
15/07/11 23:29:25 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 263 ms on localhost (1/1)
15/07/11 23:29:25 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
15/07/11 23:29:25 INFO scheduler.DAGScheduler: Job 0 finished: count at <console>:24, took 0.651881 s
res2: Long = 98
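
count() can also be combined with other transformations in the same way; for example, a filtered line count in the style of the official quick start (not part of the original session):

// number of lines that mention "Spark"
textFile.filter(line => line.contains("Spark")).count()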

val count=textFile.flatMap(line=>line.split(" ")).map(word=>(word,1)).reduceByKey(_+_)
This counts how many times each word occurs.
count.collect()
collect() is an action: it is what actually submits and runs the job, as the transcript below shows.
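
flatMap, map and reduceByKey are lazy transformations, so defining count above runs nothing by itself; the stages only appear once an action is called. The lineage that the DAGScheduler splits into the ShuffleMapStage and ResultStage seen below can be inspected beforehand (an extra check, not in the original session):

// prints the RDD lineage: ShuffledRDD <- MapPartitionsRDD <- ... <- HadoopRDD
println(count.toDebugString)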

scala> val count=textFile.flatMap(line=>line.split(" ")).map(word=>(word,1)).reduceByKey(_+_)
count: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[6] at reduceByKey at <console>:23

scala> count.collect()
15/07/11 23:37:42 INFO spark.SparkContext: Starting job: collect at <console>:26
15/07/11 23:37:42 INFO scheduler.DAGScheduler: Registering RDD 5 (map at <console>:23)
15/07/11 23:37:42 INFO scheduler.DAGScheduler: Got job 1 (collect at <console>:26) with 1 output partitions (allowLocal=false)
15/07/11 23:37:42 INFO scheduler.DAGScheduler: Final stage: ResultStage 2(collect at <console>:26)
15/07/11 23:37:42 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 1)
15/07/11 23:37:42 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 1)
15/07/11 23:37:42 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 1 (MapPartitionsRDD[5] at map at <console>:23), which has no missing parents
15/07/11 23:37:42 INFO storage.MemoryStore: ensureFreeSpace(4136) called with curMem=362892, maxMem=280248975
15/07/11 23:37:42 INFO storage.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 4.0 KB, free 266.9 MB)
15/07/11 23:37:42 INFO storage.MemoryStore: ensureFreeSpace(2311) called with curMem=367028, maxMem=280248975
15/07/11 23:37:42 INFO storage.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 2.3 KB, free 266.9 MB)
15/07/11 23:37:42 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on localhost:41799 (size: 2.3 KB, free: 267.2 MB)
15/07/11 23:37:42 INFO spark.SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:874
15/07/11 23:37:42 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 1 (MapPartitionsRDD[5] at map at <console>:23)
15/07/11 23:37:42 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
15/07/11 23:37:42 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, PROCESS_LOCAL, 1415 bytes)
15/07/11 23:37:42 INFO executor.Executor: Running task 0.0 in stage 1.0 (TID 1)
15/07/11 23:37:42 INFO rdd.HadoopRDD: Input split: file:/home/jifeng/spark-1.4.0-bin-hadoop2.6/README.md:0+3624
15/07/11 23:37:42 INFO executor.Executor: Finished task 0.0 in stage 1.0 (TID 1). 2056 bytes result sent to driver
15/07/11 23:37:42 INFO scheduler.DAGScheduler: ShuffleMapStage 1 (map at <console>:23) finished in 0.465 s
15/07/11 23:37:42 INFO scheduler.DAGScheduler: looking for newly runnable stages
15/07/11 23:37:42 INFO scheduler.DAGScheduler: running: Set()
15/07/11 23:37:42 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 468 ms on localhost (1/1)
15/07/11 23:37:42 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
15/07/11 23:37:42 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 2)
15/07/11 23:37:42 INFO scheduler.DAGScheduler: failed: Set()
15/07/11 23:37:42 INFO scheduler.DAGScheduler: Missing parents for ResultStage 2: List()
15/07/11 23:37:42 INFO scheduler.DAGScheduler: Submitting ResultStage 2 (ShuffledRDD[6] at reduceByKey at <console>:23), which is now runnable
15/07/11 23:37:42 INFO storage.MemoryStore: ensureFreeSpace(2288) called with curMem=369339, maxMem=280248975
15/07/11 23:37:42 INFO storage.MemoryStore: Block broadcast_4 stored as values in memory (estimated size 2.2 KB, free 266.9 MB)
15/07/11 23:37:42 INFO storage.MemoryStore: ensureFreeSpace(1377) called with curMem=371627, maxMem=280248975
15/07/11 23:37:42 INFO storage.MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 1377.0 B, free 266.9 MB)
15/07/11 23:37:42 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on localhost:41799 (size: 1377.0 B, free: 267.2 MB)
15/07/11 23:37:42 INFO spark.SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:874
15/07/11 23:37:42 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (ShuffledRDD[6] at reduceByKey at <console>:23)
15/07/11 23:37:42 INFO scheduler.TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
15/07/11 23:37:42 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2, localhost, PROCESS_LOCAL, 1165 bytes)
15/07/11 23:37:42 INFO executor.Executor: Running task 0.0 in stage 2.0 (TID 2)
15/07/11 23:37:42 INFO storage.ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
15/07/11 23:37:42 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 5 ms
15/07/11 23:37:42 INFO executor.Executor: Finished task 0.0 in stage 2.0 (TID 2). 7258 bytes result sent to driver
15/07/11 23:37:42 INFO scheduler.DAGScheduler: ResultStage 2 (collect at <console>:26) finished in 0.260 s
15/07/11 23:37:42 INFO scheduler.DAGScheduler: Job 1 finished: collect at <console>:26, took 0.815686 s
15/07/11 23:37:42 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in 263 ms on localhost (1/1)
15/07/11 23:37:42 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool 
res3: Array[(String, Int)] = Array((package,1), (For,2), (Programs,1), (processing.,1), (Because,1), (The,1), (cluster.,1), (its,1), ([run,1), (APIs,1), (have,1), (Try,1), (computation,1), (through,1), (several,1), (This,2), ("yarn-cluster",1), (graph,1), (Hive,2), (storage,1), (["Specifying,1), (To,2), (page](http://spark.apache.org/documentation.html),1), (Once,1), (application,1), (prefer,1), (SparkPi,2), (engine,1), (version,1), (file,1), (documentation,,1), (processing,,2), (the,21), (are,1), (systems.,1), (params,1), (not,1), (different,1), (refer,2), (Interactive,2), (given.,1), (if,4), (build,3), (when,1), (be,2), (Tests,1), (Apache,1), (all,1), (./bin/run-example,2), (programs,,1), (including,3), (Spark.,1), (package.,1), (1000).count(),1), (Versions,1), (HDFS,1), (Data.,1), (>...
scala> 
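
collect() returns the pairs in no particular order. To see the most frequent words, sort by the count before taking the first few (a small extension to the session above):

// ten most frequent words, highest count first
count.sortBy(_._2, ascending = false).take(10)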


Save the result:
 count.saveAsTextFile("README.md")

scala> count.saveAsTextFile("README.md")
15/07/11 23:43:21 INFO spark.SparkContext: Starting job: saveAsTextFile at <console>:26
15/07/11 23:43:21 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 143 bytes
15/07/11 23:43:21 INFO scheduler.DAGScheduler: Got job 2 (saveAsTextFile at <console>:26) with 1 output partitions (allowLocal=false)
15/07/11 23:43:21 INFO scheduler.DAGScheduler: Final stage: ResultStage 4(saveAsTextFile at <console>:26)
15/07/11 23:43:21 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 3)
15/07/11 23:43:21 INFO scheduler.DAGScheduler: Missing parents: List()
15/07/11 23:43:21 INFO scheduler.DAGScheduler: Submitting ResultStage 4 (MapPartitionsRDD[7] at saveAsTextFile at <console>:26), which has no missing parents
15/07/11 23:43:21 INFO storage.MemoryStore: ensureFreeSpace(127984) called with curMem=362892, maxMem=280248975
15/07/11 23:43:21 INFO storage.MemoryStore: Block broadcast_5 stored as values in memory (estimated size 125.0 KB, free 266.8 MB)
15/07/11 23:43:21 INFO storage.MemoryStore: ensureFreeSpace(43257) called with curMem=490876, maxMem=280248975
15/07/11 23:43:21 INFO storage.MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 42.2 KB, free 266.8 MB)
15/07/11 23:43:21 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on localhost:41799 (size: 42.2 KB, free: 267.2 MB)
15/07/11 23:43:21 INFO spark.SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:874
15/07/11 23:43:21 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 4 (MapPartitionsRDD[7] at saveAsTextFile at <console>:26)
15/07/11 23:43:21 INFO scheduler.TaskSchedulerImpl: Adding task set 4.0 with 1 tasks
15/07/11 23:43:21 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 4.0 (TID 3, localhost, PROCESS_LOCAL, 1165 bytes)
15/07/11 23:43:21 INFO executor.Executor: Running task 0.0 in stage 4.0 (TID 3)
15/07/11 23:43:22 INFO storage.ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
15/07/11 23:43:22 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 4 ms
15/07/11 23:43:23 INFO output.FileOutputCommitter: Saved output of task 'attempt_201507112343_0004_m_000000_3' to hdfs://feng01:9000/user/jifeng/README.md/_temporary/0/task_201507112343_0004_m_000000
15/07/11 23:43:23 INFO mapred.SparkHadoopMapRedUtil: attempt_201507112343_0004_m_000000_3: Committed
15/07/11 23:43:23 INFO executor.Executor: Finished task 0.0 in stage 4.0 (TID 3). 1828 bytes result sent to driver
15/07/11 23:43:23 INFO scheduler.DAGScheduler: ResultStage 4 (saveAsTextFile at <console>:26) finished in 1.990 s
15/07/11 23:43:23 INFO scheduler.DAGScheduler: Job 2 finished: saveAsTextFile at <console>:26, took 2.230414 s
15/07/11 23:43:23 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 4.0 (TID 3) in 1992 ms on localhost (1/1)
15/07/11 23:43:23 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 4.0, whose tasks have all completed, from pool 
As the log shows, without the file:// prefix the output path defaults to HDFS (here hdfs://feng01:9000/user/jifeng/README.md).
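
To write to the local filesystem instead, prefix the path with file://; note that saveAsTextFile refuses to write into a directory that already exists, so the target must be new (the output path below is just an example):

// hypothetical local output directory; fails if it already exists
count.saveAsTextFile("file:///home/jifeng/wordcount-output")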

