012-01 Spark on YARN Environment Setup

1. Scala Installation

http://www.scala-lang.org/files/archive/scala-2.10.4.tgz

tar -zxvf scala-2.10.4.tgz -C app/
cd app
ln -s scala-2.10.4 scala
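The tar-then-symlink pattern above keeps the versioned directory and a stable `scala` path side by side, so a later upgrade is a one-line relink. A minimal sketch of the same pattern, run in a throwaway temp directory (the paths here are illustrative, not the real install):

```shell
# Illustrative: versioned install directory plus stable symlink, in a temp dir
tmp=$(mktemp -d)
mkdir -p "$tmp/app/scala-2.10.4"
# Use a relative link target so the symlink survives moving the app/ tree
ln -s scala-2.10.4 "$tmp/app/scala"
# readlink shows which version the stable path currently points at
readlink "$tmp/app/scala"
```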



2. Spark Installation
tar -zxvf spark-1.4.0-bin-hadoop2.6.tgz -C app
ln -s spark-1.4.0-bin-hadoop2.6 spark


## Edit spark-env.sh under $SPARK_HOME/conf (create it from spark-env.sh.template if it does not exist)
# vim spark-env.sh
export JAVA_HOME=/home/hadoop/app/jdk1.7.0_76
export SCALA_HOME=/home/hadoop/app/scala
export HADOOP_HOME=/home/hadoop/app/hadoop-2.6.0

## List the worker nodes' hostnames (or IP addresses), one per line
# vim slaves
192.168.2.20
192.168.2.33
# mv log4j.properties.template log4j.properties

## Run on the Master node (note: start-all.sh is under sbin, not bin)
cd $SPARK_HOME/sbin
./start-all.sh
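start-all.sh starts the Master locally, then reads the slaves file and connects to each listed host over ssh to start a Worker. The fan-out it performs can be sketched like this, with echo standing in for ssh and an inline host list, so nothing is actually started:

```shell
# Sketch of how start-all.sh iterates the slaves file (echo instead of ssh)
slaves_file=$(mktemp)
printf '192.168.2.20\n192.168.2.33\n' > "$slaves_file"
while read -r host; do
  # The real script runs the worker start script on each host via ssh
  echo "would run: ssh $host ... start a Worker ..."
done < "$slaves_file"
```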

3. Configure System Environment Variables

vim /etc/profile
export SCALA_HOME=/home/hadoop/app/scala
export SPARK_HOME=/home/hadoop/app/spark
export PATH=$PATH:$HIVE_HOME/bin:$HBASE_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin


source /etc/profile

4. Testing

## Master web UI
http://192.168.2.20:8080/

## First switch to the Spark home directory: cd $SPARK_HOME

(1) Local mode
# Launch the spark-shell
./bin/spark-shell
# Test: a word count over a tab-separated file
sc.textFile("/home/hadoop/wc.txt").flatMap( line=>line.split("\t") ).map( word=>(word,1) ).reduceByKey(_ + _).collect
# Verify in the application web UI
http://192.168.2.20:4040/
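The wc.txt input above is assumed to already exist. To try the word count, you can generate a small tab-separated sample and cross-check the expected counts with standard tools (the file contents here are made up for illustration):

```shell
# Build a tiny tab-separated sample and compute the expected word counts
# with coreutils, to compare against the Spark result
f=$(mktemp)
printf 'hello\tworld\nhello\tspark\n' > "$f"
# Split on tabs, one word per line, then count occurrences
tr '\t' '\n' < "$f" | sort | uniq -c | sort -rn
```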

(2) YARN mode

cd $SPARK_HOME
bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \
  --num-executors 3 \
  --driver-memory 1g \
  --executor-memory 1g \
  --executor-cores 1 \
  lib/spark-examples*.jar 10
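Because the submit flags are easy to mistype, it can help to keep the values in variables inside a small wrapper script. A hedged sketch, using echo as a dry run (swap it for the real bin/spark-submit once the printed command looks right):

```shell
# Dry-run sketch of a submit wrapper; `echo` only prints the command,
# replace it with the real bin/spark-submit to actually submit
MASTER=yarn-cluster
NUM_EXECUTORS=3
DRIVER_MEMORY=1g
EXECUTOR_MEMORY=1g
EXECUTOR_CORES=1
echo bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master "$MASTER" \
  --num-executors "$NUM_EXECUTORS" \
  --driver-memory "$DRIVER_MEMORY" \
  --executor-memory "$EXECUTOR_MEMORY" \
  --executor-cores "$EXECUTOR_CORES" \
  lib/spark-examples*.jar 10
```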



Log output from the run (the first attempt below fails because the option was mistyped as '-executor-cores' instead of '--executor-cores'; the second attempt succeeds):
[hadoop@mycluster spark]$ bin/spark-submit  --class  org.apache.spark.examples.SparkPi
> --master yarn-cluster
> --num-executors 3
> --driver-memory 1g
> --executor-memory 1g
> -executor-cores 1
> lib/spark-examples*.jar  10
Error: Unrecognized option '-executor-cores'.
Run with --help for usage help or --verbose for debug output
[hadoop@mycluster spark]$ bin/spark-submit  --class  org.apache.spark.examples.SparkPi
> --master yarn-cluster
> --num-executors 3
> --driver-memory 1g
> --executor-memory 1g
> --executor-cores 1
> lib/spark-examples*.jar  10
15/08/30 22:53:29 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/08/30 22:53:29 INFO RMProxy:  Connecting to ResourceManager at mycluster/192.168.2.20:8032
15/08/30 22:53:29 INFO Client:  Requesting a new application from cluster with 1 NodeManagers
15/08/30 22:53:29 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
15/08/30 22:53:29 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
15/08/30 22:53:29 INFO Client: Setting up container launch context for our AM
15/08/30 22:53:29 INFO Client: Preparing resources for our AM container
15/08/30 22:53:30 INFO Client: Uploading resource file:/home/hadoop/app/spark-1.4.0-bin-hadoop2.6/lib/spark-assembly-1.4.0-hadoop2.6.0.jar -> hdfs://mycluster:9000/user/hadoop/.sparkStaging/application_1440995865051_0005/spark-assembly-1.4.0-hadoop2.6.0.jar
15/08/30 22:53:33 INFO Client: Uploading resource file:/home/hadoop/app/spark-1.4.0-bin-hadoop2.6/lib/spark-examples-1.4.0-hadoop2.6.0.jar -> hdfs://mycluster:9000/user/hadoop/.sparkStaging/application_1440995865051_0005/spark-examples-1.4.0-hadoop2.6.0.jar
15/08/30 22:53:39 INFO Client: Uploading resource file:/tmp/spark-ecb5f2dc-f66b-42e6-a8ae-befce75074c0/__hadoop_conf__846873578807129658.zip -> hdfs://mycluster:9000/user/hadoop/.sparkStaging/application_1440995865051_0005/__hadoop_conf__846873578807129658.zip
15/08/30 22:53:40 INFO Client: Setting up the launch environment for our AM container
15/08/30 22:53:40 INFO SecurityManager: Changing view acls to: hadoop
15/08/30 22:53:40 INFO SecurityManager: Changing modify acls to: hadoop
15/08/30 22:53:40 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/08/30 22:53:40 INFO Client: Submitting application 5 to ResourceManager
15/08/30 22:53:40 INFO YarnClientImpl: Submitted application application_1440995865051_0005
15/08/30 22:53:41 INFO Client: Application report for application_1440995865051_0005 (state: ACCEPTED)
15/08/30 22:53:41 INFO Client:
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: default
         start time: 1441000420286
         final status: UNDEFINED
         tracking URL: http://mycluster:8088/proxy/application_1440995865051_0005/
         user: hadoop
15/08/30 22:53:43 INFO Client: Application report for application_1440995865051_0005 (state: ACCEPTED)
... (similar ACCEPTED status lines, roughly one per second through 22:54:03, trimmed) ...
15/08/30 22:54:04 INFO Client: Application report for application_1440995865051_0005 (state: RUNNING)
15/08/30 22:54:04 INFO Client:
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: 192.168.2.20
         ApplicationMaster RPC port: 0
         queue: default
         start time: 1441000420286
         final status: UNDEFINED
         tracking URL: http://mycluster:8088/proxy/application_1440995865051_0005/
         user: hadoop
15/08/30 22:54:05 INFO Client: Application report for application_1440995865051_0005 (state: RUNNING)
... (similar RUNNING status lines, roughly one per second through 22:54:42, trimmed) ...
15/08/30 22:54:43 INFO Client: Application report for application_1440995865051_0005 (state: FINISHED)
15/08/30 22:54:43 INFO Client:
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: 192.168.2.20
         ApplicationMaster RPC port: 0
         queue: default
         start time: 1441000420286
         final status: SUCCEEDED
         tracking URL: http://mycluster:8088/proxy/application_1440995865051_0005/
         user: hadoop
15/08/30 22:54:43 INFO Utils: Shutdown hook called
15/08/30 22:54:43 INFO Utils: Deleting directory /tmp/spark-ecb5f2dc-f66b-42e6-a8ae-befce75074c0


Common problem:
Running spark-submit in YARN mode as above, you may hit the following error:
[hadoop@mycluster spark]$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi   --master yarn-cluster   --master yarn-cluster 10
Exception in thread "main" java.lang.Exception: When running with master 'yarn-cluster' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.
        at org.apache.spark.deploy.SparkSubmitArguments.validateSubmitArguments(SparkSubmitArguments.scala:239)
        at org.apache.spark.deploy.SparkSubmitArguments.validateArguments(SparkSubmitArguments.scala:216)
        at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:103)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:106)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
15/08/30 22:25:45 INFO Utils: Shutdown hook called

Solution: set HADOOP_CONF_DIR (or YARN_CONF_DIR) in spark-env.sh, as follows:
cd $SPARK_HOME/conf
vi spark-env.sh 
# Options read in YARN client mode
# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files
export HADOOP_CONF_DIR=/home/hadoop/app/hadoop-2.6.0/etc/hadoop
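Before resubmitting, it is worth confirming that the directory actually contains the YARN client configuration. A quick sanity check (the path below is the one from this walkthrough; adjust it to your install):

```shell
# Check that the configured directory holds the YARN client config
HADOOP_CONF_DIR=/home/hadoop/app/hadoop-2.6.0/etc/hadoop
if [ -f "$HADOOP_CONF_DIR/yarn-site.xml" ]; then
  echo "yarn-site.xml found"
else
  echo "yarn-site.xml NOT found in $HADOOP_CONF_DIR"
fi
```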

