10.8 Spark Resource Scheduling Source Code Analysis

After a Worker starts, it registers itself with the Master.

At the code level, registration simply inserts a record into a data structure inside the Master: the Master keeps the set of registered Workers, roughly val workers = new HashSet[WorkerInfo]() in the Spark 1.6 source.
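A minimal sketch of that bookkeeping, loosely modeled on Spark 1.6's Master (the RegisterWorker message and the WorkerInfo fields here are simplified assumptions, not the full source):

import scala.collection.mutable.HashSet

// Simplified stand-ins for the real Spark classes.
case class WorkerInfo(id: String, cores: Int, memoryMB: Int)
case class RegisterWorker(worker: WorkerInfo)

class Master {
  // Every registered Worker lives in this in-memory set.
  val workers = new HashSet[WorkerInfo]()

  // Handling a registration message boils down to one insert.
  def handleRegisterWorker(msg: RegisterWorker): Unit = {
    workers += msg.worker
  }
}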

1. The client runs the spark-submit command, which asks the Master for resources to start the Driver process and registers the current Driver's information with the Master. At the code level this again just inserts a record into a Master data structure: the Master now holds one more entry in val waitingDrivers = new ArrayBuffer[DriverInfo]().

2. The Driver starts and initializes the SparkContext. During that initialization two objects are created, DAGScheduler and TaskScheduler. The TaskScheduler asks the Master for resources to start Executors. At the code level, the current Application's information is registered into the Master's data structures: the Master now holds one more entry in val waitingApps = new ArrayBuffer[ApplicationInfo]().

3. Deep in the source, changes to these Master data structures trigger a call to the schedule() method; schedule() is the resource-scheduling method. A condensed sketch follows.
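The sketch below shows how the two queues and the schedule() trigger fit together, again loosely based on Spark 1.6's Master (DriverInfo and ApplicationInfo are reduced to placeholders, and the real schedule() does considerably more work):

import scala.collection.mutable.ArrayBuffer

case class DriverInfo(id: String)
case class ApplicationInfo(id: String)

class MasterScheduling {
  // Drivers submitted with --deploy-mode cluster, waiting to be launched.
  val waitingDrivers = new ArrayBuffer[DriverInfo]()
  // Applications waiting for Executors.
  val waitingApps = new ArrayBuffer[ApplicationInfo]()

  // Every registration mutates the state and then funnels into schedule().
  def registerDriver(d: DriverInfo): Unit = { waitingDrivers += d; schedule() }
  def registerApp(a: ApplicationInfo): Unit = { waitingApps += a; schedule() }

  def schedule(): Unit = {
    // 1. Launch waiting Drivers on alive Workers with enough resources.
    // 2. Start Executors on Workers for the Applications in waitingApps.
  }
}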


spark-submit --master spark://hadoop1:7077

--deploy-mode cluster

--executor-cores 2 (how many cores each Executor process uses)

--executor-memory 1G (how much memory each Executor process uses)

--total-executor-cores 10 (how many cores all Executors use in total)

--driver-memory 1G

With sufficient memory, the number of Executors launched is total-executor-cores / executor-cores: 10 / 2 = 5.
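As a worked check of that arithmetic (the variable names here are illustrative, not Spark's):

// Flags from the command above.
val totalExecutorCores = 10   // --total-executor-cores
val coresPerExecutor   = 2    // --executor-cores

// With enough free memory on the Workers, the Master keeps granting
// Executors until the total core budget is used up:
val executorCount = totalExecutorCores / coresPerExecutor   // 10 / 2 = 5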

Conclusions:

(1) By default, each Worker launches one Executor for the current Application, and that Executor uses all of the Worker's cores.

./spark-submit --master spark://hadoop1:7077 --class org.apache.spark.examples.SparkPi ../lib/spark-examples-1.6.0-hadoop2.6.0.jar 1000

(2) Provided memory is sufficient: to launch multiple Executors on a single Worker, set --executor-cores.

After modifying spark-env.sh so that each Worker has 2G of memory and 3 cores:

./spark-submit --master spark://hadoop1:7077 --executor-cores 1 --class org.apache.spark.examples.SparkPi ../lib/spark-examples-1.6.0-hadoop2.6.0.jar 1000

The formula for how many Executors a single Worker can launch:

executorNum = Math.min(worker.sumCores / coresPerExecutor, worker.sumMemory / memoryPerExecutor)
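Plugging in the spark-env.sh setup above (3 cores and 2G of memory per Worker) together with --executor-cores 1 and the default --executor-memory of 1G (variable names are illustrative):

// Per-Worker resources from spark-env.sh above.
val workerCores    = 3
val workerMemoryMB = 2048                 // 2G

// Per-Executor demand: --executor-cores 1, default --executor-memory 1G.
val coresPerExecutor    = 1
val memoryPerExecutorMB = 1024

val executorNum = math.min(workerCores / coresPerExecutor,        // 3 / 1 = 3
                           workerMemoryMB / memoryPerExecutorMB)  // 2048 / 1024 = 2
// => each Worker can launch 2 Executors; memory, not cores, is the bottleneck here.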

(3) By default, Executors are started spread out across the cluster (which benefits data locality). What if you do not want them spread out? Set the property below (a sketch of the underlying allocation loop follows the example command):

new SparkConf().set("spark.deploy.spreadOut","false")

./spark-submit --master spark://hadoop1:7077 --executor-cores 1 --total-executor-cores 2 --class org.apache.spark.examples.SparkPi ../lib/spark-examples-1.6.0-hadoop2.6.0.jar 1000
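A simplified sketch of how the spreadOut flag changes core assignment, loosely based on the executor-scheduling logic in Spark's Master (heavily condensed; the real code also tracks memory, executor limits, and per-app caps):

// Assign up to coresToAllocate cores across Workers, where freeCores(i) is
// Worker i's free core count and coresPerSlot is --executor-cores.
// spreadOut = true  -> round-robin: one slot per Worker, then move on
// spreadOut = false -> fill one Worker completely before moving to the next
def assignCores(freeCores: Array[Int], coresToAllocate: Int,
                coresPerSlot: Int, spreadOut: Boolean): Array[Int] = {
  val assigned = Array.fill(freeCores.length)(0)
  var remaining = coresToAllocate
  var pos = 0
  while (remaining >= coresPerSlot && freeCores.exists(_ >= coresPerSlot)) {
    if (freeCores(pos) >= coresPerSlot) {
      freeCores(pos) -= coresPerSlot
      assigned(pos)  += coresPerSlot
      remaining      -= coresPerSlot
      if (spreadOut) pos = (pos + 1) % freeCores.length  // move on after one slot
      // with spreadOut = false we stay on this Worker until it is exhausted
    } else {
      pos = (pos + 1) % freeCores.length
    }
  }
  assigned
}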


spark-submit command options

--master MASTER_URL spark://host:port, mesos://host:port, yarn, or local.

--deploy-mode DEPLOY_MODE Whether to launch the driver program locally ("client") or on one of the worker machines inside the cluster ("cluster") (Default: client).

--class CLASS_NAME Your application's main class (for Java / Scala apps).

--name NAME A name of your application.

--jars Paths of the jars the application depends on; code in both the Driver and the Executors can use these jars.

--files The specified files are downloaded into each Executor's working directory, by default under the work directory of the Spark installation on each Worker.

--conf For example, --conf spark.deploy.spreadOut=false can replace new SparkConf().set("spark.deploy.spreadOut","false") in the code.

--driver-memory MEM Memory for driver (e.g. 1000M, 2G) (Default: 1024M).

--driver-java-options Extra Java options to pass to the driver.

--driver-library-path Extra library path entries to pass to the driver.

--driver-class-path Extra class path entries to pass to the driver. Note that jars added with --jars are automatically included in the classpath.

--executor-memory MEM Memory per executor (e.g. 1000M, 2G) (Default: 1G).

--help, -h Show this help message and exit

--verbose, -v Print additional debug output

--version Print the version of the current Spark.

Spark standalone with cluster deploy mode only:

--driver-cores NUM Cores for driver (Default: 1).

Spark standalone or Mesos with cluster deploy mode only:

--supervise If the Driver dies in cluster mode, restart it automatically.

Spark standalone and Mesos only:

--total-executor-cores NUM Total cores for all executors.

Spark standalone and YARN only:

--executor-cores NUM Number of cores per executor. (Default: 1 in YARN mode, or all available cores on the worker in standalone mode.)

YARN-only:

--driver-cores NUM Number of cores used by the driver, only in cluster mode (Default: 1).

--queue QUEUE_NAME The name of the YARN resource queue to submit to.

--num-executors NUM Number of executors to launch (Default: 2).


Reposted from blog.csdn.net/u011418530/article/details/81231007