After submitting a Spark job, it failed with the following error:
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 90 tasks (1025.7 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1490)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1478)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1477)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1477)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:826)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:826)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:826)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1715)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1670)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1659)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:651)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1943)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1956)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1969)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1983)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:935)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.collect(RDD.scala:934)
at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:293)
at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:78)
at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:75)
at org.apache.spark.sql.execution.SQLExecution$.withExecutionId(SQLExecution.scala:94)
at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1.apply(BroadcastExchangeExec.scala:74)
at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1.apply(BroadcastExchangeExec.scala:74)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
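What the message means: the driver caps the total size of serialized task results it will accept at spark.driver.maxResultSize, which defaults to 1g, and here 90 tasks returned about 1025.7 MB. Note where the trace goes through BroadcastExchangeExec: the limit was hit while collect()ing a table back to the driver to build a broadcast join relation. So besides raising the limit as shown below, shrinking or disabling automatic broadcast joins is another common way out; a minimal sketch, assuming a Spark SQL job:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()
// Keep Spark from collecting join tables to the driver for broadcasting;
// "-1" disables automatic broadcast joins entirely.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")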
After looking into it, I found the following fixes.
Setting it in code
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Raise the cap on the total size of serialized results the driver will accept.
val sparkConf = new SparkConf()
sparkConf.set("spark.driver.maxResultSize", "4g")
val spark = SparkSession.builder().config(sparkConf).getOrCreate()
Setting spark.driver.maxResultSize to a larger value in the SparkConf, before the SparkSession is created, is all that's needed.
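Equivalently, the value can be passed straight to the builder. Either way it has to be in place before the SparkContext starts; setting it on an already-running session won't change the driver's limit. A minimal sketch:

import org.apache.spark.sql.SparkSession

// Same effect without a separate SparkConf object.
val spark = SparkSession.builder()
  .config("spark.driver.maxResultSize", "4g")
  .getOrCreate()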
Setting it in the submit script
This can also be set in the submit script. Below is my script; originally I had appended --spark.driver.maxResultSize=3g as the very last line, after the JAR, and it never took effect. The reason is that spark-submit treats everything after the application JAR as arguments to the main class, and there is no --spark.* flag anyway: Spark properties must be passed as --conf key=value before the JAR, as in the corrected script below.
$SPARK_HOME/bin/spark-submit \
--cluster zjyprc-hadoop \
--conf spark.yarn.job.owners=wangdaopeng \
--class xiaomi.stage1.getres \
--master yarn \
--deploy-mode cluster \
--queue "$QUEUE" \
--driver-memory 14g \
--executor-memory 4g \
--num-executors 100 \
--executor-cores 1 \
--conf spark.shuffle.io.preferDirectBufs=false \
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.shuffle.service.enabled=true \
--conf spark.dynamicAllocation.maxExecutors=600 \
--conf spark.dynamicAllocation.minExecutors=1 \
--conf spark.dynamicAllocation.executorIdleTimeout=600s \
--conf spark.yarn.executor.memoryOverhead=2048 \
--conf spark.executor.extraJavaOptions="-XX:MaxPermSize=512m" \
--conf spark.network.timeout=300 \
"$JAR_FILE" \
--userinfo $User_INPUT_PATH3 \
--input_threshold $THRESHOLD \
--appusedaily $User_INPUT_PATH \
--appexpansion $User_INPUT_PATH2 \
--output $User_OUTPUT_PATH \
--hashS 100 \
--hashKey 100 \
--CateThread 0.7
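To confirm the setting actually reached the driver, check the Environment tab of the Spark UI, or read it back from inside the application. A minimal sketch, assuming the SparkSession from above:

// Prints the effective value, e.g. 3g; throws NoSuchElementException
// if the key was never set (which is what happened while the flag
// sat after the JAR as an application argument).
println(spark.conf.get("spark.driver.maxResultSize"))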