org.apache.spark.SparkException: Can only zip RDDs with same number of elements in each partition

在这里插入图片描述


问题描述:

Spark 中将两个RDD使用 zip 函数进行拉链时出现如下问题 org.apache.spark.SparkException: Can only zip RDDs with same number of elements in each partition

在这里插入图片描述

D:\DevelopTools\jdk\bin\java.exe "-javaagent:D:\DevelopTools\IntelliJ IDEA 2022.3.3\lib\idea_rt.jar=7057:D:\DevelopTools\IntelliJ IDEA 2022.3.3\bin" -Dfile.encoding=UTF-8 -classpath D:\DevelopTools\jdk\jre\lib\charsets.jar;D:\DevelopTools\jdk\jre\lib\deploy.jar;D:\DevelopTools\jdk\jre\lib\ext\access-bridge-64.jar;D:\DevelopTools\jdk\jre\lib\ext\cldrdata.jar;D:\DevelopTools\jdk\jre\lib\ext\dnsns.jar;D:\DevelopTools\jdk\jre\lib\ext\jaccess.jar;D:\DevelopTools\jdk\jre\lib\ext\jfxrt.jar;D:\DevelopTools\jdk\jre\lib\ext\localedata.jar;D:\DevelopTools\jdk\jre\lib\ext\nashorn.jar;D:\DevelopTools\jdk\jre\lib\ext\sunec.jar;D:\DevelopTools\jdk\jre\lib\ext\sunjce_provider.jar;D:\DevelopTools\jdk\jre\lib\ext\sunmscapi.jar;D:\DevelopTools\jdk\jre\lib\ext\sunpkcs11.jar;D:\DevelopTools\jdk\jre\lib\ext\zipfs.jar;D:\DevelopTools\jdk\jre\lib\javaws.jar;D:\DevelopTools\jdk\jre\lib\jce.jar;D:\DevelopTools\jdk\jre\lib\jfr.jar;D:\DevelopTools\jdk\jre\lib\jfxswt.jar;D:\DevelopTools\jdk\jre\lib\jsse.jar;D:\DevelopTools\jdk\jre\lib\management-agent.jar;D:\DevelopTools\jdk\jre\lib\plugin.jar;D:\DevelopTools\jdk\jre\lib\resources.jar;D:\DevelopTools\jdk\jre\lib\rt.jar;E:\Code\IDEA\BigData\Spark\target\classes;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\spark\spark-core_2.12\3.0.0\spark-core_2.12-3.0.0.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\com\thoughtworks\paranamer\paranamer\2.8\paranamer-2.8.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\avro\avro\1.8.2\avro-1.8.2.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\codehaus\jackson\jackson-core-asl\1.9.13\jackson-core-asl-1.9.13.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\codehaus\jackson\jackson-mapper-asl\1.9.13\jackson-mapper-asl-1.9.13.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\commons\commons-compress\1.8.1\commons-compress-1.8.1.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\tukaani\xz\1.5\xz-1.5.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\avro\avro-mapred\1.8.2\avro-mapred-1.8.2-hadoop2.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\avro\avro-ipc\1.8.2\avro-ipc-1.8.2.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\commons-codec\commons-codec\1.9\commons-codec-1.9.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\com\twitter\chill_2.12\0.9.5\chill_2.12-0.9.5.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\com\esotericsoftware\kryo-shaded\4.0.2\kryo-shaded-4.0.2.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\com\esotericsoftware\minlog\1.3.0\minlog-1.3.0.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\objenesis\objenesis\2.5.1\objenesis-2.5.1.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\com\twitter\chill-java\0.9.5\chill-java-0.9.5.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\xbean\xbean-asm7-shaded\4.15\xbean-asm7-shaded-4.15.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\hadoop\hadoop-client\2.7.4\hadoop-client-2.7.4.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\hadoop\hadoop-common\2.7.4\hadoop-common-2.7.4.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\commons-cli\commons-cli\1.2\commons-cli-1.2.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\xmlenc\xmlenc\0.52\xmlenc-0.52.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\commons-httpclient\commons-httpclient\3.1\commons-httpclient-3.1.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\commons-io\commons-io\2.4\commons-io-2.4.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\commons-collections\commons-collections\3.2.2\commons-collections-3.2.2.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\mortbay\jetty\jetty-sslengine\6.1.26\jetty-sslengine-6.1.26.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\javax\servlet\jsp\jsp-api\2.1\jsp-api-2.1.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\commons-lang\commons-lang\2.6\commons-lang-2.6.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\commons-configuration\commons-configuration\1.6\commons-configuration-1.6.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\commons-digester\commons-digester\1.8\commons-digester-1.8.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\commons-beanutils\commons-beanutils\1.7.0\commons-beanutils-1.7.0.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\com\google\protobuf\protobuf-java\2.5.0\protobuf-java-2.5.0.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\com\google\code\gson\gson\2.2.4\gson-2.2.4.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\hadoop\hadoop-auth\2.7.4\hadoop-auth-2.7.4.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\httpcomponents\httpclient\4.2.5\httpclient-4.2.5.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\httpcomponents\httpcore\4.2.4\httpcore-4.2.4.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\directory\server\apacheds-kerberos-codec\2.0.0-M15\apacheds-kerberos-codec-2.0.0-M15.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\directory\server\apacheds-i18n\2.0.0-M15\apacheds-i18n-2.0.0-M15.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\directory\api\api-asn1-api\1.0.0-M20\api-asn1-api-1.0.0-M20.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\directory\api\api-util\1.0.0-M20\api-util-1.0.0-M20.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\curator\curator-client\2.7.1\curator-client-2.7.1.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\htrace\htrace-core\3.1.0-incubating\htrace-core-3.1.0-incubating.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\hadoop\hadoop-hdfs\2.7.4\hadoop-hdfs-2.7.4.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\mortbay\jetty\jetty-util\6.1.26\jetty-util-6.1.26.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\xerces\xercesImpl\2.9.1\xercesImpl-2.9.1.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\xml-apis\xml-apis\1.3.04\xml-apis-1.3.04.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\hadoop\hadoop-mapreduce-client-app\2.7.4\hadoop-mapreduce-client-app-2.7.4.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\hadoop\hadoop-mapreduce-client-common\2.7.4\hadoop-mapreduce-client-common-2.7.4.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\hadoop\hadoop-yarn-client\2.7.4\hadoop-yarn-client-2.7.4.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\hadoop\hadoop-yarn-server-common\2.7.4\hadoop-yarn-server-common-2.7.4.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\hadoop\hadoop-mapreduce-client-shuffle\2.7.4\hadoop-mapreduce-client-shuffle-2.7.4.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\hadoop\hadoop-yarn-api\2.7.4\hadoop-yarn-api-2.7.4.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\hadoop\hadoop-mapreduce-client-core\2.7.4\hadoop-mapreduce-client-core-2.7.4.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\hadoop\hadoop-yarn-common\2.7.4\hadoop-yarn-common-2.7.4.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\javax\xml\bind\jaxb-api\2.2.2\jaxb-api-2.2.2.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\javax\xml\stream\stax-api\1.0-2\stax-api-1.0-2.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\codehaus\jackson\jackson-jaxrs\1.9.13\jackson-jaxrs-1.9.13.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\codehaus\jackson\jackson-xc\1.9.13\jackson-xc-1.9.13.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\hadoop\hadoop-mapreduce-client-jobclient\2.7.4\hadoop-mapreduce-client-jobclient-2.7.4.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\hadoop\hadoop-annotations\2.7.4\hadoop-annotations-2.7.4.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\spark\spark-launcher_2.12\3.0.0\spark-launcher_2.12-3.0.0.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\spark\spark-kvstore_2.12\3.0.0\spark-kvstore_2.12-3.0.0.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\fusesource\leveldbjni\leveldbjni-all\1.8\leveldbjni-all-1.8.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\com\fasterxml\jackson\core\jackson-core\2.10.0\jackson-core-2.10.0.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\com\fasterxml\jackson\core\jackson-annotations\2.10.0\jackson-annotations-2.10.0.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\spark\spark-network-common_2.12\3.0.0\spark-network-common_2.12-3.0.0.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\spark\spark-network-shuffle_2.12\3.0.0\spark-network-shuffle_2.12-3.0.0.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\spark\spark-unsafe_2.12\3.0.0\spark-unsafe_2.12-3.0.0.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\javax\activation\activation\1.1.1\activation-1.1.1.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\curator\curator-recipes\2.7.1\curator-recipes-2.7.1.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\curator\curator-framework\2.7.1\curator-framework-2.7.1.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\com\google\guava\guava\16.0.1\guava-16.0.1.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\zookeeper\zookeeper\3.4.14\zookeeper-3.4.14.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\yetus\audience-annotations\0.5.0\audience-annotations-0.5.0.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\javax\servlet\javax.servlet-api\3.1.0\javax.servlet-api-3.1.0.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\commons\commons-lang3\3.9\commons-lang3-3.9.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\commons\commons-math3\3.4.1\commons-math3-3.4.1.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\commons\commons-text\1.6\commons-text-1.6.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\com\google\code\findbugs\jsr305\3.0.0\jsr305-3.0.0.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\slf4j\slf4j-api\1.7.30\slf4j-api-1.7.30.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\slf4j\jul-to-slf4j\1.7.30\jul-to-slf4j-1.7.30.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\slf4j\jcl-over-slf4j\1.7.30\jcl-over-slf4j-1.7.30.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\log4j\log4j\1.2.17\log4j-1.2.17.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\slf4j\slf4j-log4j12\1.7.30\slf4j-log4j12-1.7.30.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\com\ning\compress-lzf\1.0.3\compress-lzf-1.0.3.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\xerial\snappy\snappy-java\1.1.7.5\snappy-java-1.1.7.5.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\lz4\lz4-java\1.7.1\lz4-java-1.7.1.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\com\github\luben\zstd-jni\1.4.4-3\zstd-jni-1.4.4-3.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\roaringbitmap\RoaringBitmap\0.7.45\RoaringBitmap-0.7.45.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\roaringbitmap\shims\0.7.45\shims-0.7.45.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\commons-net\commons-net\3.1\commons-net-3.1.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\scala-lang\modules\scala-xml_2.12\1.2.0\scala-xml_2.12-1.2.0.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\scala-lang\scala-library\2.12.10\scala-library-2.12.10.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\scala-lang\scala-reflect\2.12.10\scala-reflect-2.12.10.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\json4s\json4s-jackson_2.12\3.6.6\json4s-jackson_2.12-3.6.6.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\json4s\json4s-core_2.12\3.6.6\json4s-core_2.12-3.6.6.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\json4s\json4s-ast_2.12\3.6.6\json4s-ast_2.12-3.6.6.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\json4s\json4s-scalap_2.12\3.6.6\json4s-scalap_2.12-3.6.6.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\glassfish\jersey\core\jersey-client\2.30\jersey-client-2.30.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\jakarta\ws\rs\jakarta.ws.rs-api\2.1.6\jakarta.ws.rs-api-2.1.6.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\glassfish\hk2\external\jakarta.inject\2.6.1\jakarta.inject-2.6.1.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\glassfish\jersey\core\jersey-common\2.30\jersey-common-2.30.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\jakarta\annotation\jakarta.annotation-api\1.3.5\jakarta.annotation-api-1.3.5.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\glassfish\hk2\osgi-resource-locator\1.0.3\osgi-resource-locator-1.0.3.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\glassfish\jersey\core\jersey-server\2.30\jersey-server-2.30.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\glassfish\jersey\media\jersey-media-jaxb\2.30\jersey-media-jaxb-2.30.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\jakarta\validation\jakarta.validation-api\2.0.2\jakarta.validation-api-2.0.2.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\glassfish\jersey\containers\jersey-container-servlet\2.30\jersey-container-servlet-2.30.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\glassfish\jersey\containers\jersey-container-servlet-core\2.30\jersey-container-servlet-core-2.30.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\glassfish\jersey\inject\jersey-hk2\2.30\jersey-hk2-2.30.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\glassfish\hk2\hk2-locator\2.6.1\hk2-locator-2.6.1.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\glassfish\hk2\external\aopalliance-repackaged\2.6.1\aopalliance-repackaged-2.6.1.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\glassfish\hk2\hk2-api\2.6.1\hk2-api-2.6.1.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\glassfish\hk2\hk2-utils\2.6.1\hk2-utils-2.6.1.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\javassist\javassist\3.25.0-GA\javassist-3.25.0-GA.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\io\netty\netty-all\4.1.47.Final\netty-all-4.1.47.Final.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\com\clearspring\analytics\stream\2.9.6\stream-2.9.6.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\io\dropwizard\metrics\metrics-core\4.1.1\metrics-core-4.1.1.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\io\dropwizard\metrics\metrics-jvm\4.1.1\metrics-jvm-4.1.1.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\io\dropwizard\metrics\metrics-json\4.1.1\metrics-json-4.1.1.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\io\dropwizard\metrics\metrics-graphite\4.1.1\metrics-graphite-4.1.1.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\io\dropwizard\metrics\metrics-jmx\4.1.1\metrics-jmx-4.1.1.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\com\fasterxml\jackson\core\jackson-databind\2.10.0\jackson-databind-2.10.0.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\com\fasterxml\jackson\module\jackson-module-scala_2.12\2.10.0\jackson-module-scala_2.12-2.10.0.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\com\fasterxml\jackson\module\jackson-module-paranamer\2.10.0\jackson-module-paranamer-2.10.0.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\ivy\ivy\2.4.0\ivy-2.4.0.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\oro\oro\2.0.8\oro-2.0.8.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\net\razorvine\pyrolite\4.30\pyrolite-4.30.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\net\sf\py4j\py4j\0.10.9\py4j-0.10.9.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\spark\spark-tags_2.12\3.0.0\spark-tags_2.12-3.0.0.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\apache\commons\commons-crypto\1.0.0\commons-crypto-1.0.0.jar;D:\DevelopTools\apache-maven-3.9.0\maven-repo\org\spark-project\spark\unused\1.0.0\unused-1.0.0.jar com.guang.Test
2023-08-11 20:08:40,172{
    
    yy/MM/ddorg.apache.spark.SparkException: Can only zip RDDs with same number of elements in each partition
	at org.apache.spark.rdd.RDD$$anon$3.hasNext(RDD.scala:928)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at org.apache.spark.rdd.RDD$$anon$3.foreach(RDD.scala:924)
	at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
	at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
	at scala.collection.TraversableOnce.to(TraversableOnce.scala:315)
	at scala.collection.TraversableOnce.to$(TraversableOnce.scala:313)
	at org.apache.spark.rdd.RDD$$anon$3.to(RDD.scala:924)
	at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:307)
	at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:307)
	at org.apache.spark.rdd.RDD$$anon$3.toBuffer(RDD.scala:924)
	at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:294)
	at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:288)
	at org.apache.spark.rdd.RDD$$anon$3.toArray(RDD.scala:924)
	at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1004)
	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2133)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:127)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
2023-08-11 20:08:40,181{
    
    yy/MM/ddorg.apache.spark.SparkException: Can only zip RDDs with same number of elements in each partition
	at org.apache.spark.rdd.RDD$$anon$3.hasNext(RDD.scala:928)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at org.apache.spark.rdd.RDD$$anon$3.foreach(RDD.scala:924)
	at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
	at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
	at scala.collection.TraversableOnce.to(TraversableOnce.scala:315)
	at scala.collection.TraversableOnce.to$(TraversableOnce.scala:313)
	at org.apache.spark.rdd.RDD$$anon$3.to(RDD.scala:924)
	at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:307)
	at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:307)
	at org.apache.spark.rdd.RDD$$anon$3.toBuffer(RDD.scala:924)
	at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:294)
	at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:288)
	at org.apache.spark.rdd.RDD$$anon$3.toArray(RDD.scala:924)
	at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1004)
	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2133)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:127)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
2023-08-11 20:08:40,172{
    
    yy/MM/ddorg.apache.spark.SparkException: Can only zip RDDs with same number of elements in each partition
	at org.apache.spark.rdd.RDD$$anon$3.hasNext(RDD.scala:928)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at org.apache.spark.rdd.RDD$$anon$3.foreach(RDD.scala:924)
	at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
	at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
	at scala.collection.TraversableOnce.to(TraversableOnce.scala:315)
	at scala.collection.TraversableOnce.to$(TraversableOnce.scala:313)
	at org.apache.spark.rdd.RDD$$anon$3.to(RDD.scala:924)
	at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:307)
	at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:307)
	at org.apache.spark.rdd.RDD$$anon$3.toBuffer(RDD.scala:924)
	at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:294)
	at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:288)
	at org.apache.spark.rdd.RDD$$anon$3.toArray(RDD.scala:924)
	at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1004)
	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2133)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:127)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
2023-08-11 20:08:40,221{
    
    yy/MM/ddException in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 15 in stage 0.0 failed 1 times, most recent failure: Lost task 15.0 in stage 0.0 (TID 15, DESKTOP-3S19428, executor driver): org.apache.spark.SparkException: Can only zip RDDs with same number of elements in each partition
	at org.apache.spark.rdd.RDD$$anon$3.hasNext(RDD.scala:928)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at org.apache.spark.rdd.RDD$$anon$3.foreach(RDD.scala:924)
	at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
	at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
	at scala.collection.TraversableOnce.to(TraversableOnce.scala:315)
	at scala.collection.TraversableOnce.to$(TraversableOnce.scala:313)
	at org.apache.spark.rdd.RDD$$anon$3.to(RDD.scala:924)
	at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:307)
	at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:307)
	at org.apache.spark.rdd.RDD$$anon$3.toBuffer(RDD.scala:924)
	at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:294)
	at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:288)
	at org.apache.spark.rdd.RDD$$anon$3.toArray(RDD.scala:924)
	at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1004)
	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2133)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:127)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2023)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:1972)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:1971)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1971)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:950)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:950)
	at scala.Option.foreach(Option.scala:407)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:950)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2203)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2152)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2141)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:752)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2093)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2114)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2133)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2158)
	at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1004)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:388)
	at org.apache.spark.rdd.RDD.collect(RDD.scala:1003)
	at com.guang.Test$.main(Test.scala:22)
	at com.guang.Test.main(Test.scala)
Caused by: org.apache.spark.SparkException: Can only zip RDDs with same number of elements in each partition
	at org.apache.spark.rdd.RDD$$anon$3.hasNext(RDD.scala:928)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at org.apache.spark.rdd.RDD$$anon$3.foreach(RDD.scala:924)
	at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
	at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
	at scala.collection.TraversableOnce.to(TraversableOnce.scala:315)
	at scala.collection.TraversableOnce.to$(TraversableOnce.scala:313)
	at org.apache.spark.rdd.RDD$$anon$3.to(RDD.scala:924)
	at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:307)
	at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:307)
	at org.apache.spark.rdd.RDD$$anon$3.toBuffer(RDD.scala:924)
	at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:294)
	at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:288)
	at org.apache.spark.rdd.RDD$$anon$3.toArray(RDD.scala:924)
	at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1004)
	at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2133)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:127)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)

原因分析:

这个错误是由于在尝试使用 zip 操作将元素数量不同的 RDD 进行合并时引起的。zip 操作要求两个 RDD 在每个分区中具有相同数量的元素,以便能够一一对应地进行合并。

在这里插入图片描述
从源码中我们可以看到首先遍历两个集合的迭代器,如果两个集合都有元素,那么模式匹配都为true,如果都没有元素,则匹配为false,如果不符合上述情况,则为红线情况,报出元素个数不一致的异常。

使用 zip 进行拉链前提需要两个 RDD 的分区数一致同时每个分区内的元素个数一致,否则就会出现问题。


解决方案:

对于解决方法我们需要保证我们的RDD的分区数和每个分区数的元素个数一致就好了:

val v1: RDD[Int] = sc.makeRDD(List(1, 2, 3, 4), 2)
val v2: RDD[Int] = sc.makeRDD(List(1, 2, 3, 4), 2)

v1.zip(v2).collect().foreach(println)

>>>
(1,1)
(2,2)
(3,3)
(4,4)

猜你喜欢

转载自blog.csdn.net/m0_47256162/article/details/132239374