Spark Cluster-Mode Debugging and Remote Deployment Configuration

I've been learning Spark recently. After getting a program to run in local mode, I wanted to test it on the cluster, but it kept failing with the following error:

java.lang.NoSuchMethodError: scala.runtime.ObjectRef.create(Ljava/lang/Object;)Lscala/runtime/ObjectRef;
        at CF$$anonfun$3.apply(CF.scala:33)
        at CF$$anonfun$3.apply(CF.scala:24)
        at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
        at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
18/08/28 23:48:39 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-0,5,main]
java.lang.NoSuchMethodError: scala.runtime.ObjectRef.create(Ljava/lang/Object;)Lscala/runtime/ObjectRef;
        at CF$$anonfun$3.apply(CF.scala:33)
        at CF$$anonfun$3.apply(CF.scala:24)
        at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
        at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Thanks to a tip from a more experienced user, I finally tracked down the problem: the Scala and Spark versions did not match. My Scala version is 2.11.4, and before switching, my Spark version was 1.6.3. Checking the official documentation afterwards, Spark 1.6.3 is not suited to Scala 2.11.x, so I replaced Spark with version 2.2.0, resubmitted the code on the cluster, and it ran without any errors. Version compatibility is definitely something to pay attention to.
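
A quick way to catch this kind of mismatch up front is to compare the Scala version your project builds against with the Scala version the cluster's Spark distribution was compiled with. A minimal check, assuming both tools are on the PATH (the exact output varies by installation):

# Scala version used locally by the project/toolchain
scala -version

# Spark's own version banner; it includes a line such as
# "Using Scala version 2.11.8 ...", i.e. the version Spark was built with
spark-submit --version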

Also, once the Spark application has been packaged and uploaded to the cluster, it needs to be submitted with spark-submit, or with a run script that calls it; the mechanism is the same either way, the script is just a thin wrapper around spark-submit.
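
For reference, a submission typically looks something like the sketch below. The jar path, master URL, and resource settings are placeholders for my own setup; only the main class CF comes from the stack trace above:

spark-submit \
  --class CF \
  --master spark://master:7077 \
  --executor-memory 2g \
  --total-executor-cores 4 \
  /home/hadoop/jars/scalaTest-1.0-SNAPSHOT-jar-with-dependencies.jar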

Writing Spark code in IDEA, packaging it, and uploading it to the cluster for cluster-mode debugging

Next, let's walk through packaging on a Windows or Mac host and uploading the package to the remote cluster.

1. Configure the remote host: in IDEA, open Tools > Deployment and choose Configuration.

Then click the + button, choose SFTP, and enter a name for the remote connection; here I call it master.

Next, configure the host and the top-level path for uploads. Root Path is that top-level path, i.e. the directory on the server where uploaded packages will land. There is also a second-level path (the deployment path); I usually just set Root Path to the directory I want and leave the deployment path alone.

I leave the deployment path at its default of /.

Finally, click Apply and then OK. Since we're building the jar ourselves, there's no need for automatic upload; we'll upload manually, as described below. Then open the Remote Host panel (Tools > Deployment > Browse Remote Host), which docks on the right-hand side of the IDE.

Finally, manual upload: right-click the packaged jar and choose Deployment > Upload to master (the server configured above).
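
If you prefer the command line to the IDEA deployment panel, plain scp does the same job; the local jar name, user, host, and remote directory below are placeholders for whatever matches your own setup:

scp target/scalaTest-1.0-SNAPSHOT-jar-with-dependencies.jar hadoop@master:/home/hadoop/jars/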

2. Packaging: if you don't configure a build section in the pom, you'll find that your classes don't make it into the jar. Add the dependencies to the packaging as shown in the pom at the end of this post (the maven-assembly-plugin), then recompile and repackage.

Select your own main class here.

Then just click OK. There is a build panel in the top-right corner; I usually click that and run maven install from it.

Alternatively, the same thing can be done from the command line; see the note after the pom below.

Finally, here is the pom file from my own Maven-based Spark project built in IDEA:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.scalaTest</groupId>
    <artifactId>scalaTest</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
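        <!-- the Scala binary version (artifact suffix) and Spark version here must match what the cluster runs; a mismatch is what caused the NoSuchMethodError above -->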
        <spark.version>2.2.0</spark.version>
        <scala.version>2.11</scala.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-mllib_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>

    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <archive>
                        <manifest>
                            <mainClass>CF</mainClass>
                        </manifest>
                    </archive>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
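
One note on the build section: the maven-assembly-plugin is declared without an <executions> binding, so running install on its own will not produce the fat jar. The way I understand it, you invoke the assembly goal explicitly, roughly like this (the resulting jar name follows from the artifactId and version above):

# compile, package, and build the jar-with-dependencies in one step
mvn clean package assembly:single

# the runnable jar then shows up under target/, e.g.
# target/scalaTest-1.0-SNAPSHOT-jar-with-dependencies.jar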

Reposted from blog.csdn.net/Jameslvt/article/details/82180649