Spark Series (5): Writing and Debugging a Spark WordCount Program in IDEA

The prerequisite for writing Spark programs in IDEA is that IDEA already has a Scala development environment configured; for reference, see "Scala - IDEA configuration and Maven project creation".

Here we take the classic Hadoop WordCount as an example: we write the program in Scala and test it in both local mode and YARN mode. The workflow for writing and debugging a Spark program in IDEA is as follows:

1. Project Creation

1) Create a Scala Maven project; the pom.xml is as follows:

    <properties>
        <log4j.version>1.2.17</log4j.version>
        <slf4j.version>1.7.22</slf4j.version>
        <spark.version>2.1.1</spark.version>
        <scala.version>2.11.8</scala.version>
    </properties>

    <dependencies>
        <!-- Logging -->
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>jcl-over-slf4j</artifactId>
            <version>${slf4j.version}</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
            <version>${slf4j.version}</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
            <version>${slf4j.version}</version>
        </dependency>
        <dependency>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
            <version>${log4j.version}</version>
        </dependency>
        <!-- Logging End -->

        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
            <!--<scope>provided</scope>-->
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>${spark.version}</version>
            <!--<scope>provided</scope>-->
        </dependency>
    </dependencies>

    <build>
        <finalName>wordcount</finalName>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <archive>
                        <manifest>
                            <mainClass>com.m.jd.WordCount</mainClass>
                        </manifest>
                    </archive>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.6.1</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.2.2</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>

        <pluginManagement>
            <plugins>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-assembly-plugin</artifactId>
                    <version>3.0.0</version>
                    <executions>
                        <execution>
                            <id>make-assembly</id>
                            <phase>package</phase>
                            <goals>
                                <goal>single</goal>
                            </goals>
                        </execution>
                    </executions>
                </plugin>
            </plugins>
        </pluginManagement>
    </build>

2. Writing the WordCount Program

2.1 Write WordCount under the scala directory

Note: the scala directory must be marked as Sources Root.

For example, my WordCount lives under the com.m.jd package; the complete code is as follows:

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object WordCount extends App {

  // Run locally, using as many worker threads as there are logical cores
  private val sparkConf: SparkConf = new SparkConf().setMaster("local[*]").setAppName("wordCount")

  private val sc = new SparkContext(sparkConf)

  // Read the input file from HDFS
  private val dataFile: RDD[String] = sc.textFile("hdfs://hadoop0:9000/README.md")

  // Split each line into words
  private val words: RDD[String] = dataFile.flatMap(_.split(" "))

  // Pair each word with an initial count of 1
  private val word2Count: RDD[(String, Int)] = words.map((_, 1))

  // Sum the counts per word
  private val result: RDD[(String, Int)] = word2Count.reduceByKey(_ + _)

  result.saveAsTextFile("hdfs://hadoop0:9000/out")

  // Close the connection to Spark
  sc.stop()

}
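One caveat: the Spark documentation recommends defining a main() method rather than extending scala.App, because scala.App's delayed initialization can make object fields behave unexpectedly when the program runs on a cluster. A sketch of the same program in that style (same paths and class name as above):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[*]").setAppName("wordCount")
    val sc = new SparkContext(sparkConf)

    // Same transformation chain as the App-based version
    val dataFile: RDD[String] = sc.textFile("hdfs://hadoop0:9000/README.md")
    val result: RDD[(String, Int)] =
      dataFile.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    result.saveAsTextFile("hdfs://hadoop0:9000/out")

    sc.stop()
  }
}
```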

Compare the code above with the equivalent command in the Spark shell and it becomes clear at a glance:

sc.textFile("hdfs://hadoop0:9000/README.md").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).saveAsTextFile("hdfs://hadoop0:9000/out")

3. Debugging in Local Mode

Note the line of code below: it already sets the master to local[*] (local mode). If the code does not set it, it must be specified elsewhere, for example via --master on the spark-submit command line.

private val sparkConf: SparkConf = new SparkConf().setMaster("local[*]").setAppName("wordCount")
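If setMaster is left out of the code, the same jar can be run unmodified in any mode by supplying the master on the command line instead. A sketch, reusing the install and jar paths from this article (adjust to your environment):

```shell
# Local mode, with the master passed at submit time rather than hardcoded
/opt/module/spark-2.1.1-bin-hadoop2.7/bin/spark-submit \
  --class com.m.jd.WordCount \
  --master "local[*]" \
  /opt/spark-jar/wordcount.jar
```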

Preparation for debugging:
1) Start the Hadoop cluster.
2) Upload the file to run WordCount on to HDFS. For example, I uploaded the README.md from the Spark distribution to the / directory of HDFS.
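The two preparation steps above can be done roughly as follows; the install paths are the ones used in this article, so adjust them to your own setup:

```shell
# 1) Start HDFS and YARN (on the master node, here hadoop0)
start-dfs.sh
start-yarn.sh

# 2) Upload the Spark distribution's README.md to the HDFS root and verify
hadoop fs -put /opt/module/spark-2.1.1-bin-hadoop2.7/README.md /
hadoop fs -ls /
```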

Debugging:
Simply right-click WordCount and choose Run or Debug. Here we run it first and watch the console for error logs; fix any errors that appear. If there are none, check the result on HDFS:

$ hadoop fs -cat /out/*

Possible problems:
1) Permission errors when accessing HDFS. Configure the environment variable HADOOP_USER_NAME=root in IDEA's run configuration; the program will then run as the root user.
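As an alternative to the IDEA run-configuration setting, the same effect can be sketched in code; this is a common workaround, not necessarily what the original setup used:

```scala
// Must run before the SparkContext (and thus the HDFS client) is created.
// Equivalent to setting HADOOP_USER_NAME=root in the IDEA run configuration.
System.setProperty("HADOOP_USER_NAME", "root")
```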


4. Debugging in YARN Mode

Package the WordCount project into a jar in IDEA, then upload the jar to a host in the Hadoop cluster. In my setup hadoop0 is the master and hadoop1 and hadoop2 are the workers. Run the following command:

/opt/module/spark-2.1.1-bin-hadoop2.7/bin/spark-submit \
--class com.m.jd.WordCount \
--master yarn \
--deploy-mode client \
/opt/spark-jar/wordcount.jar

/opt/spark-jar/wordcount.jar is the path of the uploaded jar, and --class com.m.jd.WordCount is the main class inside wordcount.jar.

If the command above runs without errors, check the output:

$ hadoop fs -cat /out/*

Possible problems:

1) The /out directory already exists on HDFS. The earlier local-mode run already wrote its result there, so /out exists; delete it before debugging in YARN mode:

$ hadoop fs -rm -r /out

2) The /directory directory does not exist. In my image, the History Server was configured when the Spark cluster was set up, and spark-defaults.conf sets spark.eventLog.dir:

spark.eventLog.dir      hdfs://hadoop0:9000/directory

So if /directory does not yet exist on HDFS, create it first:

$ hadoop fs -mkdir /directory

That completes the example of writing Spark's WordCount program in IDEA and test-running it in both Spark's local mode and YARN mode.

Reposted from blog.csdn.net/u012834750/article/details/81016433