Before the Spark docs were available in Chinese, working through the English ones was painful. Here are my own study notes. With that said, on to the content, and then back to the legendary English docs!
1. Prerequisites
Install Scala and Spark on Linux.
2. Create a Maven project
SimpleApp.java
The main thing to watch is the file path; here we use README.md, the sample data that ships with Spark.
package com.sparker;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class SimpleApp {
    @SuppressWarnings({ "serial", "resource" })
    public static void main(String[] args) {
        String logFile = "/home/linuxUser/spark-1.2.0-bin-hadoop2.4/bin/README.md"; // Should be some file on your system
        SparkConf conf = new SparkConf().setAppName("Simple Application");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<String> logData = sc.textFile(logFile).cache();

        // Count lines containing the letter "a"
        long numAs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) { return s.contains("a"); }
        }).count();

        // Count lines containing the letter "b"
        long numBs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) { return s.contains("b"); }
        }).count();

        System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);
    }
}
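The two Spark filters above are just substring checks, so they can be sanity-checked locally without a cluster. The sketch below reproduces the same counting logic with plain Java streams over an in-memory list; the sample lines are made-up stand-ins, not the actual contents of README.md:

```java
import java.util.Arrays;
import java.util.List;

public class LocalCount {
    public static void main(String[] args) {
        // Stand-in for the lines Spark would read from README.md
        List<String> lines = Arrays.asList(
            "Apache Spark",
            "building Spark",
            "run it locally");

        // Same predicates as the Spark filters, applied to a local stream
        long numAs = lines.stream().filter(s -> s.contains("a")).count();
        long numBs = lines.stream().filter(s -> s.contains("b")).count();

        System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);
        // Prints: Lines with a: 3, lines with b: 1
    }
}
```

If the stream version and the Spark version disagree on the same input file, the problem is in the environment (path, permissions, master URL), not in the logic.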
3. pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.sparker</groupId>
    <artifactId>spark-test</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>SimpleApp</name>
    <description>A simple Spark application</description>

    <dependencies>
        <dependency> <!-- Spark dependency -->
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>1.2.0</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <version>2.4</version>
                <configuration>
                    <archive>
                        <manifest>
                            <!-- Entry point recorded in the jar's manifest -->
                            <mainClass>com.sparker.SimpleApp</mainClass>
                        </manifest>
                    </archive>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
4. Build the jar
From the project root, open a command prompt and run:
F:\workspaceTest\spark-test>mvn clean package
This generates a jar in the target directory.
5. Submit the jar and run it
Copy the jar into Spark's bin directory (you can of course put it anywhere else; it is just more convenient to run from there).
First, start Spark. If a desktop environment is installed, you can open spark-shell directly and choose "run in Terminal".
Then submit the jar: open a Terminal in the bin directory and run
$ ./spark-submit --class com.sparker.SimpleApp --master local[2] spark-test-0.0.1-SNAPSHOT.jar
For the submit command options, see the official docs: http://spark.apache.org/docs/latest/submitting-applications.html
6. Show off the results