Getting started with Spark: submitting a SimpleApp jar to Spark on Linux

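Before the Chinese Spark documentation came out, learning from the English docs was painful, so I'm sharing my own notes. Okay, on to the content; once this is done I'll go back to reading the legendary English!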

1. Prerequisites

Scala and Spark are installed on Linux.


2. Create a Maven project

SimpleApp.java

The main thing to watch is the file path. This example uses README.md, the sample data that ships with Spark.

package com.sparker;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class SimpleApp {

  @SuppressWarnings({ "serial", "resource" })
  public static void main(String[] args) {
    // Should be some file on your system
    String logFile = "/home/linuxUser/spark-1.2.0-bin-hadoop2.4/bin/README.md";
    SparkConf conf = new SparkConf().setAppName("Simple Application");
    JavaSparkContext sc = new JavaSparkContext(conf);
    // cache() keeps the lines in memory because they are scanned twice below
    JavaRDD<String> logData = sc.textFile(logFile).cache();

    // Count the lines that contain the letter "a"
    long numAs = logData.filter(new Function<String, Boolean>() {
      public Boolean call(String s) { return s.contains("a"); }
    }).count();

    // Count the lines that contain the letter "b"
    long numBs = logData.filter(new Function<String, Boolean>() {
      public Boolean call(String s) { return s.contains("b"); }
    }).count();

    System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);

    // Shut down the Spark context once the job is done
    sc.stop();
  }
}
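
The hard-coded path is the part most likely to break on another machine. A minimal sketch, assuming a hypothetical helper `resolveLogFile` (it is not part of the original code), of taking the path from the first command-line argument and falling back to the default:

```java
public class LogFileArg {

  // Hypothetical helper: returns args[0] when it is present and non-blank,
  // otherwise the default path.
  public static String resolveLogFile(String[] args, String defaultPath) {
    if (args != null && args.length > 0 && !args[0].trim().isEmpty()) {
      return args[0];
    }
    return defaultPath;
  }

  public static void main(String[] args) {
    String defaultPath = "/home/linuxUser/spark-1.2.0-bin-hadoop2.4/bin/README.md";
    System.out.println("Input file: " + resolveLogFile(args, defaultPath));
  }
}
```

In SimpleApp the same call could replace the hard-coded `logFile` assignment. With spark-submit, anything written after the jar name is passed to `main` as `args`.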

pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>
	<groupId>com.sparker</groupId>
	<artifactId>spark-test</artifactId>
	<version>0.0.1-SNAPSHOT</version>
	<name>SimpleApp</name>
	<description>A simple Spark application</description>

	<dependencies>
		<dependency> <!-- Spark dependency -->
			<groupId>org.apache.spark</groupId>
			<artifactId>spark-core_2.10</artifactId>
			<version>1.2.0</version>
			<scope>provided</scope>
		</dependency>
	</dependencies>
	<build>
		<plugins>
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-jar-plugin</artifactId>
				<version>2.4</version>
				<configuration>
					<archive>
						<manifest>
							<mainClass>com.sparker.SimpleApp</mainClass>
						</manifest>
					</archive>
				</configuration>
			</plugin>
		</plugins>
	</build>
</project>

4. Build the jar

Open a command prompt (DOS window) in the project root directory:

F:\workspaceTest\spark-test>mvn clean package

This generates a jar in the target directory.
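
Since the pom.xml writes the main class into the jar manifest, it can be worth checking that the built jar really carries it. A minimal sketch using the JDK's `java.util.jar` classes; the class name `ManifestCheck` and the jar path passed as `args[0]` are assumptions for illustration:

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

public class ManifestCheck {

  // Returns the Main-Class attribute of a manifest, or null if it is absent.
  public static String mainClassOf(Manifest manifest) {
    if (manifest == null) {
      return null;
    }
    return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
  }

  public static void main(String[] args) throws IOException {
    // args[0]: path to the jar, e.g. target/spark-test-0.0.1-SNAPSHOT.jar
    try (JarFile jar = new JarFile(args[0])) {
      System.out.println("Main-Class: " + mainClassOf(jar.getManifest()));
    }
  }
}
```

If the maven-jar-plugin configuration was picked up, the value should be com.sparker.SimpleApp.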


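5. Submit the jar and run it

Copy the jar into Spark's bin directory (you can of course copy it somewhere else; running from this directory is just more convenient).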

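First, start Spark. If a graphical desktop is installed, you can open spark-shell directly and choose "run in Terminal". Strictly speaking, with --master local[2] the job runs locally inside the spark-submit process with two worker threads, so a running spark-shell is not required.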

Next, submit the jar. Open a Terminal in the bin directory and run:

$ ./spark-submit --class com.sparker.SimpleApp --master local[2] spark-test-0.0.1-SNAPSHOT.jar
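
The command has a fixed shape: options first, then the application jar, then any arguments for the application itself. A minimal sketch that spells this order out, assuming a hypothetical helper `SubmitCommand.build` (it is not a Spark API; it only assembles the argument list, which could then be handed to `ProcessBuilder`):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SubmitCommand {

  // Hypothetical helper: assembles the spark-submit argument list in the order
  // options, application jar, application arguments.
  public static List<String> build(String mainClass, String master, String jar, String... appArgs) {
    List<String> cmd = new ArrayList<String>();
    cmd.add("./spark-submit");
    cmd.add("--class");
    cmd.add(mainClass);
    cmd.add("--master");
    cmd.add(master);
    cmd.add(jar);
    cmd.addAll(Arrays.asList(appArgs));
    return cmd;
  }

  public static void main(String[] args) {
    List<String> cmd = build("com.sparker.SimpleApp", "local[2]", "spark-test-0.0.1-SNAPSHOT.jar");
    System.out.println(String.join(" ", cmd));
  }
}
```

Anything placed after the jar name is treated as an argument to the application, not to spark-submit, which is why `--class` and `--master` come before it.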

For the submit command, see the official documentation: http://spark.apache.org/docs/latest/submitting-applications.html

6. Results
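
The job prints a single line of the form "Lines with a: ..., lines with b: ...". To cross-check those numbers without Spark, here is a plain-Java sketch that counts the same lines; the class name `LineCountCheck` is an assumption for illustration, and the file path is passed as `args[0]`:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class LineCountCheck {

  // Counts the lines that contain the given substring, mirroring the
  // filter(...).count() calls in SimpleApp.
  public static long countContaining(List<String> lines, String needle) {
    long n = 0;
    for (String line : lines) {
      if (line.contains(needle)) {
        n++;
      }
    }
    return n;
  }

  public static void main(String[] args) throws IOException {
    // args[0]: the same file that SimpleApp reads
    List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
    System.out.println("Lines with a: " + countContaining(lines, "a")
        + ", lines with b: " + countContaining(lines, "b"));
  }
}
```

For an ordinary text file such as README.md, the two counts should match what the Spark job reports.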







Reposted from blog.csdn.net/lvsehuoyan/article/details/43016799