Installation
Download Spark 1.4.1
    wget -c http://www.interior-dsgn.com/apache/spark/spark-1.4.1/spark-1.4.1.tgz
Build Spark with Scala 2.11 (run from the root of the extracted source tree):
    ./dev/change-version-to-2.11.sh
    mvn -Dscala-2.11 -DskipTests clean package
Run spark-shell
    ./bin/spark-shell
    15/07/23 17:18:48 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 1.4.1
          /_/
    Using Scala version 2.11.6 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_40)
    Type in expressions to have them evaluated.
    Type :help for more information.
    Spark context available as sc.
    SQL context available as sqlContext.
    scala>
If you see the output above, Spark has been installed successfully.
Basic configuration
Edit $SPARK_HOME/conf/spark-env.sh and set the following parameters:
    export JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk1.8.0_40.jdk/Contents/Home"
    export SPARK_SCALA_VERSION="2.11"
    export SPARK_MASTER_IP="192.168.1.102"
    export SPARK_LOCAL_IP="192.168.1.102"
    export SPARK_WORKER_MEMORY="2G"
    export SPARK_WORKER_CORES="2"
Because Spark was built against Scala 2.11, the configuration file should tell Spark to start with Scala 2.11.
Spark can then be started in Standalone mode: ./sbin/start-all.sh
spark-submit
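Spark deploys and runs applications through spark-submit. With bin/spark-submit, a Spark application can be deployed on local, Standalone, YARN, and Mesos with little effort. Here we submit a minimal WordCount program; the code is as follows: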
    package learnspark.intro

    import org.apache.spark.{SparkContext, SparkConf}

    object WordCount {
      def main(args: Array[String]): Unit = {
        println(args.length + " " + args.toList)
        if (args.length < 2) {
          println("run params: inputfile outputfile")
          System.exit(1)
        }
        val inputFile = args(0)
        val outputFile = args(1)

        // The master URL is not set here; it is supplied by spark-submit.
        val conf = new SparkConf().setAppName("wordCount")
        val sc = new SparkContext(conf)

        // Split each line on single spaces, then sum a count of 1 per word.
        val input = sc.textFile(inputFile)
        val words = input.flatMap(_.split(' '))
        val counts = words.map((_, 1)).reduceByKey { case (x, y) => x + y }
        counts.saveAsTextFile(outputFile)
      }
    }
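To sanity-check the job on a small input, the same counting can be approximated with standard Unix tools. The sketch below is not part of the original program: `count_words` is a hypothetical helper, and it drops empty tokens, whereas `split(' ')` in the Spark job can produce empty-string tokens for repeated spaces or blank lines.

```shell
#!/bin/sh
# Hypothetical helper: count space-separated words in the file given as $1.
# Prints one "word<TAB>count" line per distinct word, sorted by word.
# Empty tokens are dropped, so results can differ slightly from the Spark job.
count_words() {
    tr ' ' '\n' < "$1" | grep -v '^$' | sort | uniq -c | awk '{ print $2 "\t" $1 }'
}

# Example run on a tiny temporary file.
sample="$(mktemp)"
printf 'a b a\nc b a\n' > "$sample"
count_words "$sample"   # prints a 3, b 2, c 1 (tab-separated, one per line)
rm -f "$sample"
```

The Spark job writes its results as part files under the output directory, one `(word,count)` tuple per line, so the two listings can be compared by eye for a small input.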
Use the following script to submit the program to Spark:
    #!/bin/sh
    # Remove output from a previous run.
    rm -rf /tmp/wordcount
    # The trailing backslashes keep all options on one logical command line.
    "$SPARK_HOME/bin/spark-submit" \
      --class learnspark.intro.WordCount \
      --master "spark://192.168.1.102:7077" \
      target/scala-2.11/learn-spark_2.11-0.0.1.jar \
      "$SPARK_HOME/README.md" /tmp/wordcount
- --class: the class to run
- --master: the master the program runs on
- target/…: the JAR to submit
- inputAttr [outputAttr …]: arguments passed to the program
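Since spark-submit also supports local deployment, the same job can be tried without a running Standalone cluster by changing only the `--master` value. The following is a minimal sketch under stated assumptions: `DRY_RUN`, `run`, and `submit_wordcount` are names introduced here for illustration, and the fallback of `SPARK_HOME` to the current directory is an assumption made so the script can print the command on a machine without Spark.

```shell
#!/bin/sh
# DRY_RUN is a hypothetical switch: "1" (the default) prints the command
# instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Submit the WordCount job to the master URL given as $1.
# Assumption: SPARK_HOME falls back to "." when it is not set.
submit_wordcount() {
    master="$1"
    run "${SPARK_HOME:-.}/bin/spark-submit" \
        --class learnspark.intro.WordCount \
        --master "$master" \
        target/scala-2.11/learn-spark_2.11-0.0.1.jar \
        "${SPARK_HOME:-.}/README.md" /tmp/wordcount
}

# "local[2]" runs Spark in-process with two worker threads.
submit_wordcount "local[2]"
```

Passing `spark://192.168.1.102:7077` instead of `local[2]` would target the Standalone master used earlier. With `DRY_RUN=0` the command is actually executed; in that case the output directory must not already exist.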