Please credit the original source when reposting.
Spark supports several run modes:
Distributed deployment: runs on a cluster, with the underlying resource scheduling handled by Mesos, Hadoop YARN, or Spark's built-in Standalone mode
Pseudo-distributed deployment
Local mode
To keep things beginner-friendly, and mindful of the cost of learning on a personal machine (laptop resources are limited!), this post walks through running Spark in local mode. Let's get started!
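To make the distinction between the modes concrete, the sketch below shows the master URL each mode corresponds to. The host names and ports are placeholders, not values from this post; note that in Spark 0.9 the shell picks up its mode from the MASTER environment variable.

```shell
# Each run mode corresponds to a master URL (hosts/ports below are examples):
LOCAL_MASTER="local[2]"                  # local mode, 2 worker threads
STANDALONE_MASTER="spark://CentOS:7077"  # Spark's built-in Standalone cluster
MESOS_MASTER="mesos://CentOS:5050"       # Mesos-managed cluster

# In Spark 0.9 the shell reads the MASTER env variable, e.g.:
#   MASTER=local[2] ./bin/spark-shell
echo "$LOCAL_MASTER"
```

If MASTER is unset, spark-shell falls back to running locally, which is exactly what this post relies on.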
NO.1 Resource preparation
1. VMware 10.0.1 build-1379776 (downloaded from the web; search for a tutorial if you need one)
2. CentOS 6.5 (the download site I used has a fairly complete set of resources)
3. JDK 1.7 (CentOS ships with OpenJDK; you know the drill, the official Oracle JDK is the safer choice)
4. spark-0.9.0-incubating-bin-hadoop2 (I used 0.9.0, the latest version at the time; 1.0 is out now, so if you are experimenting, consider grabbing the newest release)
NO.2 Environment setup
1. Install VMware
2. Install CentOS — bridged networking is recommended; it is painless, and later FTP/SSH setup becomes much more convenient
3. Install JDK 1.7, and be sure to configure the environment variables afterwards, or the bundled OpenJDK will still be used
4. Upload spark-0.9.0-incubating-bin-hadoop2.tgz to the Linux environment
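The environment-variable configuration in step 3 can be sketched as below. The JDK install path is an assumption (adjust it to wherever your JDK actually landed), and on a real box these export lines belong in /etc/profile or ~/.bashrc rather than being typed each session.

```shell
# Assumed JDK location -- adjust to your actual install directory
export JAVA_HOME=/usr/java/jdk1.7.0_51
# Put the Oracle JDK ahead of the bundled OpenJDK on the PATH
export PATH=$JAVA_HOME/bin:$PATH
```

After adding these lines to /etc/profile, run `source /etc/profile` and check `java -version`; it should now report the HotSpot VM instead of OpenJDK.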
NO.3 Spark framework environment
1. Unpack: tar -xvf spark-0.9.0-incubating-bin-hadoop2.tgz
2. Change into the Spark home directory: cd spark-0.9.0-incubating-bin-hadoop2
3. Run the sbt build: ./sbt/sbt assembly (with a decent connection — mine is 10M fiber at home — this takes about half an hour)
4. Edit the hosts file, e.g. vi /etc/hosts and add the line: 192.168.1.53 CentOS
5. Once the commands above have finished, Spark is ready to run locally
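The hosts edit in step 4 is easy to get wrong, so here is a small sanity-check sketch. It works on a scratch copy of the file so it is safe to run anywhere; on the real machine you would grep /etc/hosts itself. The IP and hostname are the ones used in this post.

```shell
# Work on a scratch copy so this is safe to run anywhere;
# on the real box the target file is /etc/hosts.
HOSTS=$(mktemp)
printf '127.0.0.1 localhost\n' > "$HOSTS"
printf '192.168.1.53 CentOS\n' >> "$HOSTS"   # the entry added in step 4
# Count lines ending in the hostname -- expect exactly one
COUNT=$(grep -c 'CentOS$' "$HOSTS")
rm -f "$HOSTS"
echo "$COUNT"
```

If the hostname does not resolve, spark-shell will fail at startup when Akka tries to bind to spark@CentOS, so this check is worth the thirty seconds.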
NO.4 Environment verification
1. Change into the ~/spark-0.9.0-incubating-bin-hadoop2/bin directory
[root@CentOS bin]# ./spark-shell
14/06/08 06:27:47 INFO HttpServer: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/06/08 06:27:47 INFO HttpServer: Starting HTTP Server
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 0.9.0
      /_/

Using Scala version 2.10.3 (Java HotSpot(TM) Client VM, Java 1.7.0_51)
Type in expressions to have them evaluated.
Type :help for more information.
14/06/08 06:27:51 INFO Slf4jLogger: Slf4jLogger started
14/06/08 06:27:51 INFO Remoting: Starting remoting
14/06/08 06:27:51 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@CentOS:38659]
14/06/08 06:27:51 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@CentOS:38659]
14/06/08 06:27:51 INFO SparkEnv: Registering BlockManagerMaster
14/06/08 06:27:51 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20140608062751-301e
14/06/08 06:27:51 INFO MemoryStore: MemoryStore started with capacity 297.0 MB.
14/06/08 06:27:51 INFO ConnectionManager: Bound socket to port 55885 with id = ConnectionManagerId(CentOS,55885)
14/06/08 06:27:51 INFO BlockManagerMaster: Trying to register BlockManager
14/06/08 06:27:51 INFO BlockManagerMasterActor$BlockManagerInfo: Registering block manager CentOS:55885 with 297.0 MB RAM
14/06/08 06:27:51 INFO BlockManagerMaster: Registered BlockManager
14/06/08 06:27:51 INFO HttpServer: Starting HTTP Server
14/06/08 06:27:51 INFO HttpBroadcast: Broadcast server started at http://192.168.1.53:47324
14/06/08 06:27:51 INFO SparkEnv: Registering MapOutputTracker
14/06/08 06:27:51 INFO HttpFileServer: HTTP File server directory is /tmp/spark-d4a4b013-6a2c-4bb2-b3e6-f680cec875e7
14/06/08 06:27:51 INFO HttpServer: Starting HTTP Server
14/06/08 06:27:52 INFO SparkUI: Started Spark Web UI at http://CentOS:4040
14/06/08 06:27:53 INFO Executor: Using REPL class URI: http://192.168.1.53:38442
14/06/08 06:27:54 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Created spark context..
Spark context available as sc.

scala> println("hello,World!!")
hello,World!!
NO.5 Demo verification
[root@CentOS bin]# ./run-example org.apache.spark.examples.SparkLR local[2]
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/spark-0.9.0-incubating-bin-hadoop2/examples/target/scala-2.10/spark-examples_2.10-assembly-0.9.0-incubating.jar!/org/slf4j/impl/StaticLoggerBinder.class]
....................... omitted ...................
4883 [spark-akka.actor.default-dispatcher-4] INFO org.apache.spark.scheduler.DAGScheduler - Completed ResultTask(4, 0)
4883 [spark-akka.actor.default-dispatcher-4] INFO org.apache.spark.scheduler.DAGScheduler - Stage 4 (reduce at SparkLR.scala:64) finished in 0.075 s
4884 [main] INFO org.apache.spark.SparkContext - Job finished: reduce at SparkLR.scala:64, took 0.098657134 s
Final w: (5816.075967498865, 5222.008066011391, 5754.751978607454, 3853.1772062206846, 5593.565827145932, 5282.387874201054, 3662.9216051953435, 4890.78210340607, 4223.371512250292, 5767.368579668863)
[root@CentOS bin]#