1. First, the Word Count program; the Java version is used here
package com.mm;

import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Pattern;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.sql.SparkSession;

import scala.Tuple2;

public final class JavaWordCount {
    private static final Pattern SPACE = Pattern.compile(" ");

    public static void main(String[] args) throws Exception {
        // if (args.length < 1) {
        //     System.err.println("Usage: JavaWordCount <file>");
        //     System.exit(1);
        // }
        String filePath = "/test.txt";

        SparkSession spark = SparkSession
                .builder()
                .appName("JavaWordCount")
                .getOrCreate();

        // Read the input file and split each line into words
        JavaRDD<String> lines = spark.read().textFile(filePath).javaRDD();
        JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public Iterator<String> call(String s) {
                return Arrays.asList(SPACE.split(s)).iterator();
            }
        });

        // Map each word to a (word, 1) pair
        JavaPairRDD<String, Integer> ones = words.mapToPair(
                new PairFunction<String, String, Integer>() {
                    @Override
                    public Tuple2<String, Integer> call(String s) {
                        return new Tuple2<>(s, 1);
                    }
                });

        // Sum the counts for each word
        JavaPairRDD<String, Integer> counts = ones.reduceByKey(
                new Function2<Integer, Integer, Integer>() {
                    @Override
                    public Integer call(Integer i1, Integer i2) {
                        return i1 + i2;
                    }
                });

        // Collect the results to the driver, save them to HDFS, and print them
        List<Tuple2<String, Integer>> output = counts.collect();
        counts.saveAsTextFile("/testResult");
        for (Tuple2<?, ?> tuple : output) {
            System.out.println(tuple._1() + ": " + tuple._2());
        }
        spark.stop();
    }
}
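To see what the flatMap → mapToPair → reduceByKey pipeline actually computes, here is a minimal plain-Java sketch (no Spark required) that applies the same logic to two hypothetical sample lines; the class name and input are illustrative, not part of the Spark job:

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class WordCountSketch {
    // Split each line on spaces (the flatMap step), then count occurrences
    // per word (the mapToPair + reduceByKey steps combined).
    public static Map<String, Long> count(String[] lines) {
        return Arrays.stream(lines)
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.groupingBy(Function.identity(),
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<String, Long> counts = count(new String[] {"hello world", "hello spark"});
        System.out.println(counts.get("hello")); // 2
        System.out.println(counts.get("spark")); // 1
    }
}
```

In the Spark version the same grouping happens in parallel across partitions, with `reduceByKey` merging the per-partition counts.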
2. Create the jar to be output
1. Select File >> Project Structure >> Artifacts >> +(plus sign) >> JAR >> From modules with dependencies
Choose the Main Class and click the OK button to enter the configuration panel for this jar, as shown in the figure:
In the jar selected under Output Layout, remove the "Extracted ..." dependency jar entries, keeping only "wordCount" compile output
Click the Apply and OK buttons to save
Click Build >> Build Artifacts >> Build
The jar file should now be in the corresponding output folder
3. Copy the jar to the Spark server and run a test
./spark-submit --class com.mm.JavaWordCount --master spark://localhost:7077 /usr/spark/spark-2.0.0-bin-hadoop2.6/wordCount.jar
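Before submitting to the cluster, you can sanity-check the expected counts locally with standard shell tools (a minimal sketch; `/tmp/test.txt` here is a hypothetical sample file, not the `/test.txt` the job reads, and words are assumed to be space-separated as in the program above):

```shell
# Build a small sample file and count word occurrences the same way the job does
printf 'hello world\nhello spark\n' > /tmp/test.txt
tr ' ' '\n' < /tmp/test.txt | sort | uniq -c | sort -rn
```

Comparing this output against the job's `/testResult` files gives a quick correctness check.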