The downloaded Spark distribution ships with a sample program called JavaSparkPi, which estimates pi using Spark's map and reduce operations.
The code is as follows:
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.examples;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;

import java.util.ArrayList;
import java.util.List;

/**
 * Computes an approximation to pi
 * Usage: JavaSparkPi [slices]
 */
public final class JavaSparkPi {

  public static void main(String[] args) throws Exception {
    SparkConf sparkConf = new SparkConf().setAppName("JavaSparkPi");
    sparkConf.setMaster("local");
    JavaSparkContext jsc = new JavaSparkContext(sparkConf);

    int slices = (args.length == 1) ? Integer.parseInt(args[0]) : 2;
    int n = 3000000 * slices;
    List<Integer> l = new ArrayList<Integer>(n);
    for (int i = 0; i < n; i++) {
      l.add(i);
    }

    JavaRDD<Integer> dataSet = jsc.parallelize(l, slices);

    int count = dataSet.map(new Function<Integer, Integer>() {
      @Override
      public Integer call(Integer integer) {
        double x = Math.random() * 2 - 1;
        double y = Math.random() * 2 - 1;
        return (x * x + y * y < 1) ? 1 : 0;
      }
    }).reduce(new Function2<Integer, Integer, Integer>() {
      @Override
      public Integer call(Integer integer, Integer integer2) {
        return integer + integer2;
      }
    });

    System.out.println("Pi is roughly " + 4.0 * count / n);

    jsc.stop();
  }
}
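The code starts by building a large collection (3000000 * slices integers, so 6,000,000 with the default of 2 slices). It then iterates over the collection with map and samples one random coordinate point per element.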
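The geometric idea behind the implementation is roughly as follows.
Take the square centered at the origin whose x and y coordinates each range from -1 to 1. Its side length is 2, so its area is 4.
Take the circle of radius 1 centered at (0, 0). Its area is pi * 1 * 1 = 3.14159..., which is pi itself.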
The code samples random coordinate points and checks whether each point falls inside the circle.
        double x = Math.random() * 2 - 1;
        double y = Math.random() * 2 - 1;
        return (x * x + y * y < 1) ? 1 : 0;
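x and y are generated randomly. Math.random() only returns values that are at least 0 and less than 1, so multiplying by 2 and subtracting 1 gives a value that is at least -1 and less than 1. The point is therefore always inside the square.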
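The range argument above can be checked with a small standalone sketch. It does not use Spark. The class name SquareSampleCheck, its method names and the fixed seed are made up for illustration, and java.util.Random.nextDouble() stands in for Math.random() so that the run is repeatable; both return values in [0, 1).

```java
import java.util.Random;

final class SquareSampleCheck {

    // Maps a value u in [0, 1) to [-1, 1), mirroring Math.random() * 2 - 1.
    static double toSquareCoordinate(double u) {
        return u * 2 - 1;
    }

    // Returns true if every generated coordinate lies in [-1, 1).
    static boolean allInsideSquare(long seed, int samples) {
        Random rng = new Random(seed);
        for (int i = 0; i < samples; i++) {
            double c = toSquareCoordinate(rng.nextDouble());
            if (c < -1.0 || c >= 1.0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(toSquareCoordinate(0.0)); // prints -1.0 (lower bound, reachable)
        System.out.println(toSquareCoordinate(0.5)); // prints 0.0 (center of the square)
        System.out.println(allInsideSquare(42L, 1_000_000)); // prints true
    }
}
```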
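x*x + y*y = 1 describes the points lying exactly on the circle, so x*x + y*y < 1 tests whether a point is strictly inside it.
If the point is inside the circle, the function returns 1, otherwise 0. The subsequent reduce adds these values up, which yields the total number of points that fell inside the circle.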
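The same map-then-reduce pattern can be tried without a Spark installation. The sketch below uses plain Java streams (java.util.stream.IntStream) as a stand-in for the RDD, so it is not the Spark API. The class name StreamMapReducePi, its method names and the seed are made up for illustration.

```java
import java.util.Random;
import java.util.stream.IntStream;

final class StreamMapReducePi {

    // Map step: 1 if a random point in the square is inside the unit circle, else 0.
    static int insideCircle(Random rng) {
        double x = rng.nextDouble() * 2 - 1;
        double y = rng.nextDouble() * 2 - 1;
        return (x * x + y * y < 1) ? 1 : 0;
    }

    // Maps every index to 0 or 1, then reduces by addition to count the hits.
    // The stream is sequential, so sharing one Random instance is safe here.
    static int countInside(int n, long seed) {
        Random rng = new Random(seed);
        return IntStream.range(0, n)
                .map(i -> insideCircle(rng))
                .reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        int count = countInside(n, 1L);
        System.out.println(count + " of " + n + " points fell inside the circle");
    }
}
```

As in the Spark example, the element value itself (the index i) is ignored; the collection only determines how many samples are drawn.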
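We now know that there are n sample points in total, of which count lie inside the circle.
count/n is therefore the fraction of sample points that fell inside the circle. Because the points are spread uniformly over the square, this fraction approximates the ratio of the circle's area to the square's area, pi/4. Multiplying it by the square's area, 4, gives an approximation of pi.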
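A small worked example of this arithmetic, as a sketch: the class PiFromCounts and the numbers 785 and 1000 are made up for illustration. It also shows why the original code writes 4.0 rather than 4: with integer operands, count / n would be truncated to 0.

```java
final class PiFromCounts {

    // Fraction of hits (count / n) times the area of the square (4).
    // Starting with 4.0 forces floating-point arithmetic for the whole expression.
    static double estimatePi(long count, long n) {
        return 4.0 * count / n;
    }

    // The same formula with integer division first: the fraction is truncated to 0.
    static long truncatedEstimate(long count, long n) {
        return count / n * 4;
    }

    public static void main(String[] args) {
        System.out.println(estimatePi(785, 1000));        // prints 3.14
        System.out.println(truncatedEstimate(785, 1000)); // prints 0
    }
}
```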
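Conclusion: the more sample points are used, the more accurate the estimate of pi tends to be. Because the sampling is random, this holds on average rather than for every single run; the typical error shrinks roughly in proportion to 1/sqrt(n).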
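To observe this effect, the sketch below repeats the estimate for increasing sample sizes and prints the absolute error against Math.PI. It is plain Java without Spark. The class name PiConvergence, the sample sizes and the seed are arbitrary choices for illustration, and a single run may not show a strictly decreasing error.

```java
import java.util.Random;

final class PiConvergence {

    // Monte Carlo estimate of pi from n random points in the square [-1, 1) x [-1, 1).
    static double estimatePi(int n, long seed) {
        Random rng = new Random(seed);
        long count = 0;
        for (int i = 0; i < n; i++) {
            double x = rng.nextDouble() * 2 - 1;
            double y = rng.nextDouble() * 2 - 1;
            if (x * x + y * y < 1) {
                count++;
            }
        }
        return 4.0 * count / n;
    }

    public static void main(String[] args) {
        int[] sizes = {1_000, 100_000, 10_000_000};
        for (int n : sizes) {
            double estimate = estimatePi(n, 7L);
            double error = Math.abs(estimate - Math.PI);
            System.out.println("n=" + n + " estimate=" + estimate + " error=" + error);
        }
    }
}
```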