PBRT_V2 Summary Notes: Sampling Distribution

Reference: https://www.scratchapixel.com/lessons/mathematics-physics-for-computer-graphics/monte-carlo-methods-mathematical-foundations

Sampling Distribution

In statistics, when some characteristic of a given population can be calculated using all the elements or items in this population, we say that the resulting value is a parameter of the population. The population mean, for example, is a population parameter that gives the average value of a quantity. Parameters are fixed values. On the other hand, when we use samples to get an estimate of a population parameter, we say that the value computed from the samples is a statistic.

(Definition of a population parameter: a value computed using all the data in the population, for example the population mean.

Definition of a statistic: if we use samples to estimate a population parameter, that estimate is a statistic.)

As you can see in figure 3, the population generated by our program has an arbitrary distribution. This population is not distributed according to any particular probability distribution, and especially not a normal distribution. The reason why we made this choice will become clear very soon. Because the distribution is discrete and finite, this population of course has a well-defined mean and variance, which we already computed above. What we are going to do now is take a sample of size n from this population, compute its sample mean, and repeat this experiment 1000 times. The sample mean will be rounded to the nearest integer (so that it takes an integer value between 0 and 20). At the end of the process, we count the number of samples whose means are 0, 1, 2, ... up to 20. Figure 4 shows the results. Quite remarkably, as you can see, the distribution of samples closely follows a normal distribution. What we are looking at here is not the distribution of cards but the distribution of samples. Be sure to understand that difference clearly. It is a distribution of statistics. Note also that this is not a perfect normal distribution (you now understand why we were very specific about this in the previous chapter): there is clearly some difference between the results and a perfect normal distribution (the curve in red). In conclusion, even though the distribution of the population is arbitrary, the distribution of samples or statistics is not (it converges in distribution to the normal distribution; we will come back to this idea later).

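(Figure 3 shows how many cards there are for each label in [0-20]. Figure 4 shows the following: draw a sample of n cards from this pile, compute the sample mean (rounded to the nearest integer), repeat this 1000 times, and record the 1000 means. The result shows that the distribution of the means looks like a normal distribution.)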
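The experiment described above can be sketched in a few lines of Python. This is only a rough illustration, not the program from the referenced lesson: the card counts per label, the sample size `n=16`, the fixed seed, and the helper names `make_population` and `rounded_sample_means` are all assumptions made for this example.

```python
import random
from collections import Counter

def make_population(rng, max_value=20, max_count=50):
    """Build a deck of cards labeled 0..max_value with arbitrary counts per label."""
    population = []
    for label in range(max_value + 1):
        count = rng.randint(1, max_count)  # arbitrary count, so no particular distribution
        population.extend([label] * count)
    return population

def rounded_sample_means(population, n, trials, rng):
    """Draw `trials` samples of size n and return the rounded mean of each sample."""
    means = []
    for _ in range(trials):
        sample = [rng.choice(population) for _ in range(n)]  # draw with replacement
        means.append(int(sum(sample) / n + 0.5))  # round half up to the nearest integer
    return means

rng = random.Random(1)  # fixed seed, only so that runs are reproducible
population = make_population(rng)
means = rounded_sample_means(population, n=16, trials=1000, rng=rng)

# Count how many samples have a mean of 0, 1, 2, ... up to 20 (a text version of figure 4).
histogram = Counter(means)
for value in range(21):
    print(f"{value:2d} {'#' * (histogram[value] // 5)}")
```

Even though the card counts are arbitrary, the printed histogram should show a single bump centered near the population mean, which is the behavior the text attributes to figure 4.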

In other words, instead of studying, for example, how the height (the property) of all adults from a given country (the population) is distributed, we take samples from this population to estimate the population's average height, and look at how these samples are distributed with regard to each other. In statistics, the distribution of samples (or statistics) is called a sampling distribution. As with a population distribution, a sampling distribution can be described using a model (i.e. a probability distribution). It describes how all possible samples of a given size are distributed for a given population.

Note 2: the sampling distribution of a statistic is the distribution of that statistic, considered as a random variable, when derived from a random sample of size n. In other words, the sampling distribution of the mean is a distribution of sample means.
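For a very small population, the sampling distribution of the mean can be written out exactly by enumerating every possible sample. The sketch below assumes a hypothetical four-item population, a sample size of 2, and draws with replacement; none of these values come from the original lesson.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 6]  # hypothetical tiny population
n = 2                      # sample size

# Enumerate every ordered sample of size n drawn with replacement.
# Each one is equally likely, with probability 1 / len(population)**n.
samples = list(product(population, repeat=n))
counts = Counter(Fraction(sum(s), n) for s in samples)
sampling_distribution = {
    mean: Fraction(count, len(samples)) for mean, count in sorted(counts.items())
}

for mean, prob in sampling_distribution.items():
    print(f"sample mean = {float(mean):.1f}  probability = {prob}")

# The mean of the sampling distribution equals the population mean.
population_mean = Fraction(sum(population), len(population))
mean_of_means = sum(mean * prob for mean, prob in sampling_distribution.items())
print(population_mean, mean_of_means)
```

Using `Fraction` keeps the probabilities exact, so the equality between the mean of the sample means and the population mean can be checked without floating-point noise.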

Extend

First off, you start with a population. Then you draw elements from this population randomly. In this particular diagram, in each experiment we make what we call 3 observations; in other words, we draw 3 items from the population. Because these are not random variables but observed outcomes of the experiment, we label them with a lowercase x. If we now take the average of these 3 drawn items, we get what we call a statistic, or a sample, whose size is n=3. To compute the value of this sample, we use the equation for the expected value (or mean). Each sample, on its own, is a random variable, but because it now represents the mean of a certain number n of items in the population, we label it with an uppercase X. We can repeat this experiment N times, which gives a series of samples: X1,X2,...XN. This collection of samples is what we call a sampling distribution. Because samples are random, we can also compute their mean the same way we computed the mean of the items in the population. This is what we call the expected value (or mean) of the sampling distribution of means, denoted $\mu_{\bar{X}}$. Once we have this value, we can compute the variance of the distribution of means, $\sigma^2_{\bar{X}}$.

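(Run an experiment: each sample draws 3 values (xi); computing the mean of these 3 values gives (Xi), and computing their variance gives Var(Xi). Collecting these Xi forms a sampling distribution. We can also compute the mean of this sampling distribution, and in the same way its variance.)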
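A minimal sketch of this procedure, assuming a hypothetical population of 1000 integers in [0, 20], draws with replacement, and N = 10000 experiments (all choices made for this example only):

```python
import random
import statistics

rng = random.Random(7)  # fixed seed, only so that runs are reproducible
population = [rng.randint(0, 20) for _ in range(1000)]  # hypothetical population

n = 3      # observations per experiment (x1, x2, x3)
N = 10000  # number of experiments, giving X1, X2, ..., XN

sample_means = []
for _ in range(N):
    observations = [rng.choice(population) for _ in range(n)]  # the lowercase x's
    sample_means.append(sum(observations) / n)                 # one uppercase X

# Mean and variance of the collected sample means.
mean_of_means = statistics.fmean(sample_means)
variance_of_means = statistics.pvariance(sample_means)

print("population mean:        ", statistics.fmean(population))
print("mean of sample means:   ", mean_of_means)
print("population variance / n:", statistics.pvariance(population) / n)
print("variance of sample means:", variance_of_means)
```

With independent draws, the mean of the sample means should land close to the population mean, and their variance close to the population variance divided by n.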

We ran the program several times, each time increasing the sample size by 2. The following table shows the results (keep in mind that the population mean, which we compute in the program, is 8.970280):

First, the data seems to confirm the theory: as the sample size increases, the mean of all our samples approaches the population mean (which is 8.970280). Furthermore, the standard deviation of the distribution of means decreases, as expected (you can visualize this as the curve of the normal distribution becoming narrower). Thus, as stated before, as n approaches infinity, the sampling distribution turns into a perfect normal distribution with mean μ (the population mean) and standard deviation 0: N(μ,0). We say that the sequence of random variables X1,...Xn converges in distribution to a normal distribution.

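(As the sample size increases, the mean of the sample means gets closer and closer to the population mean; likewise, the standard deviation of the sample means gets smaller and smaller.)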
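The narrowing of the sampling distribution can be checked numerically. The sketch below is an illustration under assumptions, not the program from the lesson: the population is hypothetical, the sample sizes simply double from 2 to 64, and each size uses 2000 trials. It compares the observed standard deviation of the sample means with the textbook value sigma / sqrt(n), which holds for independent draws.

```python
import math
import random
import statistics

rng = random.Random(3)  # fixed seed, only so that runs are reproducible
population = [rng.randint(0, 20) for _ in range(1000)]  # hypothetical population
sigma = statistics.pstdev(population)  # population standard deviation

def std_of_sample_means(n, trials=2000):
    """Empirical standard deviation of `trials` sample means, each from a sample of size n."""
    means = [statistics.fmean(rng.choices(population, k=n)) for _ in range(trials)]
    return statistics.pstdev(means)

results = {}
for n in (2, 4, 8, 16, 32, 64):
    results[n] = std_of_sample_means(n)
    print(f"n={n:3d}  observed std={results[n]:.3f}  sigma/sqrt(n)={sigma / math.sqrt(n):.3f}")
```

The observed column should shrink by roughly a factor of sqrt(2) each time n doubles, which is why quadrupling the number of samples only halves the spread of the estimate.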

This is important because mathematicians like to have proof that, eventually, the mean of the samples and the population mean μ are the same and that the method is thus valid (from a theoretical point of view, because obviously in practice an infinite sample size is impossible). In other words, we can write (and we also checked this result experimentally) that:

$E[\bar{X}] = \mu$, where $\bar{X}$ is the sample mean.
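For readers who want the reasoning behind this result, here is a short derivation sketch. The notation is an assumption introduced for this example: $Y_1, \dots, Y_n$ are the n random draws that make up one sample, each with mean $\mu$ and variance $\sigma^2$, and $\bar{X}$ is their average. The variance line additionally assumes the draws are independent.

```latex
\begin{align*}
\bar{X} &= \frac{1}{n}\sum_{i=1}^{n} Y_i \\
E[\bar{X}] &= \frac{1}{n}\sum_{i=1}^{n} E[Y_i] = \frac{1}{n}\, n\mu = \mu \\
\operatorname{Var}(\bar{X}) &= \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(Y_i) = \frac{\sigma^2}{n}
\end{align*}
```

The second line shows that the sample mean is centered on the population mean for any n, and the third shows why its spread shrinks toward 0 as n grows.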

And if you don't care so much about the mathematics and just want to understand how this applies to you (and to the field of rendering), you can just see this as "your estimate becomes better as you keep taking samples (i.e. as n increases)". Eventually you have so many samples that your estimate and the value you are trying to estimate are very close to each other, and in theory even identical when you have an infinite number of samples. That's really all it "means".

(As the number of samples approaches infinity,)