Python Week 14 Exercise: Anscombe’s quartet
Part 1
Problem
For each of the four datasets…
- Compute the mean and variance of both x and y
- Compute the correlation coefficient between x and y
- Compute the linear regression line: y = β0 + β1x + ϵ (hint: use statsmodels and look at the Statsmodels notebook)
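Code
- Compute the mean and variance of x and y for each of the four datasets
# Dataset I is shown as an example; the other three work the same way.
# Assumes `anscombe` is a pandas DataFrame with columns dataset, x, y
# (for instance, loaded with sns.load_dataset("anscombe")).
dataI = anscombe[anscombe.dataset == "I"]
print(dataI['x'].mean())
print(dataI['x'].var())  # sample variance (pandas default, ddof=1)
print(dataI['y'].mean())
print(dataI['y'].var())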
Output:
9.0
11.0
7.500909090909093
4.127269090909091
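The other three datasets can be checked the same way. The sketch below is one possible way to do that without depending on how `anscombe` was loaded: it hard-codes the published Anscombe values (an assumption about what the DataFrame contains) and uses the standard library's `statistics` module instead of pandas. `statistics.variance` is the sample variance (it divides by n-1), which matches the pandas `.var()` default. The names `QUARTET` and `summarize` are made up for this illustration.

```python
from statistics import mean, variance

# Published Anscombe's quartet values, hard-coded here as an assumption
# about what the `anscombe` DataFrame contains.
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared by datasets I, II, III
QUARTET = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return (mean of x, sample variance of x, mean of y, sample variance of y)."""
    return mean(xs), variance(xs), mean(ys), variance(ys)


# One summary tuple per dataset, keyed by dataset name.
summary = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, stats in summary.items():
    print(name, [round(s, 3) for s in stats])
```

All four datasets come out with nearly identical summary statistics, which is the point of the quartet: the numbers alone do not distinguish them.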
- Compute the correlation coefficient between x and y
print(dataI['x'].corr(dataI['y']))  # Pearson correlation (the pandas default)
Output:
0.81642051634484
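To make clear what `.corr()` is computing here, the following is a minimal sketch of the Pearson correlation written out by hand, r = S_xy / sqrt(S_xx · S_yy). It assumes dataset I holds the published Anscombe values (hard-coded below), and `pearson_r` is a helper name invented for this illustration, not part of pandas.

```python
from math import sqrt

# Dataset I of Anscombe's quartet (published values, hard-coded as an assumption).
xs = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ys = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]


def pearson_r(xs, ys):
    """Pearson correlation coefficient: S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Sums of products of deviations from the means
    s_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    s_xx = sum((x - mx) ** 2 for x in xs)
    s_yy = sum((y - my) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


r = pearson_r(xs, ys)
print(r)
```

Up to floating-point rounding, this should agree with the value printed by pandas above.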
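- Compute the linear regression line
import statsmodels.api as sm

dataI = anscombe[anscombe.dataset == "I"]
x = dataI['x']
y = dataI['y']
x = sm.add_constant(x)  # add an intercept column named "const"
results = sm.OLS(y, x).fit()  # ordinary least squares fit of y on [const, x]
print(results.params)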
Output:
const 3.000091
x 0.500091
dtype: float64
That is, β0 = 3.000091 and β1 = 0.500091.
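For a single predictor, the least-squares coefficients have a closed form: β1 = S_xy / S_xx and β0 = mean(y) - β1 · mean(x). The sketch below works this out by hand as a cross-check on the fitted parameters. It is not how statsmodels computes the fit internally; it assumes dataset I holds the published Anscombe values, and `fit_line` is a helper name invented for this illustration.

```python
# Dataset I of Anscombe's quartet (published values, hard-coded as an assumption).
xs = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ys = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]


def fit_line(xs, ys):
    """Closed-form simple linear regression; returns (beta0, beta1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    s_xx = sum((x - mx) ** 2 for x in xs)
    beta1 = s_xy / s_xx       # slope
    beta0 = my - beta1 * mx   # intercept: the line passes through (mx, my)
    return beta0, beta1


beta0, beta1 = fit_line(xs, ys)
print(round(beta0, 6), round(beta1, 6))
```

Here S_xy = 55.01 and S_xx = 110, so β1 = 55.01 / 110 ≈ 0.500091 and β0 ≈ 7.500909 - 0.500091 · 9 ≈ 3.000091, in line with the statsmodels output.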
Part 2
Problem
Using Seaborn, visualize all four datasets.
hint: use sns.FacetGrid combined with plt.scatter
Code
import matplotlib.pyplot as plt
import seaborn as sns
graph = sns.FacetGrid(anscombe, col='dataset')  # one panel per dataset
graph.map(plt.scatter, 'x', 'y')
plt.show()