Python Week 14 Exercise: Anscombe's quartet, a small application of the pandas and seaborn modules

Exercise source: https://nbviewer.jupyter.org/github/schmit/cme193-ipython-notebooks-lecture/blob/master/Exercises.ipynb

Part 1

Problem

For each of the four datasets…

  1. Compute the mean and variance of both x and y
  2. Compute the correlation coefficient between x and y
  3. Compute the linear regression line: y = β0 + β1x + ϵ (hint: use statsmodels and look at the Statsmodels notebook)

Code

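  1. Compute the mean and variance of x and y for each of the four datasets
import seaborn as sns

# Assumption: the post does not show how `anscombe` is loaded;
# seaborn's bundled copy of the dataset is used here.
anscombe = sns.load_dataset("anscombe")

# Dataset I is used as the example; the other three work the same way.
dataI = anscombe[anscombe.dataset == "I"]
print(dataI['x'].mean())
print(dataI['x'].var())
print(dataI['y'].mean())
print(dataI['y'].var())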

Output:

9.0
11.0
7.500909090909093
4.127269090909091
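The post only works through dataset I and notes that the others follow the same pattern. As a minimal sketch of doing all four at once, the block below groups by the `dataset` column and aggregates. To keep it runnable offline, the quartet's values are typed in by hand; this is an assumption standing in for the `anscombe` frame used in the post, which is presumed to have the columns `dataset`, `x` and `y`.

```python
import pandas as pd

# Anscombe's quartet, entered by hand so the sketch needs no download.
# Assumption: same layout as the post's `anscombe` frame (dataset, x, y).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]   # shared by datasets I-III
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]         # dataset IV only
ys = {
    "I": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "II": [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "III": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    "IV": [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
}
frames = [
    pd.DataFrame({"dataset": name, "x": x4 if name == "IV" else x123, "y": y})
    for name, y in ys.items()
]
anscombe = pd.concat(frames, ignore_index=True)

# Mean and sample variance (ddof=1, the pandas default) of x and y per dataset.
summary = anscombe.groupby("dataset")[["x", "y"]].agg(["mean", "var"])
print(summary)
```

The result has one row per dataset and a two-level column index, so a single value can be read with, for example, `summary.loc["I", ("y", "var")]`. The x variance is 11.0 in every group, which is part of what makes the quartet interesting.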

  2. Compute the correlation coefficient between x and y
print(dataI['x'].corr(dataI['y']))

Output:

0.81642051634484
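To make explicit what `Series.corr` computes by default (the Pearson coefficient), here is a small cross-check with NumPy. It is only a sketch: the dataset I values are typed in by hand and assumed to match the rows selected as `dataI` in the post.

```python
import numpy as np

# Dataset I of Anscombe's quartet, entered by hand (assumed to match `dataI`).
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])

# Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)),
# where dx and dy are deviations from the respective means.
dx = x - x.mean()
dy = y - y.mean()
r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
print(r)

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r.
r_numpy = np.corrcoef(x, y)[0, 1]
print(r_numpy)
```

Both values should agree with the pandas result above up to floating-point rounding.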

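  3. Compute the linear regression line
import statsmodels.api as sm

dataI = anscombe[anscombe.dataset == "I"]
x = dataI['x']
y = dataI['y']
# Add a constant column so that OLS estimates the intercept β0 as well as the slope β1.
x = sm.add_constant(x)
results = sm.OLS(y, x).fit()
print(results.params)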

Output:

const 3.000091
x 0.500091
dtype: float64

That is, β0 = 3.000091 and β1 = 0.500091, so the fitted line is y = 3.000091 + 0.500091x.
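For a single predictor, the OLS coefficients have a closed form, which gives a quick sanity check on the statsmodels result. The sketch below is not the post's method; it recomputes the slope and intercept for dataset I with NumPy, using hand-typed values that are assumed to match `dataI`.

```python
import numpy as np

# Dataset I of Anscombe's quartet, entered by hand (assumed to match `dataI`).
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])

# Closed-form simple linear regression:
#   beta1 = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
#   beta0 = mean(y) - beta1 * mean(x)
dx = x - x.mean()
beta1 = (dx * (y - y.mean())).sum() / (dx ** 2).sum()
beta0 = y.mean() - beta1 * x.mean()
print(beta0, beta1)

# np.polyfit with degree 1 returns [slope, intercept] and should agree.
slope, intercept = np.polyfit(x, y, 1)
print(intercept, slope)
```

Rounded to six decimals, both approaches should reproduce the `const` and `x` values printed by `results.params`.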

Part 2

Problem

Using Seaborn, visualize all four datasets.

hint: use sns.FacetGrid combined with plt.scatter

Code

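import matplotlib.pyplot as plt
import seaborn as sns
# One panel per dataset, each a scatter plot of y against x.
graph = sns.FacetGrid(anscombe, col='dataset')
graph.map(plt.scatter, 'x', 'y')
plt.show()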

Output

[Figure: four scatter plots of y against x, one panel per dataset (I, II, III, IV)]

Reposted from blog.csdn.net/qq_39178023/article/details/80658003