ISL-Chap1&2.1 Notes

Preface
Chap1 Introduction
Chap2 Statistical Learning

2.1 What Is Statistical Learning

2.1.1 Why Estimate $f$
2.1.2 How Do We Estimate $f$
2.1.3 The Trade-Off Between Prediction Accuracy and Model Interpretability
2.1.4 Supervised Versus Unsupervised Learning
2.1.5 Regression Versus Classification Problems

Reference

Preface

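I'm using the holiday to work through "An Introduction to Statistical Learning" again, to organize my understanding and to brush up on R along the way. I'm a beginner, so some of my interpretations may be wrong; corrections and discussion are welcome.
These notes are based mainly on "An Introduction to Statistical Learning". Please credit the source if you quote them.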

Chap1 Introduction

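This chapter gives a brief overview of:

What Statistical Learning is:
A vast set of tools for understanding data. These tools can be classified as "supervised" or "unsupervised", i.e., supervised learning and unsupervised learning.
The data sets used in the book.
A brief history of Statistical Learning.
Who the book is for.
Notation.
How the book is organized.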

Chap2 Statistical Learning

2.1 What Is Statistical Learning

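This section opens with the question "How can we increase sales?" Our goal is therefore to build a model that accurately predicts sales from the advertising budgets for three media (TV, radio, newspaper). Sales ($Y$) is the output, and the three budgets ($X_1$ = TV, $X_2$ = radio, $X_3$ = newspaper) are the inputs.
[Figure: scatter plots of $Y$ against $X_1$, $X_2$ and $X_3$]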
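If we use a model $f$ to describe $Y$, the general form can be written as:
$Y=f(X)+\epsilon$
Here $f$ is a fixed but unknown function of $X_1, X_2, \ldots, X_p$, and $\epsilon$ is a random error term, which is independent of $X$ and has mean zero.
In short, Statistical Learning refers to a set of approaches for estimating $f$.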
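To make $Y=f(X)+\epsilon$ concrete, here is a minimal Python sketch that simulates data from this model. The "true" function `f_true`, its coefficients, and the budget ranges are hypothetical values chosen for illustration; they are not estimates from the book's Advertising data. Because we generate the data ourselves, we know $f$ exactly, which is never the case in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical advertising budgets for TV, radio and newspaper (one column each).
X = rng.uniform(0, 100, size=(n, 3))

def f_true(X):
    """Hypothetical 'true' f: sales depend on TV and radio but not on newspaper."""
    return 3.0 + 0.05 * X[:, 0] + 0.1 * X[:, 1] + 0.0 * X[:, 2]

# Random error term: drawn independently of X, with mean zero.
eps = rng.normal(loc=0.0, scale=1.0, size=n)

# The observed response follows Y = f(X) + epsilon.
Y = f_true(X) + eps

print(X.shape, Y.shape)
print("sample mean of epsilon:", eps.mean())
```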

2.1.1 Why Estimate $f$

There are two main reasons for building a model (estimating $f$): Prediction and Inference.

Prediction:
$\hat{Y}=\hat{f}(X)$

Here $\hat{f}$ is our estimate of $f$, and $\hat{Y}$ is the resulting prediction of $Y$. As the book puts it:

In this setting, $\hat{f}$ is often treated as a black box, in the sense that one is not typically concerned with the exact form of $\hat{f}$, provided that it yields accurate predictions for $Y$.

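How accurate $\hat{Y}$ is as a prediction of $Y$ depends on two quantities: the "reducible error" and the "irreducible error". The book focuses on techniques for estimating $f$ that minimize the reducible error. Even a perfect estimate of $f$, however, cannot remove the irreducible error, because $Y$ also depends on $\epsilon$, which cannot be predicted from $X$. The irreducible error therefore places an upper bound on how accurately we can predict $Y$.

PS: For the exact formula and explanation of the two errors, see equation (2.3) on p. 19 of the book.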
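For reference, the decomposition behind equation (2.3) can be written out as follows. As in the book, treat $\hat{f}$ and $X$ as fixed, so the only randomness comes from $\epsilon$; the cross term vanishes because $E(\epsilon)=0$, and $E(\epsilon^2)=\mathrm{Var}(\epsilon)$ for the same reason.

```latex
\begin{aligned}
E(Y-\hat{Y})^2 &= E\big[f(X)+\epsilon-\hat{f}(X)\big]^2 \\
&= \big[f(X)-\hat{f}(X)\big]^2 + 2\big[f(X)-\hat{f}(X)\big]E(\epsilon) + E(\epsilon^2) \\
&= \underbrace{\big[f(X)-\hat{f}(X)\big]^2}_{\text{Reducible}} + \underbrace{\mathrm{Var}(\epsilon)}_{\text{Irreducible}}
\end{aligned}
```

The first term shrinks as $\hat{f}$ gets closer to $f$; the second does not depend on $\hat{f}$ at all, which is why it bounds the achievable accuracy.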

Inference:

We are often interested in understanding the way that $Y$ is affected as $X_1, \ldots, X_p$ change.


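Here we want to understand how $Y$ is affected by each variable. In other words, the goal is no longer just to predict $Y$ accurately from $X$, but to understand more precisely how $Y$ changes as a function of $X$. In this setting $\hat{f}$ cannot be treated as a black box, because we need to know its exact form.

The following questions help us understand how $Y$ changes as a function of $X$:
(1) Which predictors are associated with the response?
(2) What is the relationship between the response and each predictor?
(3) Can the relationship between $Y$ and each predictor be adequately summarized using a linear equation, or is the relationship more complicated?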

2.1.2 How Do We Estimate $f$

Parametric Method:
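A parametric approach involves two steps:
(1) Make an assumption about the functional form of $f$ (choose a suitable form). For example, assume that $f$ is linear in $X$:
$f(X)=\beta_0+\beta_1X_1+\beta_2X_2+...+\beta_pX_p$
(2) Use the training data to fit, or train, the model, i.e., estimate the parameters $\beta_0, \beta_1, \ldots, \beta_p$ (for example, by least squares).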
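A minimal Python sketch of the two steps, assuming synthetic data generated from a hypothetical linear $f$ (the coefficients in `beta_true` are made up for illustration). Step (1) is the choice of a design matrix with an intercept column plus $X_1, \ldots, X_p$; step (2) estimates $\beta_0, \ldots, \beta_p$ by least squares with `numpy.linalg.lstsq`. Because the data are simulated, the estimates can be checked against the true coefficients.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 3

# Synthetic training data from a hypothetical linear f (the third input has no effect).
beta_true = np.array([3.0, 0.05, 0.1, 0.0])  # beta_0, beta_1, beta_2, beta_3
X = rng.uniform(0, 100, size=(n, p))
y = beta_true[0] + X @ beta_true[1:] + rng.normal(0.0, 1.0, size=n)

# Step (1): assume f is linear in X, i.e. use an intercept column plus X_1..X_p.
design = np.column_stack([np.ones(n), X])

# Step (2): fit by least squares, minimizing the residual sum of squares.
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)

print("estimated coefficients:", np.round(beta_hat, 3))
```

Once the form is fixed, the whole problem reduces to estimating $p+1$ numbers. The estimate for $\beta_3$ should come out close to zero, which is also the kind of evidence used for inference question (1) above.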

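The drawback of choosing the form of the model in advance is that it usually will not match the true $f$. If the chosen form is too far from the true $f$, our estimate will be poor. One way to address this is to choose a more flexible model.
In general, however, a more flexible model requires estimating more parameters. Worse, more complex models are more prone to overfitting: they may follow the errors, or noise, too closely, capturing patterns that are not part of $f$ itself. Such a model generalizes poorly: it performs very well on the training set but badly on a new data set.
Non-Parametric Method:
Non-parametric methods do not require choosing or assuming a functional form for $f$ in advance. As the book puts it: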

Instead they seek an estimate of $f$ that gets as close to the data points as possible without being too rough or wiggly.

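Advantage: because no particular functional form is assumed, a non-parametric method can fit a much wider range of possible shapes for $f$.
Disadvantage: it needs a very large number of observations to obtain an accurate estimate of $f$, so it does not work well on small data sets.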

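PS: A non-parametric method also requires choosing an appropriate level of smoothness; otherwise it can overfit as well.
[Figure: a non-parametric fit that overfits the data]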
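The book illustrates non-parametric fitting with a thin-plate spline. As a simpler stand-in (a different technique, swapped in only for illustration), here is a k-nearest-neighbors regression sketch in Python: it assumes no functional form and predicts each point by averaging the responses of its `k` closest training points. The number of neighbors `k` plays the role of the smoothness setting: with `k=1` the fit passes through every training point (zero training error, i.e. overfitting), while a larger `k` gives a smoother fit. The data and the helper `knn_regress` are hypothetical.

```python
import numpy as np

def knn_regress(x_train, y_train, x_new, k):
    """Predict y at each x_new by averaging the k nearest training responses (1-D x)."""
    x_new = np.atleast_1d(x_new)
    # Distance from every new point to every training point: shape (len(x_new), len(x_train)).
    dist = np.abs(x_new[:, None] - x_train[None, :])
    # Indices of the k nearest training points for each new point.
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 10, size=50))
y = np.sin(x) + rng.normal(0.0, 0.3, size=50)  # hypothetical true f(x) = sin(x)

for k in (1, 10):
    train_mse = np.mean((knn_regress(x, y, x, k) - y) ** 2)
    print(f"k={k:2d}  training MSE = {train_mse:.4f}")
```

A training MSE of exactly zero for `k=1` is a warning sign rather than a success: the fit has absorbed the noise along with $f$.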

2.1.3 The Trade-Off Between Prediction Accuracy and Model Interpretability

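Most of the methods covered in this book are less flexible, or more restrictive: the range of shapes they can produce when estimating $f$ is relatively narrow. Linear regression, for example, can only produce linear functions, such as lines or planes. Other methods, such as the non-parametric fit shown above, can produce a much wider range of shapes.
With so many methods available, which should we choose? When two models perform about equally well, we usually follow Occam's razor and choose the simpler one.
So why would we prefer a more restrictive model to a more flexible one?
(1) For inference the reason is easy to see: the simpler the model, the easier it is to interpret. If we want to understand exactly how $Y$ changes, we naturally prefer a simpler model.
[Figure: the trade-off between flexibility and interpretability for different methods]
(2) Does this mean that for prediction we should always prefer a more flexible model? Not necessarily.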

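We will often obtain more accurate predictions using a less flexible method.
This may seem counterintuitive at first, but it makes sense once we recall overfitting: a highly flexible method can end up fitting the noise in the training data, which hurts its predictions on new data.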
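To see how a less flexible method can predict better, the Python sketch below fits polynomials of increasing degree (increasing flexibility) by least squares to a small synthetic training set whose true $f$ is linear, then evaluates each fit on a separate test set drawn from the same hypothetical model. The training MSE can only go down as the degree grows, because each polynomial family contains the previous one; the test MSE typically does not follow, since the extra flexibility is spent fitting noise. The exact test numbers depend on the random seed and are not guaranteed to be ordered the same way for every seed.

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate(n):
    """Hypothetical data: true f(x) = 1 + 2x, noise standard deviation 0.5."""
    x = rng.uniform(-1, 1, size=n)
    return x, 1 + 2 * x + rng.normal(0.0, 0.5, size=n)

x_train, y_train = simulate(30)
x_test, y_test = simulate(1000)

degrees = (1, 3, 10)
train_mse, test_mse = [], []
for d in degrees:
    coefs = np.polyfit(x_train, y_train, deg=d)  # least-squares polynomial fit
    train_mse.append(np.mean((np.polyval(coefs, x_train) - y_train) ** 2))
    test_mse.append(np.mean((np.polyval(coefs, x_test) - y_test) ** 2))
    print(f"degree {d:2d}: training MSE {train_mse[-1]:.3f}, test MSE {test_mse[-1]:.3f}")
```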

2.1.4 Supervised Versus Unsupervised Learning

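Simply put, what distinguishes supervised from unsupervised learning is whether each observation has a label (a response).
Supervised learning covers settings such as regression and classification; methods include linear regression, logistic regression, and so on.
Unsupervised learning settings, such as clustering, are discussed in Chap10.
There are also semi-supervised settings: for example, of $n$ observations, $m$ have a response (label) while the remaining $n-m$ do not.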
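A tiny illustration of the unsupervised setting: given only inputs and no labels, we can still look for structure, for example by clustering. The Python sketch below is a bare-bones 1-D k-means with two clusters, one common clustering method chosen here only for illustration; the data and the helper `two_means` are hypothetical. Note that the fitting step never sees a response.

```python
import numpy as np

def two_means(x, n_iter=20):
    """Bare-bones 1-D k-means with k=2; centroids start at the minimum and maximum of x."""
    centers = np.array([x.min(), x.max()])
    for _ in range(n_iter):
        # Assignment step: each point joins the closer centroid (0 or 1).
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        # Update step: each centroid moves to the mean of its assigned points.
        centers = np.array([x[labels == j].mean() for j in (0, 1)])
    return labels, centers

rng = np.random.default_rng(4)
# Unlabeled observations that happen to come from two groups; only x is passed to two_means.
x = np.concatenate([rng.normal(0.0, 1.0, 50), rng.normal(10.0, 1.0, 50)])
labels, centers = two_means(x)
print("cluster centers:", np.round(centers, 2))
```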

2.1.5 Regression Versus Classification Problems

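First, let's distinguish quantitative from qualitative variables. Roughly speaking, quantitative variables take numerical values, such as height, income, or stock price, while qualitative variables take values in one of several classes or categories, such as gender.
Problems with a quantitative response are usually called regression problems, and problems with a qualitative response are usually called classification problems.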
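One way to see the practical difference: the same nearest-neighbor idea can serve both problem types, but a quantitative response is summarized by averaging, whereas a qualitative one is summarized by a majority vote. The Python sketch below assumes hypothetical toy data and a hypothetical helper `knn_predict`; it is an illustration, not a method introduced in this section of the book.

```python
import numpy as np
from collections import Counter

def knn_predict(x_train, y_train, x0, k, task):
    """Predict the response at a single point x0 from its k nearest neighbors (1-D x)."""
    nearest = np.argsort(np.abs(x_train - x0))[:k]
    neighbors = [y_train[i] for i in nearest]
    if task == "regression":
        # Quantitative response: average the neighbors' numeric values.
        return float(np.mean(neighbors))
    # Qualitative response: return the most common class among the neighbors.
    return Counter(neighbors).most_common(1)[0][0]

x_train = np.array([1.0, 2.0, 3.0, 10.0, 11.0])
income = np.array([20.0, 22.0, 24.0, 60.0, 62.0])  # quantitative response
group = ["low", "low", "low", "high", "high"]       # qualitative response

print(knn_predict(x_train, income, 2.1, k=3, task="regression"))      # 22.0
print(knn_predict(x_train, group, 2.1, k=3, task="classification"))   # low
```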

Reference

1. An Introduction to Statistical Learning
2. Occam's razor, Baidu Baike: https://baike.baidu.com/item/%E5%A5%A5%E5%8D%A1%E5%A7%86%E5%89%83%E5%88%80%E5%8E%9F%E7%90%86

Please credit the source if you quote these notes.
Corrections and discussion are welcome.
