01 Linear Regression

Basic Elements

As a running example, we predict the price of a house from its area and its age.

Model

\[\mathrm{price} = w_{\mathrm{area}} \cdot \mathrm{area} + w_{\mathrm{age}} \cdot \mathrm{age} + b\]
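
The model above can be sketched in a few lines of Python with NumPy. The function name `predict_price` and the parameter values are hypothetical, chosen only for illustration; they are not fitted to any real data.

```python
import numpy as np

def predict_price(area, age, w_area, w_age, b):
    """Linear model: price = w_area * area + w_age * age + b."""
    return w_area * area + w_age * age + b

# Hypothetical parameters, for illustration only.
w_area, w_age, b = 2.0, -0.5, 10.0

# The same formula works on scalars and on NumPy arrays of houses.
areas = np.array([100.0, 80.0])
ages = np.array([10.0, 20.0])
prices = predict_price(areas, ages, w_area, w_age, b)
```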

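Data Set

In machine-learning terminology, the data set used to fit the model is called the training data set or training set. Each house is a sample, its actual sale price is the label, and the factors used to predict the label are the features. Features characterize a sample.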
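
To make the terminology concrete, here is a small synthetic training set. Everything in it is an assumption made for illustration: the sample count, the value ranges, the "true" parameters and the noise level are invented, not real housing data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5                                    # number of samples (houses)

area = rng.uniform(50.0, 150.0, size=n)  # feature 1
age = rng.uniform(0.0, 30.0, size=n)     # feature 2
X = np.column_stack([area, age])         # feature matrix, one row per sample

# Hypothetical ground-truth parameters used only to generate labels.
true_w = np.array([2.0, -0.5])
true_b = 10.0
y = X @ true_w + true_b + rng.normal(0.0, 1.0, size=n)  # labels (prices)
```

Each row of `X` is one sample's features, and the matching entry of `y` is its label.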

Loss Function

Error on a single sample: \(l^{(i)}(\mathbf{w}, b) = \frac{1}{2} \left(\hat{y}^{(i)} - y^{(i)}\right)^2\). The constant 1/2 makes the coefficient equal to 1 once the squared term is differentiated.

Mean squared error over a data set of \(n\) samples: \(L(\mathbf{w}, b) =\frac{1}{n}\sum_{i=1}^n l^{(i)}(\mathbf{w}, b) =\frac{1}{n} \sum_{i=1}^n \frac{1}{2}\left(\mathbf{w}^\top \mathbf{x}^{(i)} + b - y^{(i)}\right)^2.\)
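
The two formulas translate almost directly into NumPy. This is a minimal sketch; the function names `squared_loss` and `mean_loss` and the toy numbers are assumptions made for this example.

```python
import numpy as np

def squared_loss(y_hat, y):
    """Per-sample loss l^(i) = 1/2 * (y_hat^(i) - y^(i))^2."""
    return 0.5 * (y_hat - y) ** 2

def mean_loss(w, b, X, y):
    """L(w, b): average of the per-sample losses over the data set."""
    y_hat = X @ w + b
    return squared_loss(y_hat, y).mean()

# Toy data: two samples with two features each.
X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([1.0, 2.0])
w = np.array([1.0, 0.0])
b = 0.0

loss = mean_loss(w, b, X, y)
```

Here the predictions are 1 and 3, so the per-sample losses are 0 and 0.5 and their mean is 0.25.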

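Optimization

Analytical solution: the optimum can be written down as a closed-form formula.

Numerical solution: no closed-form formula is available, so the loss function is minimized by iterating many times toward the best numerical value.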
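
Linear regression with squared loss is one of the cases that does have an analytical solution: the least-squares solution, which satisfies the normal equation \(\mathbf{X}^\top \mathbf{X}\,\theta = \mathbf{X}^\top \mathbf{y}\). The sketch below obtains it with `np.linalg.lstsq`. The data are noise-free toy values generated from hypothetical parameters, so the exact answer is known in advance.

```python
import numpy as np

# Noise-free toy data from hypothetical parameters w = [2, -0.5], b = 10.
X = np.array([[100.0, 10.0],
              [80.0, 20.0],
              [120.0, 5.0],
              [60.0, 30.0]])
y = X @ np.array([2.0, -0.5]) + 10.0

# Append a column of ones so the bias b is learned as one more weight.
X_aug = np.column_stack([X, np.ones(len(X))])

# Least-squares solution of X_aug @ theta ~= y, computed without iterating.
theta, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
w_hat, b_hat = theta[:2], theta[2]
```

A numerical method, such as the gradient descent described next, approaches the same values step by step and remains usable when no such formula exists.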

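Gradient Descent

Write the loss function as \(J(\theta)=\frac{1}{2} \sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)^{2}\), where \(h_{\theta}(x)=\theta_{0}+\theta_{1} x_{1}+\theta_{2} x_{2}+\ldots+\theta_{n} x_{n}\). Here \(m\) is the number of samples and \(n\) the number of features; \(\theta_0\) plays the role of the bias \(b\), with the convention \(x_0 = 1\).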

Update rule for \(\theta\), derived for a single sample \((x, y)\) with \(x_0 = 1\) (for the full data set, sum the per-sample gradients):

\(\begin{aligned} \theta_{j} &:= \theta_{j} - \alpha \frac{\partial}{\partial \theta_{j}} J(\theta) \\ \frac{\partial}{\partial \theta_{j}} J(\theta) &= \frac{\partial}{\partial \theta_{j}} \frac{1}{2}\left(h_{\theta}(x)-y\right)^{2} \\ &= 2 \cdot \frac{1}{2}\left(h_{\theta}(x)-y\right) \cdot \frac{\partial}{\partial \theta_{j}}\left(h_{\theta}(x)-y\right) \\ &= \left(h_{\theta}(x)-y\right) \cdot \frac{\partial}{\partial \theta_{j}}\left(\sum_{i=0}^{n} \theta_{i} x_{i}-y\right) \\ &= \left(h_{\theta}(x)-y\right) x_{j} \end{aligned}\)
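
The update rule can be sketched as a short loop. This is an illustrative implementation, not a tuned one: the function name `gradient_descent`, the learning rate `alpha=0.05`, the step count and the toy data are all assumptions. The gradient is summed over all samples, matching the \(\frac{1}{2}\sum\) form of \(J(\theta)\).

```python
import numpy as np

def gradient_descent(X, y, alpha, num_steps):
    """Minimize J(theta) = 1/2 * sum_i (h_theta(x_i) - y_i)^2 by gradient descent."""
    theta = np.zeros(X.shape[1])
    for _ in range(num_steps):
        residual = X @ theta - y        # h_theta(x) - y for every sample
        grad = X.T @ residual           # sum over samples of (h_theta(x) - y) * x_j
        theta = theta - alpha * grad    # theta_j := theta_j - alpha * dJ/dtheta_j
    return theta

# Noise-free toy data from the hypothetical line y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x
X = np.column_stack([np.ones_like(x), x])   # first column is x_0 = 1

theta = gradient_descent(X, y, alpha=0.05, num_steps=1000)
```

On this data `theta` approaches the generating values \(\theta_0 = 1\), \(\theta_1 = 2\). A learning rate that is too large for the data would make the iterates diverge instead.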

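Batch gradient descent: every update uses all samples. Each step is computationally expensive, but it follows the exact gradient of the loss. For a convex problem, such as linear regression with squared loss, it converges to the global optimum given a suitable learning rate.

Mini-batch gradient descent: every update uses b samples. Each step is cheaper than a full-batch step, and the gradient estimate is less noisy than in SGD, so convergence is usually more stable than with SGD.

Stochastic gradient descent (SGD): every update uses a single sample. Each step is very cheap, but the updates are noisy. On a non-convex problem the noise can carry the iterate from one local minimum to another with no guarantee of reaching the global one, so in practice the lowest-loss solution found is kept.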
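
The three variants differ only in how many samples feed each update, which a single function with a `batch_size` argument can illustrate. This is a sketch under stated assumptions: the function name `minibatch_gd`, its default hyperparameters and the toy data are invented for this example, and the gradient is averaged over each batch so that one learning rate works for every batch size.

```python
import numpy as np

def minibatch_gd(X, y, batch_size, alpha=0.1, num_epochs=1000, seed=0):
    """Gradient descent on the squared loss, averaging the gradient per batch.

    batch_size = len(X) gives batch gradient descent,
    batch_size = 1 gives stochastic gradient descent,
    anything in between gives mini-batch gradient descent.
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    theta = np.zeros(X.shape[1])
    for _ in range(num_epochs):
        order = rng.permutation(n)                  # reshuffle samples every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            residual = X[idx] @ theta - y[idx]      # h_theta(x) - y on this batch
            grad = X[idx].T @ residual / len(idx)   # batch-averaged gradient
            theta = theta - alpha * grad
    return theta

# Noise-free toy data from the hypothetical line y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x
X = np.column_stack([np.ones_like(x), x])           # x_0 = 1 carries the bias

theta_batch = minibatch_gd(X, y, batch_size=len(X))
theta_mini = minibatch_gd(X, y, batch_size=2)
theta_sgd = minibatch_gd(X, y, batch_size=1)
```

On this convex, noise-free problem all three runs end near the same parameters. With noisy labels and a fixed learning rate, the single-sample variant would typically keep fluctuating around the minimum rather than settling on it.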
