Linear Regression (Theory)

Linear Models

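A linear model is among the most widely used models in machine learning. It makes its prediction from a linear combination of the sample's features. Given an n-dimensional sample $\textbf{x}=[x_{1},x_{2},\cdots,x_{n}]^{T}$, the linear combination function is:

$h_{\theta}(\textbf{x})=\theta_{1}x_{1}+\theta_{2}x_{2}+\cdots+\theta_{n}x_{n}+b=(\theta;b)^{T}(\textbf{x};1)$

Here $(\theta;b)$ and $(\textbf{x};1)$ denote the column vectors obtained by appending $b$ to $\theta$ and the constant 1 to $\textbf{x}$.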
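
The two forms of $h_{\theta}(\textbf{x})$ above can be checked numerically. The following is a minimal sketch using NumPy; the function names and the sample values are illustrative assumptions, not part of the original text.

```python
import numpy as np

def linear_model(theta, b, x):
    """Return h_theta(x) = theta^T x + b for a single sample x."""
    return float(np.dot(theta, x) + b)

def linear_model_augmented(theta, b, x):
    """Same prediction via the augmented form (theta; b)^T (x; 1)."""
    theta_aug = np.append(theta, b)  # (theta; b)
    x_aug = np.append(x, 1.0)        # (x; 1)
    return float(np.dot(theta_aug, x_aug))

# Illustrative values only (hypothetical, chosen for easy arithmetic).
theta = np.array([2.0, -1.0, 0.5])
b = 3.0
x = np.array([1.0, 4.0, 2.0])

print(linear_model(theta, b, x))            # 2 - 4 + 1 + 3 = 2.0
print(linear_model_augmented(theta, b, x))  # 2.0
```

Folding $b$ into the parameter vector this way is what lets the later formulas write the prediction simply as $\theta^{T}\textbf{x}_{i}$.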

Linear Regression

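Given a dataset $D=\{(\textbf{x}_{1},y_{1}),(\textbf{x}_{2},y_{2}),\cdots,(\textbf{x}_{N},y_{N})\}$, where each $\textbf{x}_{i}$ is an $M$-dimensional vector and $y_{i}\in\mathbb{R}$, linear regression tries to fit a linear model that predicts the real-valued output as accurately as possible.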

Maximum Likelihood Estimation

Let $\hat{y}_{i}=\theta^{T}\textbf{x}_{i}$ denote the predicted value for the $i$-th sample, with the bias $b$ absorbed into $\theta$ by appending a constant 1 to each $\textbf{x}_{i}$. The estimation error is:

$\varepsilon_{i}=\hat{y}_{i}-y_{i}$

We assume the errors $\varepsilon_{i}$ are independent and identically distributed. Motivated by the central limit theorem, we model them as Gaussian with mean 0 and variance $\sigma^2$. Then:

$p(\varepsilon_{i})=\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{\varepsilon_{i}^{2}}{2\sigma^2}\right)$

$p(y_{i}|\textbf{x}_{i},\theta)=\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(y_{i}-\theta^{T}\textbf{x}_{i})^2}{2\sigma^2}\right)$

Under maximum likelihood estimation, the likelihood function is:

$L(y_{1},y_{2},\cdots,y_{N}|\textbf{x}_{1},\textbf{x}_{2},\cdots,\textbf{x}_{N},\theta)=\prod_{i=1}^{N}p(y_{i}|\textbf{x}_{i},\theta)\\ =\prod_{i=1}^{N}\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(y_{i}-\theta^{T}\textbf{x}_{i})^2}{2\sigma^2}\right)$

The log-likelihood function is:

$l(y_{1},y_{2},\cdots,y_{N}|\textbf{x}_{1},\textbf{x}_{2},\cdots,\textbf{x}_{N},\theta)=\log L(y_{1},y_{2},\cdots,y_{N}|\textbf{x}_{1},\textbf{x}_{2},\cdots,\textbf{x}_{N},\theta)\\ =\sum_{i=1}^{N}\log\left(\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(y_{i}-\theta^{T}\textbf{x}_{i})^2}{2\sigma^2}\right)\right)\\ =N\log\frac{1}{\sqrt{2\pi}\sigma}-\frac{1}{\sigma^2}\frac{1}{2}\sum_{i=1}^{N}(y_{i}-\theta^{T}\textbf{x}_{i})^2$

Let

$J(\theta)=\frac{1}{2}\sum_{i=1}^{N}(\theta^{T}\textbf{x}_{i}-y_{i})^2$

The first term of the log-likelihood does not depend on $\theta$, and $\sigma^2>0$, so maximizing $l(y_{1},y_{2},\cdots,y_{N}|\textbf{x}_{1},\textbf{x}_{2},\cdots,\textbf{x}_{N},\theta)$ is equivalent to minimizing $J(\theta)$.

$J(\theta)$ is called the objective function of linear regression.
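
The relation between the log-likelihood and $J(\theta)$ can be checked numerically. The sketch below assumes synthetic data generated with NumPy; the names `objective`, `log_likelihood`, `theta_true` and all data values are hypothetical choices made for illustration.

```python
import numpy as np

def objective(theta, X, y):
    """J(theta) = 1/2 * sum_i (theta^T x_i - y_i)^2."""
    r = X @ theta - y
    return 0.5 * float(r @ r)

def log_likelihood(theta, X, y, sigma):
    """Gaussian log-likelihood, summed over all N samples."""
    r = y - X @ theta
    log_p = -0.5 * np.log(2.0 * np.pi) - np.log(sigma) - r**2 / (2.0 * sigma**2)
    return float(np.sum(log_p))

# Synthetic data (assumed for illustration): N samples, M features.
rng = np.random.default_rng(0)
N, M, sigma = 50, 3, 0.7
X = rng.normal(size=(N, M))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + rng.normal(scale=sigma, size=N)

# Term of the log-likelihood that does not depend on theta.
const = N * np.log(1.0 / (np.sqrt(2.0 * np.pi) * sigma))

for theta in (theta_true, np.zeros(M), rng.normal(size=M)):
    lhs = log_likelihood(theta, X, y, sigma)
    rhs = const - objective(theta, X, y) / sigma**2
    print(np.isclose(lhs, rhs))  # checks l = const - J / sigma^2
```

Because `const` is the same for every `theta`, whichever `theta` gives the smallest `objective` also gives the largest `log_likelihood`.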

Closed-Form Solution for the Parameters

Taking the partial derivative of $J(\theta)$ with respect to each $\theta_{j}$ gives:

$\frac{\partial J(\theta)}{\partial\theta_{j}}=\sum_{i=1}^{N}(\theta^{T}\textbf{x}_{i}-y_{i})x_{ij} \\ =\begin{pmatrix} x_{1j} & x_{2j} & \cdots & x_{Nj} \end{pmatrix} X\theta - \begin{pmatrix} x_{1j} & x_{2j} & \cdots & x_{Nj} \end{pmatrix} \begin{pmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{N} \end{pmatrix}$

where $j\in\{1,2,\cdots,M\}$, $x_{ij}$ denotes the $j$-th component of the $i$-th sample, and $X$ is the $N\times M$ matrix whose $i$-th row is $\textbf{x}_{i}^{T}$. Setting the derivative to zero for every $j$ and writing the result in matrix form gives:

$\begin{pmatrix} x_{11} & x_{21} & \cdots & x_{N1} \\ x_{12} & x_{22} & \cdots & x_{N2} \\ \vdots & \vdots & \ddots & \vdots \\ x_{1M} & x_{2M} & \cdots & x_{NM} \end{pmatrix} \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1M} \\ x_{21} & x_{22} & \cdots & x_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{NM} \end{pmatrix} \begin{pmatrix} \theta_{1} \\ \theta_{2} \\ \vdots \\ \theta_{M} \end{pmatrix} - \begin{pmatrix} x_{11} & x_{21} & \cdots & x_{N1} \\ x_{12} & x_{22} & \cdots & x_{N2} \\ \vdots & \vdots & \ddots & \vdots \\ x_{1M} & x_{2M} & \cdots & x_{NM} \end{pmatrix} \begin{pmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{N} \end{pmatrix} = \textbf{0}$

That is:

$X^TX\theta-X^TY=\textbf{0}$
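
One way to gain confidence in the gradient expression $X^TX\theta-X^TY$ is to compare it against finite differences of $J(\theta)$. The following is a minimal sketch on random data; the helper names and the data are assumptions introduced for illustration.

```python
import numpy as np

def objective(theta, X, Y):
    """J(theta) = 1/2 * sum_i (theta^T x_i - y_i)^2."""
    r = X @ theta - Y
    return 0.5 * float(r @ r)

def gradient(theta, X, Y):
    """Analytic gradient X^T X theta - X^T Y."""
    return X.T @ (X @ theta) - X.T @ Y

def numerical_gradient(theta, X, Y, eps=1e-6):
    """Central finite differences, one coordinate theta_j at a time."""
    g = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        g[j] = (objective(theta + step, X, Y)
                - objective(theta - step, X, Y)) / (2.0 * eps)
    return g

# Random data, assumed purely for the check (N = 20 samples, M = 4 features).
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 4))
Y = rng.normal(size=20)
theta = rng.normal(size=4)

print(np.allclose(gradient(theta, X, Y),
                  numerical_gradient(theta, X, Y), atol=1e-5))
```
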
When $X^TX$ is invertible, solving this equation gives the closed-form least squares solution:

$\theta=(X^TX)^{-1}X^TY$
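
The closed-form solution might be computed as in the sketch below. It assumes NumPy and a small noise-free toy dataset; `fit_normal_equation` and the data are hypothetical. The code solves the linear system $X^TX\theta=X^TY$ with `np.linalg.solve` instead of forming the inverse explicitly, which is generally better conditioned numerically.

```python
import numpy as np

def fit_normal_equation(X, Y):
    """Solve X^T X theta = X^T Y; assumes X^T X is invertible."""
    return np.linalg.solve(X.T @ X, X.T @ Y)

# Toy data (assumed): the second column is the constant 1 that carries the bias.
X = np.array([[1.0, 1.0],
              [2.0, 1.0],
              [3.0, 1.0],
              [4.0, 1.0]])
theta_true = np.array([2.0, -1.0])
Y = X @ theta_true  # noise-free targets, so theta_true should be recovered

theta = fit_normal_equation(X, Y)
print(np.allclose(theta, theta_true))

# Cross-check against NumPy's least squares solver.
theta_lstsq = np.linalg.lstsq(X, Y, rcond=None)[0]
print(np.allclose(theta, theta_lstsq))
```
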
To prevent overfitting, or to handle the case where $X^TX$ is not invertible, add a perturbation $\lambda I$ with $\lambda>0$:

$\theta=(X^TX+\lambda I)^{-1}X^TY$
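
The effect of the $\lambda$ perturbation can be seen on a dataset where $X^TX$ is singular. The sketch below assumes a toy matrix with two identical columns; `fit_ridge`, the data, and the value of `lam` are illustrative choices.

```python
import numpy as np

def fit_ridge(X, Y, lam):
    """Solve (X^T X + lam * I) theta = X^T Y for lam > 0."""
    M = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(M), X.T @ Y)

# Toy data (assumed): two identical columns make X^T X singular.
X = np.array([[1.0, 1.0],
              [2.0, 2.0],
              [3.0, 3.0]])
Y = np.array([2.0, 4.0, 6.0])

print(np.linalg.matrix_rank(X.T @ X))  # 1, so X^T X (2 x 2) is not invertible

theta = fit_ridge(X, Y, lam=0.1)
print(theta)  # both components equal 28 / 28.1, about 0.9964
```

For any $\lambda>0$ the matrix $X^TX+\lambda I$ is positive definite, so the system always has a unique solution, and here the weight is split evenly between the two duplicated features.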

Complexity Penalties for Linear Regression

The objective function with L1 regularization (lasso) is:
$J(\theta)=\frac{1}{2}\sum_{i=1}^{N}(\theta^{T}\textbf{x}_{i}-y_{i})^2+\lambda\sum_{j=1}^{M}|\theta_{j}|$
The parameters obtained with L1 regularization are typically sparse, meaning many $\theta_{j}$ are exactly zero.
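The objective function with L2 regularization (ridge) is:
$J(\theta)=\frac{1}{2}\sum_{i=1}^{N}(\theta^{T}\textbf{x}_{i}-y_{i})^2+\lambda\sum_{j=1}^{M}\theta_{j}^2$
Setting its gradient to zero gives $\theta=(X^TX+2\lambda I)^{-1}X^TY$, which has the same form as the perturbed solution $\theta=(X^TX+\lambda I)^{-1}X^TY$ up to a rescaling of $\lambda$.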
The objective function that mixes L1 and L2 regularization (ElasticNet) is:
$J(\theta)=\frac{1}{2}\sum_{i=1}^{N}(\theta^{T}\textbf{x}_{i}-y_{i})^2+\rho\sum_{j=1}^{M}|\theta_{j}|+(1-\rho)\sum_{j=1}^{M}\theta_{j}^2$
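
The three penalized objectives can be written as plain functions to make the difference between the penalty terms concrete. This is a minimal sketch that follows the formulas exactly as stated above. The function names and the toy values are assumptions, and library implementations may scale the terms differently.

```python
import numpy as np

def squared_error(theta, X, Y):
    """Unpenalized term: 1/2 * sum_i (theta^T x_i - y_i)^2."""
    r = X @ theta - Y
    return 0.5 * float(r @ r)

def lasso_objective(theta, X, Y, lam):
    """Squared error plus lam * sum_j |theta_j| (L1 penalty)."""
    return squared_error(theta, X, Y) + lam * float(np.sum(np.abs(theta)))

def ridge_objective(theta, X, Y, lam):
    """Squared error plus lam * sum_j theta_j^2 (L2 penalty)."""
    return squared_error(theta, X, Y) + lam * float(np.sum(theta**2))

def elastic_net_objective(theta, X, Y, rho):
    """Squared error plus rho * L1 penalty plus (1 - rho) * L2 penalty."""
    l1 = float(np.sum(np.abs(theta)))
    l2 = float(np.sum(theta**2))
    return squared_error(theta, X, Y) + rho * l1 + (1.0 - rho) * l2

# Toy values (assumed): squared error is 8, L1 norm is 3, squared L2 norm is 5.
X = np.array([[1.0, 0.0],
              [0.0, 1.0]])
Y = np.array([1.0, 2.0])
theta = np.array([1.0, -2.0])

print(lasso_objective(theta, X, Y, lam=0.5))        # 8 + 0.5 * 3 = 9.5
print(ridge_objective(theta, X, Y, lam=0.5))        # 8 + 0.5 * 5 = 10.5
print(elastic_net_objective(theta, X, Y, rho=0.5))  # 8 + 1.5 + 2.5 = 12.0
```

With `rho=1.0` the ElasticNet objective reduces to the lasso objective with `lam=1.0`, and with `rho=0.0` it reduces to the ridge objective with `lam=1.0`.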