Introduction
Linear regression solves regression problems. The idea is simple and easy to implement; it is the foundation of many powerful nonlinear models; its results are highly interpretable; and it embodies many of the important ideas in machine learning.
Simple Linear Regression
Simple linear regression fits a model of the form
$$\hat y^{(i)} = a x^{(i)} + b$$
where $\hat y^{(i)}$ is the predicted value for sample $x^{(i)}$.
We want $y^{(i)}$ and $\hat y^{(i)}$ to be as close as possible. Considering all $m$ samples, the goal is to find $a$ and $b$ such that
$$\sum\limits_{i=1}^m \left(y^{(i)} - \hat y^{(i)}\right)^2$$
is as small as possible.
This is a classic least-squares problem: minimize the sum of squared errors.
The closed-form solution is
$$a = \frac{\sum\limits_{i=1}^{m}\left(x^{(i)}-\bar{x}\right)\left(y^{(i)}-\bar{y}\right)}{\sum\limits_{i=1}^{m}\left(x^{(i)}-\bar{x}\right)^2}, \qquad b = \bar{y} - a\bar{x}$$
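These two formulas translate directly into code. A minimal sketch, assuming NumPy arrays `x` and `y` of equal length (the function name is mine, for illustration):

```python
import numpy as np

def fit_simple_linear_regression(x, y):
    """Closed-form least-squares fit of y = a*x + b, loop version."""
    x_mean, y_mean = np.mean(x), np.mean(y)
    num = 0.0  # numerator:   sum of (x_i - x_mean) * (y_i - y_mean)
    den = 0.0  # denominator: sum of (x_i - x_mean)^2
    for x_i, y_i in zip(x, y):
        num += (x_i - x_mean) * (y_i - y_mean)
        den += (x_i - x_mean) ** 2
    a = num / den
    b = y_mean - a * x_mean
    return a, b
```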
Vectorized computation
Both the numerator and the denominator of $a$ have the form
$$\sum\limits_{i=1}^{m} w^{(i)} \cdot v^{(i)} \Longrightarrow w \cdot v$$
where
$$w = \left(w^{(1)}, w^{(2)}, \cdots, w^{(m)}\right), \qquad v = \left(v^{(1)}, v^{(2)}, \cdots, v^{(m)}\right)$$
so each sum can be replaced by a single dot product.
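Rewriting the loop above as dot products gives a much faster fit. A sketch of the vectorized version, again with illustrative names and made-up data:

```python
import numpy as np

def fit_simple_linear_regression_vec(x, y):
    """Vectorized closed-form fit: both sums become dot products."""
    x_mean, y_mean = np.mean(x), np.mean(y)
    a = np.dot(x - x_mean, y - y_mean) / np.dot(x - x_mean, x - x_mean)
    b = y_mean - a * x_mean
    return a, b

# quick check on made-up data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 3.0, 2.0, 3.0, 5.0])
print(fit_simple_linear_regression_vec(x, y))  # roughly (0.8, 0.4)
```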
Evaluating regression algorithms
Mean Squared Error (MSE):
$$MSE = \frac{1}{m}\sum\limits_{i=1}^m\left(y_{test}^{(i)} - \hat y_{test}^{(i)}\right)^2$$
Root Mean Squared Error (RMSE):
$$RMSE = \sqrt{\frac{1}{m}\sum\limits_{i=1}^m\left(y_{test}^{(i)} - \hat y_{test}^{(i)}\right)^2}$$
RMSE has the same units as $y$, which makes it easier to interpret than MSE.
Mean Absolute Error (MAE):
$$MAE = \frac{1}{m}\sum\limits_{i=1}^m\left|y_{test}^{(i)} - \hat y_{test}^{(i)}\right|$$
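All three metrics are one-liners in NumPy. A minimal sketch (the function names are mine):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error."""
    return np.mean((y_true - y_pred) ** 2)

def rmse(y_true, y_pred):
    """Root mean squared error: same units as y."""
    return np.sqrt(mse(y_true, y_pred))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return np.mean(np.abs(y_true - y_pred))
```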
sklearn provides MSE and MAE directly (RMSE can be taken as the square root of MSE):

```python
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error

mse = mean_squared_error(y_test, y_predict)
mae = mean_absolute_error(y_test, y_predict)
```
R Squared
$R^2$ is the best single metric for evaluating linear regression.
$$R^2 = 1 - \frac{\sum\limits_{i=1}^m\left(\hat y^{(i)} - y^{(i)}\right)^2}{\sum\limits_{i=1}^m\left(\bar y - y^{(i)}\right)^2} = 1 - \frac{\sum\limits_{i=1}^m\left(\hat y^{(i)} - y^{(i)}\right)^2/m}{\sum\limits_{i=1}^m\left(\bar y - y^{(i)}\right)^2/m} = 1 - \frac{MSE(\hat y, y)}{Var(y)}$$
The numerator is the error produced by our model's predictions; the denominator is the error produced by the baseline model that always predicts $y = \bar y$.
$R^2 \le 1$, and larger is better. When the model makes no errors at all, $R^2$ reaches its maximum value of 1. When the model does no better than the baseline model, $R^2 = 0$. If $R^2 < 0$, the learned model is worse than the baseline; in that case, the data very likely has no linear relationship at all.
```python
from sklearn.metrics import r2_score

print(r2_score(y_test, y_predict))
```
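Equivalently, computed from the definition above (a sketch; the function name is mine):

```python
import numpy as np

def r2(y_true, y_pred):
    """R^2 = 1 - MSE(y_pred, y_true) / Var(y_true)."""
    return 1.0 - np.mean((y_true - y_pred) ** 2) / np.var(y_true)
```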
Multiple Linear Regression
Each sample now has $n$ features:
$$x^{(i)} = \left(X_1^{(i)}, X_2^{(i)}, \cdots, X_n^{(i)}\right)$$
The model becomes
$$y = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$$
$$\hat y^{(i)} = \theta_0 + \theta_1 X_1^{(i)} + \theta_2 X_2^{(i)} + \cdots + \theta_n X_n^{(i)}$$
where we define $X_0^{(i)} = 1$.
$$\theta = \left(\theta_0, \theta_1, \theta_2, \cdots, \theta_n\right)^T$$
Goal: make
$$\sum\limits_{i=1}^m\left(y^{(i)} - \hat y^{(i)}\right)^2$$
as small as possible.
Augmenting each sample with the constant feature gives
$$X^{(i)} = \left(X_0^{(i)}, X_1^{(i)}, X_2^{(i)}, \cdots, X_n^{(i)}\right)$$
$$X_b=\begin{pmatrix} 1& {X_1^{(1)}}&{X_2^{(1)}}&{\dots}&{X_n^{(1)}} \\ 1& {X_1^{(2)}}&{X_2^{(2)}}&{\dots}&{X_n^{(2)}} \\ {\cdots}&{}&{}&{}&{\cdots}\\ 1& {X_1^{(m)}}&{X_2^{(m)}}&{\dots}&{X_n^{(m)}} \end{pmatrix}$$
$$\theta=\begin{pmatrix}\theta_0\\ \theta_1\\\theta_2\\\cdots\\\theta_n\end{pmatrix}, \qquad \hat y = X_b \cdot \theta$$
The Normal Equation solution for multiple linear regression
$$\theta = \left(X_b^T X_b\right)^{-1} X_b^T y$$
Its time complexity is high: $O(n^3)$, or about $O(n^{2.4})$ with optimized matrix algorithms. Its advantage: no normalization (feature scaling) of the data is needed.
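A minimal NumPy sketch of the normal equation (the function name is mine; in practice `np.linalg.solve` or `np.linalg.pinv` is numerically safer than an explicit inverse):

```python
import numpy as np

def fit_normal_equation(X, y):
    """theta = (X_b^T X_b)^{-1} X_b^T y."""
    X_b = np.hstack([np.ones((len(X), 1)), X])  # prepend the X_0 = 1 column
    theta = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y
    return theta  # theta[0] is the intercept, theta[1:] are the coefficients
```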
Regression in sklearn
```python
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

# assumes X_train, X_test, y_train, y_test have already been prepared
lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)
print(lin_reg.coef_)       # learned coefficients (theta_1 ... theta_n)
print(lin_reg.intercept_)  # learned intercept (theta_0)

knn_reg = KNeighborsRegressor()
knn_reg.fit(X_train, y_train)
print(knn_reg.score(X_test, y_test))  # R^2 on the test set
```
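For completeness, an end-to-end sketch on synthetic data (the dataset and parameters here are made up purely for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# synthetic data: y = 3*x1 - 2*x2 + 5 + noise
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 10.0, size=(200, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + 5 + rng.normal(0.0, 1.0, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)

lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)
y_predict = lin_reg.predict(X_test)

print(lin_reg.coef_)                          # close to [3, -2]
print(lin_reg.intercept_)                     # close to 5
print(mean_squared_error(y_test, y_predict))  # MSE
print(r2_score(y_test, y_predict))            # R^2, close to 1
```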
Summary of linear regression
1. Linear regression is a typical parametric learning algorithm, whereas KNN is non-parametric.
2. Linear regression can only solve regression problems (though it underlies many classification methods), whereas KNN can solve both classification and regression problems.