Linear Regression (2)

1 Matrix Solution for Multiple Linear Regression

The loss function is $J(\theta)=\frac{1}{2M}\sum\limits_{i=1}^{M}(y_i-(a+bx_i))^2=\frac{1}{2M}(y-x\theta)^T(y-x\theta)$, where $M$ is the number of samples.
Taking the first-order partial derivative of $J(\theta)$ gives the gradient:

$$
\begin{array}{lll}
\nabla_{\theta}J(\theta)&=&\nabla_{\theta}\left(\frac{1}{2M}(y-x\theta)^T(y-x\theta)\right)\\
&=&\nabla_\theta\left(\frac{1}{2M}(y^T-\theta^Tx^T)(y-x\theta)\right)\\
&=&\frac{1}{2M}\nabla_\theta\left(y^Ty-y^Tx\theta-\theta^Tx^Ty+\theta^Tx^Tx\theta\right)\\
&=&\frac{1}{2M}\left(-(y^Tx)^T-x^Ty+2x^Tx\theta\right)\\
&=&\frac{1}{M}(x^Tx\theta-x^Ty)=0
\end{array}
$$

Solving gives $\theta=(x^Tx)^{-1}x^Ty$. This result also applies to multiple linear regression.

The code is as follows:

import numpy as np
import pandas as pd

data = pd.read_csv('c:/users/administrator/desktop/Advertising.csv')    # TV,Radio,Newspaper,Sales
data['intercept'] = 1
x = data[['intercept','TV', 'radio', 'newspaper']]
y = data['sales']
# use the first 150 rows as the training set and the last 50 rows as the test set
train_x = np.array(x.loc[1:150,])
test_x = np.array(x.loc[151:,])
train_y = np.array(y.loc[1:150,])
test_y = np.array(y.loc[151:,])
# beta = (X^T X)^(-1) X^T y: compute the parameters
Xt = np.transpose(train_x)
XtX = np.dot(Xt, train_x)
Xty = np.dot(Xt, train_y)
beta = np.linalg.solve(XtX, Xty)   # solve the normal equations rather than inverting X^T X
print(beta)

The result is:
[ 3.07875053e+00 4.65616125e-02 1.80900459e-01 -2.55988893e-03]
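
As a quick cross-check (not in the original post), the same coefficients can be obtained directly with NumPy's least-squares solver, reusing `train_x` and `train_y` from the block above:

# hypothetical sanity check: np.linalg.lstsq minimizes ||train_x @ beta - train_y||^2
beta_lstsq, *_ = np.linalg.lstsq(train_x, train_y, rcond=None)
print(beta_lstsq)   # should match the normal-equation result above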

Next, predict on the last 50 samples as follows:

# predict on the last 50 rows
pred = []
for row, actual in zip(test_x, test_y):
    prediction = np.dot(row, beta)   # x^T beta for each test sample
    pred.append(prediction)
    #print('prediction = ' + str(prediction) + ' actual = ' + str(actual))
pred

Finally, plot the predicted values against the true values:

import matplotlib.pyplot as plt
t = np.arange(len(test_x))
plt.plot(t, test_y, 'r-', linewidth=2, label='actual')
plt.plot(t, pred, 'g-', linewidth=2, label='predicted')
plt.legend(loc='upper left')
plt.show()

[Figure: predicted vs. actual sales on the 50 test samples]

Note: below we also fit this model with scikit-learn's linear_model.

[Figure: fitting with scikit-learn's linear_model]
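
The scikit-learn code itself only appears in the screenshot above; a minimal sketch of an equivalent fit, assuming the same train/test split as before, could look like this:

from sklearn.linear_model import LinearRegression

# fit_intercept=True adds the constant term itself, so the manually added
# 'intercept' column (column 0 of train_x/test_x) is dropped here
lr = LinearRegression()
lr.fit(train_x[:, 1:], train_y)
print(lr.intercept_, lr.coef_)        # should be close to beta from the normal equations
test_pred = lr.predict(test_x[:, 1:])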

2 Gradient Descent Solution for Multiple Linear Regression

2.1 Define the hypothesis function and the loss function of the model to be optimized.

For linear regression, the hypothesis function is $h_{\theta}(x_1,x_2,\cdots,x_n)=\theta_0+\theta_1x_1+\cdots+\theta_nx_n$, where $\theta_i\,(i=0,1,2,\cdots,n)$ are the model parameters and $x_i\,(i=1,2,\cdots,n)$ are the $n$ feature values of each sample. This can be written more compactly by adding a feature $x_0=1$, so that $h_{\theta}(x_0,x_1,\cdots,x_n)=\sum\limits_{i=0}^{n}\theta_ix_i$.

For linear regression, corresponding to the hypothesis above, the loss function is:

$J(\theta_0,\theta_1,\cdots,\theta_n)=\frac{1}{2M}\sum\limits_{j=1}^{M}\left(h_{\theta}(x_0^{(j)},x_1^{(j)},\cdots,x_n^{(j)})-y_j\right)^2$, where $M$ is the number of samples.
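
As a concrete illustration (a minimal sketch, not from the original post), this loss can be evaluated for a given $\theta$ with NumPy, assuming the design matrix X already contains the $x_0=1$ column:

import numpy as np

def mse_loss(theta, X, y):
    # J(theta) = 1/(2M) * sum_j (X[j] @ theta - y[j])^2, with M the number of samples
    M = len(y)
    residual = X @ theta - y
    return residual @ residual / (2 * M)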

2.2 Algorithm:

  1. Compute the gradient of the loss function at the current position; for $\theta_i$ the gradient is $\frac{\partial}{\partial \theta_i}J(\theta_0,\theta_1,\cdots,\theta_n)$.

  2. Multiply the gradient by the step size to get the descent distance at the current position: $\alpha\frac{\partial}{\partial \theta_i}J(\theta_0,\theta_1,\cdots,\theta_n)$.

  3. Check whether the descent distance is smaller than $\epsilon$ for every $\theta_i$; if so, the algorithm terminates, otherwise go to step 4.

  4. Update all $\theta$; for $\theta_i$ the update rule is $\theta_i=\theta_i-\alpha\frac{\partial}{\partial \theta_i}J(\theta_0,\theta_1,\cdots,\theta_n)$. After updating, return to step 1.

Below we use linear regression to describe gradient descent concretely. Suppose the samples are $(x_1^{(1)},x_2^{(1)},\cdots,x_n^{(1)},y_1),(x_1^{(2)},x_2^{(2)},\cdots,x_n^{(2)},y_2),\cdots,(x_1^{(M)},x_2^{(M)},\cdots,x_n^{(M)},y_M)$, and the loss function is as stated in the prerequisites above:

$J(\theta_0,\theta_1,\cdots,\theta_n)=\frac{1}{2M}\sum\limits_{j=1}^{M}\left(h_{\theta}(x_0^{(j)},x_1^{(j)},\cdots,x_n^{(j)})-y_j\right)^2$.

Then, in step 1 of the algorithm, the partial derivative with respect to $\theta_i$ is computed as:

$\frac{\partial}{\partial \theta_i}J(\theta_0,\theta_1,\cdots,\theta_n)=\frac{1}{M}\sum\limits_{j=1}^{M}\left(h_{\theta}(x_0^{(j)},x_1^{(j)},\cdots,x_n^{(j)})-y_j\right)x_i^{(j)}$.

In the expression above, every $x_0^{(j)}$ equals 1.

In step 4, the update rule for $\theta_i$ is:

$\theta_i=\theta_i-\alpha\frac{1}{M}\sum\limits_{j=1}^{M}\left(h_{\theta}(x_0^{(j)},x_1^{(j)},\cdots,x_n^{(j)})-y_j\right)x_i^{(j)}$.
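
As a small illustration (a sketch, not from the original post; the full implementation is given in 2.3 below), one such update of all the $\theta_i$ can be written component-wise as:

import numpy as np

def gradient_step(theta, X, y, alpha):
    # theta_i <- theta_i - alpha * (1/M) * sum_j (h_theta(x^(j)) - y_j) * x_i^(j)
    # X is M x (n+1) with the x_0 = 1 column included, theta has length n+1
    M, n_features = X.shape
    new_theta = theta.copy()
    for i in range(n_features):
        grad_i = sum((X[j] @ theta - y[j]) * X[j, i] for j in range(M)) / M
        new_theta[i] = theta[i] - alpha * grad_i
    return new_theta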

2.3 Matrix Formulation of Gradient Descent

1 The hypothesis function can be written as $h_\theta(X)=X\theta$, where $h_\theta(X)$ is an $M\times 1$ vector, $\theta$ is an $(n+1)\times 1$ vector, and $X$ is an $M\times(n+1)$ matrix. Here $M$ is the number of samples and $n+1$ is the number of features per sample.

The loss function is $J(\theta)=\frac{1}{2M}(X\theta-Y)^T(X\theta-Y)$, where $Y$ is the vector of sample outputs, of dimension $M\times 1$.

2 Algorithm:

  1. Compute the gradient of the loss function at the current position; for $\theta_i$ the gradient is $\frac{\partial}{\partial \theta_i}J(\theta)$.

  2. Multiply the gradient by the step size to get the descent distance at the current position: $\alpha\frac{\partial}{\partial \theta_i}J(\theta)$.

  3. Check whether the descent distance is smaller than $\epsilon$ for every $\theta_i$; if so, the algorithm terminates, otherwise go to step 4.

  4. Update all $\theta$; for $\theta_i$ the update rule is as follows. After updating, return to step 1.

$\theta_i=\theta_i-\alpha\frac{\partial}{\partial \theta_i}J(\theta)$.

The partial derivative of the loss function with respect to the vector $\theta$ is:
$\frac{\partial}{\partial \theta}J(\theta)=\frac{1}{M}X^T(X\theta-Y)$, so the update in step 4 can be written as $\theta=\theta-\alpha\frac{1}{M}X^T(X\theta-Y)$.

The concrete code is as follows:

import copy
import numpy as np

def optimizer(data, init_a, init_b, learning_rate, num_iter):
    # gradient descent on the scalar parameters a (intercept) and b (slope)
    b = init_b
    a = init_a
    for i in range(num_iter):
        a, b = compute_gradient(a, b, data, learning_rate)
        #if i%100==0:
            #print('iter{0}:error={1}'.format(i,compute_error(a,b,data)))
    return [a, b]

def compute_gradient(a_current, b_current, data, learning_rate=0.001):
    # one gradient-descent step for y = a + b*x; data[:,0] is x, data[:,1] is y
    a_gradient = 0
    b_gradient = 0
    M = float(len(data))
    for i in range(len(data)):
        x = data[i, 0]
        y = data[i, 1]
        a_gradient += -(1/M) * (y - (a_current + b_current*x))
        b_gradient += -(1/M) * x * (y - (a_current + b_current*x))
    #print('a_gradient=%f,b_gradient=%f'%(a_gradient,b_gradient))
    new_b = b_current - (learning_rate * b_gradient)
    new_a = a_current - (learning_rate * a_gradient)
    return [new_a, new_b]

def compute_gradient_vector_version(theta, X, y, learning_rate=0.00001):
    # one vectorized step: theta <- theta - alpha * (1/M) * X^T (X theta - y)
    theta_c = np.matrix(copy.copy(theta))
    X_c = np.matrix(copy.copy(X))
    y_c = np.matrix(copy.copy(y))
    M, N = X_c.shape
    M = float(M)
    theta_gradient = np.matrix(np.zeros([N, 1]))
    #for j in range(N):
        #theta_gradient[j,0]=-(1/M)*(y_c-X_c*theta_c).T*X_c[:,j]
    theta_gradient = -(1/M) * X_c.T * (y_c - X_c*theta_c)
    theta_c = theta_c - (learning_rate * theta_gradient)
    #print(theta_c)
    return theta_c

def optimizer_vector_version(X, y, learning_rate=0.00001, num_iter=10000):
    # gradient descent on the full parameter vector theta, starting from zeros
    X_c = np.matrix(copy.copy(X))
    y_c = np.matrix(copy.copy(y))
    M, N = X_c.shape
    theta = np.matrix(np.zeros([N, 1]))
    for i in range(num_iter):
        theta = compute_gradient_vector_version(theta, X_c, y_c, learning_rate)
    return theta

For Example 2 from the previous lecture, the procedure is as follows:

# load the data
res = []
with open('d:/shuju1.txt','r') as f:
    lines = f.readlines()
    for line in lines:
        res.append(list(map(float, line.strip('\n').split(','))))
res = np.array(res)
data = res
# initialize the parameters
learning_rate = 0.001
initial_b = 0
initial_a = 0
num_iter = 1000
# the two methods give the following results

[a,b]=optimizer(data,initial_a,initial_b,learning_rate,num_iter)
>>(0.24852432182905326, 0.7411262595522877)

# vectorized version: build the design matrix X = [1, x] (the dataset has 67 samples)
import copy
x0 = np.ones((67, 1))
data.shape, x0.shape
X = np.hstack((x0, data[:, 0].reshape(67, 1)))
y1 = data[:, 1].reshape(67, 1)
theta_c = optimizer_vector_version(X, y1)
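
A minimal way to inspect the vectorized result (a hypothetical follow-up, not shown in the original) would be:

print(theta_c)                     # learned [intercept, slope] as a 2x1 matrix
fitted = np.matrix(X) * theta_c    # fitted values on the training data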

Example 1: let us now look at a multiple linear regression example.

[Figure: the Advertising dataset example]

First, implement it with the matrix method; the code is as follows:

import numpy as np
import pandas as pd

data1 = pd.read_csv('c:/users/administrator/desktop/Advertising.csv')    # TV,Radio,Newspaper,Sales
data1['intercept'] = 1
x = data1[['intercept','TV', 'radio', 'newspaper']]
y = data1['sales']
# use the first 150 rows as the training set and the last 50 rows as the test set
train_x = np.array(x.loc[1:150,])
test_x = np.array(x.loc[151:,])
train_y = np.array(y.loc[1:150,])
test_y = np.array(y.loc[151:,])
# beta = (X^T X)^(-1) X^T y: compute the parameters
Xt = np.transpose(train_x)
XtX = np.dot(Xt, train_x)
Xty = np.dot(Xt, train_y)
beta = np.linalg.solve(XtX, Xty)
print(beta)

The result is: [ 3.07875053e+00 4.65616125e-02 1.80900459e-01 -2.55988893e-03]

The plotting code is as follows:

pred = []
for row, actual in zip(test_x, test_y):
    prediction = np.dot(row, beta)   # x^T beta for each test sample
    pred.append(prediction)
    #print('prediction = ' + str(prediction) + ' actual = ' + str(actual))

import matplotlib.pyplot as plt
t = np.arange(len(test_x))
plt.plot(t, test_y, 'r-', linewidth=2, label='actual')
plt.plot(t, pred, 'g-', linewidth=2, label='predicted')
plt.legend(loc='upper left')
plt.show()

[Figure: predicted vs. actual sales on the test set, normal-equation fit]

Next, we solve the same problem with gradient descent.

# train the vectorized gradient descent on the first 150 rows
train_y = train_y.reshape((len(train_y), 1))
theta_c = optimizer_vector_version(train_x, train_y)
theta_c = theta_c.tolist()

pred = []
for row, actual in zip(test_x, test_y):
    prediction = np.dot(row, theta_c)   # x^T theta for each test sample
    pred.append(prediction)
    #print('prediction = ' + str(prediction) + ' actual = ' + str(actual))

import matplotlib.pyplot as plt
t = np.arange(len(test_x))
plt.plot(t, test_y, 'r-', linewidth=2, label='actual')
plt.plot(t, pred, 'g-', linewidth=2, label='predicted')
plt.legend(loc='upper left')
plt.show()

[Figure: predicted vs. actual sales on the test set, gradient-descent fit]



Reposted from blog.csdn.net/xxuffei/article/details/90022269