Notes on Logistic Regression

1. The Sigmoid Function

\qquad In this article the Sigmoid function is denoted by S(x):

\qquad\qquad S(x)=\dfrac{1}{1+e^{-x}}

\qquad The Sigmoid function has a particularly useful property: [S(x) ]^{'}=S(x) [ 1-S(x) ]
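\qquad This identity can be checked by differentiating directly:

\qquad\qquad [S(x)]^{'}=\dfrac{e^{-x}}{(1+e^{-x})^{2}}=\dfrac{1}{1+e^{-x}}\cdot\dfrac{e^{-x}}{1+e^{-x}}=S(x)[ 1-S(x) ]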
[Figure: the Sigmoid function curve]

The Sigmoid curve rises quickly near its center (x=0,\ y=0.5) and rises slowly toward both ends.
The dashed line shows the corresponding step function.

2. The Logistic Regression Model

\qquad If the Sigmoid function S(x) is used as the transformation applied to the linear model f(\boldsymbol x)=\boldsymbol{w}^T \boldsymbol x + b, then:

\qquad\qquad y(\boldsymbol{x})=S [ f(\boldsymbol x) ]=\dfrac{1}{1+e^{-(\boldsymbol{w}^{T}\boldsymbol{x}+b)}}

\qquad For a given sample \boldsymbol{x^{\ast}}, write its output value as y=y(\boldsymbol x^{\ast}); then:

\qquad\qquad \ln\left( \dfrac{y}{1-y}\right)=\boldsymbol{w}^{T}\boldsymbol{x^{\ast}}+b
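\qquad This follows from solving y=S(\boldsymbol{w}^{T}\boldsymbol{x^{\ast}}+b) for the linear term:

\qquad\qquad y=\dfrac{1}{1+e^{-(\boldsymbol{w}^{T}\boldsymbol{x^{\ast}}+b)}}\ \Rightarrow\ e^{-(\boldsymbol{w}^{T}\boldsymbol{x^{\ast}}+b)}=\dfrac{1-y}{y}\ \Rightarrow\ \boldsymbol{w}^{T}\boldsymbol{x^{\ast}}+b=\ln\left(\dfrac{y}{1-y}\right)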

\qquad If y is regarded as the probability that sample \boldsymbol{x^{\ast}} is a positive example and 1-y as the probability that it is a negative example, then the logarithm of their ratio, \ln\left( \dfrac{y}{1-y}\right), reflects how the sample \boldsymbol{x^{\ast}} is classified by the linear model (as illustrated in the figure below):

\qquad1) If y=0.5, then 1-y=0.5 and \ln\left( \dfrac{y}{1-y}\right)=0, so \boldsymbol{w}^{T}\boldsymbol{x^{\ast}}+b=0.

\qquad  From the viewpoint of the linear model, the sample \boldsymbol{x^{\ast}} lies exactly on the decision boundary (the red line), so it is equally likely to be a positive or a negative example.

\qquad2) If y>0.5, then 1-y<0.5 and \ln\left( \dfrac{y}{1-y}\right)>0, so \boldsymbol{w}^{T}\boldsymbol{x^{\ast}}+b>0.

\qquad  This means that the sample \boldsymbol{x^{\ast}} lies on the upper side of the decision boundary.

\qquad3) If y<0.5, then 1-y>0.5 and \ln\left( \dfrac{y}{1-y}\right)<0, so \boldsymbol{w}^{T}\boldsymbol{x^{\ast}}+b<0.

\qquad  This means that the sample \boldsymbol{x^{\ast}} lies on the lower side of the decision boundary.
[Figure: samples separated by the linear decision boundary (red line), which divides the plane into regions \mathcal R_1 and \mathcal R_2]

The Sigmoid function therefore maps the output of the linear model \boldsymbol{w}^{T}\boldsymbol{x}+b into the interval [0,1].
 
If an event occurs with probability p, its odds are defined as \dfrac{p}{1-p}, and its log odds are defined as \ln\left(\dfrac{p}{1-p}\right).
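For example, an event with probability p=0.8 has odds \dfrac{0.8}{0.2}=4 and log odds \ln 4\approx 1.386, while p=0.5 gives odds 1 and log odds 0.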

\qquad If the variable c=1 is used to denote the region \mathcal R_1 in the figure above and c=0 to denote \mathcal R_2, then the value of y(\boldsymbol x) can be viewed as the class posterior probability, that is:

\qquad\qquad p(c=1|\boldsymbol{x})=y(\boldsymbol{x})=\dfrac{1}{1+e^{-(\boldsymbol{w}^{T}\boldsymbol{x}+b)}}

\qquad Then the log odds associated with p(c=1|\boldsymbol{x}) is exactly the linear model:

\qquad\qquad\ln \dfrac{p(c=1|\boldsymbol{x})}{p(c=0|\boldsymbol{x})}=\boldsymbol{w}^T\boldsymbol{x}+b

  • Probability that \boldsymbol{x} is a positive example (c=1):

\qquad\qquad p(c=1|\boldsymbol{x})=\dfrac{1}{1+e^{-(\boldsymbol{w}^{T}\boldsymbol{x}+b)}}

\qquad As the linear function \boldsymbol{w}^{T}\boldsymbol{x}+b approaches +\infty, the probability approaches 1.
\qquad As the linear function \boldsymbol{w}^{T}\boldsymbol{x}+b approaches -\infty, the probability approaches 0.

  • Probability that \boldsymbol{x} is a negative example (c=0):

\qquad\qquad\begin{aligned} p(c=0|\boldsymbol{x})&=1-p(c=1|\boldsymbol{x})\\ &=\dfrac{e^{-(\boldsymbol{w}^{T}\boldsymbol{x}+b)}}{1+e^{-(\boldsymbol{w}^{T}\boldsymbol{x}+b)}}\\ &=\dfrac{1}{1+e^{(\boldsymbol{w}^{T}\boldsymbol{x}+b)}}\end{aligned}

\qquad As the linear function \boldsymbol{w}^{T}\boldsymbol{x}+b approaches +\infty, the probability approaches 0.
\qquad As the linear function \boldsymbol{w}^{T}\boldsymbol{x}+b approaches -\infty, the probability approaches 1.

\qquad Clearly, for a new input sample \boldsymbol{x^{\ast}}, the maximum a posteriori rule gives the decision: if p(c=1|\boldsymbol{x^{\ast}})>p(c=0|\boldsymbol{x^{\ast}}), then \boldsymbol{x^{\ast}} is assigned to R_{1}; if p(c=1|\boldsymbol{x^{\ast}})<p(c=0|\boldsymbol{x^{\ast}}), then \boldsymbol{x^{\ast}} is assigned to R_{2}.
\qquad

3. Parameter Estimation

\qquad Suppose the training samples are \{ ( \boldsymbol{x}_{i},c_{i}) \} _{i=1}^N, where \boldsymbol{x}_{i}\in R^{n} and c_{i}\in \{0,1\}. The model parameters (\boldsymbol{w},b) are estimated by maximum likelihood.

\qquad1) Let y(\boldsymbol{x})=p(c=1|\boldsymbol{x}). The likelihood of the training set can be written as:

\qquad\qquad L(\boldsymbol{w},b)=\displaystyle\prod_{i=1}^N y(\boldsymbol{x}_{i})^{c_{i}}\left[ 1-y(\boldsymbol{x}_{i})\right] ^{1-c_{i}}

\qquad2) The log-likelihood is then:

\qquad\qquad\begin{aligned} \ln L(\boldsymbol{w},b)&=\displaystyle\sum_{i=1}^N \{ c_{i}\ln\left[ y\left( \boldsymbol{x}_{i}\right) \right] +(1-c_{i})\ln\left[ 1-y\left( \boldsymbol{x}_{i}\right) \right] \}\\ &=\displaystyle\sum_{i=1}^N\left\{ c_{i}\ln\dfrac{ y\left( \boldsymbol{x}_{i}\right)}{1-y\left( \boldsymbol{x}_{i}\right)} +\ln\left[ 1-y\left( \boldsymbol{x}_{i}\right) \right] \right\} \\ &=\displaystyle\sum_{i=1}^N\left\{ c_{i}\left(\boldsymbol{w}^{T}\boldsymbol{x}_{i}+b\right)-\ln\left[ 1+e^{\left(\boldsymbol{w}^{T}\boldsymbol{x}_{i}+b\right)} \right] \right\} \\ \end{aligned}

\qquad3) If we let \boldsymbol{\beta}=[\boldsymbol{w}^T,b]^{T} and \hat{\boldsymbol{x}}_{i}=[\boldsymbol{x}_{i}^T,1]^{T}, then the linear model becomes \boldsymbol{w}^{T}\boldsymbol{x}_{i}+b=\boldsymbol{\beta}^{T}\hat{\boldsymbol{x}}_{i}, and hence:

\qquad\qquad \ln L(\boldsymbol{\beta})=\displaystyle\sum_{i=1}^N \left[ c_{i}\boldsymbol{\beta}^{T}\hat{\boldsymbol{x}}_{i}-\ln\left( 1+e^{\boldsymbol{\beta}^{T}\hat{\boldsymbol{x}}_{i}} \right) \right]

\qquad Maximizing this likelihood yields the estimate of the Logistic Regression parameters \boldsymbol{\beta}=[\boldsymbol{w}^{T},b]^{T}.

4. Optimization Algorithms for Model Learning

\qquad The negative log-likelihood is usually taken as the loss function, i.e. l(\boldsymbol{\beta})=-\ln L(\boldsymbol{w},b)=-\ln L(\boldsymbol{\beta}). Maximizing the likelihood is therefore equivalent to minimizing the loss l(\boldsymbol{\beta}):

\qquad\qquad l(\boldsymbol{\beta})=-\ln L(\boldsymbol{\beta})=-\displaystyle\sum_{i=1}^N \left[ c_{i}\boldsymbol{\beta}^{T}\hat{\boldsymbol{x}}_{i}-\ln\left( 1+e^{\boldsymbol{\beta}^{T}\hat{\boldsymbol{x}}_{i}} \right) \right]

\qquad Since l(\boldsymbol{\beta}) is a continuous convex function of \boldsymbol{\beta} with derivatives of all orders, it can be minimized with standard numerical optimization methods.
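\qquad As an illustration (not part of the original notes), a minimal sketch of handing l(\boldsymbol{\beta}) to a generic numerical optimizer, assuming SciPy is available and that xhat is the N\times(n+1) design matrix with a trailing column of ones and c the corresponding 0/1 label vector, as in Section 6:

import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(beta, xhat, c):
    '''l(beta) = -sum_i [ c_i * beta^T xhat_i - ln(1 + exp(beta^T xhat_i)) ]'''
    z = xhat.dot(beta)
    return -np.sum(c * z - np.logaddexp(0.0, z))   # logaddexp(0, z) = ln(1 + e^z), overflow-safe

def neg_log_likelihood_grad(beta, xhat, c):
    '''Gradient of l(beta), see equation (1) below.'''
    y = 1.0 / (1.0 + np.exp(-xhat.dot(beta)))
    return -xhat.T.dot(c - y)

# Hypothetical usage, starting from an all-zero parameter vector:
# res = minimize(neg_log_likelihood, np.zeros(xhat.shape[1]),
#                args=(xhat, c.ravel()), jac=neg_log_likelihood_grad, method='BFGS')
# beta_hat = res.x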
\qquad

4.1 Gradient Descent

\qquad When solving with gradient descent, the negative gradient is taken as the descent direction, so the gradient is needed:

\qquad\qquad \begin{aligned} \dfrac{\partial l(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}&=-\displaystyle\sum_{i=1}^N \left( c_{i}\hat{\boldsymbol{x}}_{i}-\dfrac{1}{1+e^{\boldsymbol{\beta}^{T}\hat{\boldsymbol{x}}_{i}}}e^{\boldsymbol{\beta}^{T}\hat{\boldsymbol{x}}_{i}}\hat{\boldsymbol{x}}_{i} \right)\\ &=-\displaystyle\sum_{i=1}^N \left( c_{i}-\dfrac{e^{\boldsymbol{\beta}^{T}\hat{\boldsymbol{x}}_{i}}}{1+e^{\boldsymbol{\beta}^{T}\hat{\boldsymbol{x}}_{i}}} \right)\hat{\boldsymbol{x}}_{i} \\ &=-\displaystyle\sum_{i=1}^N [ c_{i}-y(\boldsymbol{x}_{i}) ]\hat{\boldsymbol{x}}_{i} \qquad\qquad\qquad\ (1)\\ \end{aligned}

\qquad Since the parameter vector is \boldsymbol{\beta}=[\boldsymbol{w}^T,b]^{T}=[w_1,\cdots,w_n,b]^T and \hat{\boldsymbol{x}}_{i}=[\boldsymbol{x}_{i}^T,1]^T=[x_i^{(1)},\cdots,x_i^{(n)},1]^T, equation (1) is actually:

\qquad\qquad\begin{cases}\ \ \ \dfrac{\partial l(\boldsymbol{\beta})}{\partial \boldsymbol w}=-\displaystyle\sum_{i=1}^N [ c_{i}-y\left( \boldsymbol{x}_{i}\right) ]\boldsymbol{x}_{i}\qquad\qquad\ \ (2) \\ \\ \ \ \ \dfrac{\partial l(\boldsymbol{\beta})}{\partial b}=-\displaystyle\sum_{i=1}^N\left[ c_{i}-y\left( \boldsymbol{x}_{i}\right) \right] \qquad\qquad(3) \end{cases}

\qquad Considering each component x_{i}^{(j)} of \boldsymbol{x}_{i}=[x_i^{(1)},\cdots,x_i^{(n)}]^T, equation (2) can also be written as:

\qquad\qquad \dfrac{\partial l(\boldsymbol{\beta})}{\partial w_{j}}=-\displaystyle\sum_{i=1}^N [ c_{i}-y\left( \boldsymbol{x}_{i}\right) ]x_{i}^{(j)}

\qquad The gradient of the loss function l(\boldsymbol{\beta}) can thus be written as:

\qquad\qquad \dfrac{\partial l(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}=\left[\dfrac{\partial l(\boldsymbol{\beta})}{\partial w_1},\cdots,\dfrac{\partial l(\boldsymbol{\beta})}{\partial w_n},\dfrac{\partial l(\boldsymbol{\beta})}{\partial b} \right]^{T}

\qquad Hence the gradient-descent update rule for the parameters is (\alpha is the step size):

\qquad\qquad \boldsymbol{\beta}^{t+1}=\boldsymbol{\beta}^{t}-\alpha\dfrac{\partial l(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}
\qquad

4.2 Newton's Method

\qquad Newton's method solves the optimization problem by setting to zero the derivative of the second-order Taylor approximation at the current search point. Besides the gradient, it also requires the inverse of the Hessian matrix.

\qquad Since we have already obtained \ \dfrac{\partial l(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}=-\displaystyle\sum_{i=1}^N [ c_{i}-y(\boldsymbol{x}_{i}) ]\hat{\boldsymbol{x}}_{i},

\qquad the Hessian matrix is:

\qquad\qquad \begin{aligned} \dfrac{\partial}{\partial \boldsymbol\beta^T}\left(\dfrac{\partial l(\boldsymbol{\beta})}{\partial \boldsymbol\beta}\right) &=\dfrac{\partial}{\partial \boldsymbol\beta^T}\left(-\displaystyle\sum_{i=1}^N [ c_{i}-y(\boldsymbol{x}_{i}) ]\hat{\boldsymbol{x}}_{i}\right)\\ &=\dfrac{\partial}{\partial \boldsymbol\beta^T}\left(\displaystyle\sum_{i=1}^Ny(\boldsymbol{x}_{i})\hat{\boldsymbol{x}}_i\right)\\ &=\displaystyle\sum_{i=1}^Ny(\boldsymbol{x}_{i})[1-y(\boldsymbol{x}_{i})]\hat{\boldsymbol{x}}_i\dfrac{\partial}{\partial \boldsymbol\beta^T}\left(\boldsymbol{\beta}^{T}\hat{\boldsymbol{x}}_{i}\right)\\ &=\displaystyle\sum_{i=1}^Ny(\boldsymbol{x}_{i})[1-y(\boldsymbol{x}_{i})]\hat{\boldsymbol{x}}_i\hat{\boldsymbol{x}}_{i}^T\\ \end{aligned}
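\qquad In matrix form (a restatement; here \hat{\boldsymbol{X}} denotes the matrix whose i-th row is \hat{\boldsymbol{x}}_{i}^{T}), the Hessian can be written as:

\qquad\qquad \dfrac{\partial^2 l(\boldsymbol\beta)}{\partial \boldsymbol\beta\partial \boldsymbol\beta^T}=\hat{\boldsymbol{X}}^{T}\boldsymbol{W}\hat{\boldsymbol{X}},\qquad \boldsymbol{W}=\mathrm{diag}\big(y(\boldsymbol{x}_{1})[1-y(\boldsymbol{x}_{1})],\cdots,y(\boldsymbol{x}_{N})[1-y(\boldsymbol{x}_{N})]\big)

\qquad Since every diagonal entry of \boldsymbol{W} is non-negative, the Hessian is positive semi-definite, which confirms the convexity of l(\boldsymbol\beta) mentioned above.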

\qquad Therefore the Newton update rule for the parameters is:

\qquad\qquad \boldsymbol{\beta}^{t+1}=\boldsymbol{\beta}^{t}-\left(\dfrac{\partial^2 l(\boldsymbol\beta)}{\partial \boldsymbol\beta\partial \boldsymbol\beta^T}\right)^{-1}\dfrac{\partial l(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}
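\qquad Section 6 below implements only gradient descent; as a complementary sketch (not part of the original code), the Newton update above can be written in NumPy as follows, assuming the same design matrix xhat (N\times(n+1), last column of ones) and N\times 1 label vector c used there, and assuming the Hessian is non-singular:

import numpy as np

def lr_train_newton(xhat, c, num_iter=10):
    '''Newton's method for the logistic-regression loss l(beta).'''
    beta = np.zeros((xhat.shape[1], 1))
    for _ in range(num_iter):
        y = 1.0 / (1.0 + np.exp(-np.dot(xhat, beta)))   # y(x_i) for every sample
        grad = -np.dot(xhat.T, c - y)                    # gradient of l(beta), equation (1)
        w = np.diagflat(y * (1.0 - y))                   # diagonal matrix of y_i (1 - y_i)
        hessian = np.dot(xhat.T, np.dot(w, xhat))        # sum_i y_i (1 - y_i) xhat_i xhat_i^T
        beta = beta - np.linalg.solve(hessian, grad)     # Newton step (assumes invertible Hessian)
    return beta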

\qquad

5. Training Procedure

\qquad If gradient descent is used to solve for the model parameters, the training procedure is:

\qquad1) Randomly choose an initial value \boldsymbol{\beta}^{0} for \boldsymbol{\beta}=[\boldsymbol{w}^T,b]^{T}.

\qquad2) Choose a step size \alpha and iterate the following two formulas until a stopping criterion is satisfied:

\qquad\qquad \begin{aligned} \dfrac{\partial l(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} &=-\displaystyle\sum_{i=1}^N [ c_{i}-y(\boldsymbol{x}_{i}) ]\hat{\boldsymbol{x}}_{i}\\ &=-\displaystyle\sum_{i=1}^N [ c_{i}-y(\boldsymbol{x}_{i}) ]\left[\begin{matrix}\boldsymbol{x}_{i}\\ 1\end{matrix}\right]\\ \end{aligned}

\qquad\qquad \boldsymbol{\beta}^{t+1}=\boldsymbol{\beta}^{t}-\alpha\dfrac{\partial l(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}
\qquad

6. Implementation Code (Binary Classification)

1) Define the Sigmoid function

import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x):
    '''Sigmoid function
    '''
    return 1.0/(1 + np.exp(-x))
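A side note (not from the original post): for large negative inputs, np.exp(-x) can overflow and emit a RuntimeWarning; if SciPy is available, scipy.special.expit is a numerically safer drop-in equivalent:

from scipy.special import expit   # expit(x) == 1/(1 + exp(-x)), overflow-safe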

2) Functions to generate and load the training/testing data

Suppose the dataset, generated on the two-dimensional plane R^{2}, is stored in the format (\boldsymbol{x}_{i},y_{i})=(x_{i}^{(1)},x_{i}^{(2)},y_{i}),\ y_{i}\in\{0,1\}:

3.562302,25.329208,1.000000
-24.268267,1.272092,1.000000
25.405790,8.463017,1.000000
-6.908775,23.298889,1.000000
40.621010,-25.134052,0.000000
-9.305521,14.983097,1.000000
20.041330,-25.381725,0.000000
37.298540,-26.767307,0.000000
35.856177,-31.080316,0.000000
-17.976889,4.244106,1.000000
......

Generate a two-dimensional Gaussian dataset with two centers:

def gen_gausssian(mean1, mean2, cov1, cov2, num):
    '''generate 2-d gaussian dataset with 2 clusters
    '''
    # positive data
    data1 = np.random.multivariate_normal(mean1,cov1,num)
    label1 = np.ones((1,num)).T
    data_pos = np.append(data1,label1,axis=1)
    # negative data
    data2 = np.random.multivariate_normal(mean2,cov2,num)
    label2 = np.zeros((1,num)).T
    data_neg = np.append(data2,label2,axis=1)
    # all data
    data = np.append(data_pos,data_neg,axis=0)
    # shuffled data
    shuffle_data = np.random.permutation(data)

    # scatter plot
    x1,y1 = data1.T
    x2,y2 = data2.T
    plt.scatter(x1,y1,c='r',s=3)
    plt.plot(mean1[0],mean1[1],'ko')
    plt.scatter(x2,y2,c='b',s=3)
    plt.plot(mean2[0],mean2[1],'ko')
    plt.axis()
    plt.title("2-d gaussian dataset with 2 clusters")
    plt.xlabel("x")
    plt.ylabel("y")
    plt.show()

    # save the shuffled data to a text file, one sample per line: x1,x2,label
    np.savetxt('gaussdata.txt', shuffle_data, fmt='%f',delimiter=',')
    return shuffle_data, data_pos, data_neg

Shown as a scatter plot:
[Figure: scatter plot of the two Gaussian clusters, positive samples in red and negative samples in blue, with cluster centers marked]
Load the training data saved in the format (\boldsymbol{x}_{i},y_{i})=(x_{i}^{(1)},x_{i}^{(2)},y_{i}),\ y_{i}\in\{0,1\}, and return it as a numpy array:

def load_data(filename):
    '''Load data of training or testing set
    '''
    tdata = []
    with open(filename) as f:
        while True:
            line = f.readline()
            if not line:
                break
            line = line.split(',')
            tdata.append([float(item) for item in line])
    # the with-statement already closes the file
    return np.array(tdata)
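A hypothetical usage example (assuming gen_gausssian has already written gaussdata.txt):

data = load_data('gaussdata.txt')              # numpy array of shape (num_samples, 3)
features, labels = data[:, 0:2], data[:, 2:]   # 2-d features and 0/1 labels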

3) Iterate the gradient update of equation (1) and display the training loss after each iteration

def lr_train(xhat,c,alpha,num):
    '''Gradient-descent training: xhat is the N x 3 design matrix, c the N x 1 labels,
       alpha the step size and num the number of iterations.'''
    beta = np.random.rand(3,1)
    for i in range(num):
        yx = sigmoid(np.dot(xhat, beta))                  # y(x_i) for all samples
        beta = beta + alpha * np.dot(xhat.T, (c - yx))    # beta <- beta - alpha * dl/dbeta
        print('#'+str(i)+',training loss:'+str(train_loss(c, yx)))

    return beta

From the formula \ -\ln L(\boldsymbol{w},b)=-\displaystyle\sum_{i=1}^N \{ c_{i}\ln\left[ y\left( \boldsymbol{x}_{i}\right) \right] +(1-c_{i})\ln\left[ 1-y\left( \boldsymbol{x}_{i}\right) \right] \}

the loss function (error value) is computed as follows:

def train_loss(c, yx):
    '''Negative log-likelihood over the training set.'''
    err = 0.0
    for i in range(len(yx)):
        # skip terms where the sigmoid has saturated to exactly 0 or 1,
        # which would make log() undefined
        if yx[i,0] > 0 and (1 - yx[i,0]) > 0:
            err -= c[i,0] * np.log(yx[i,0]) + (1-c[i,0])*np.log(1-yx[i,0])

    return err

Main program:

    # Generate 2200 samples: the first 2000 form the training set, the last 200 the test set
    mean1 = [3,-1]
    cov1 = [[5,0],[0,10]]
    mean2 = [-5,7]
    cov2 = [[10,0],[0,5]]
    data,data_pos,data_neg = gen_gausssian(mean1,mean2,cov1,cov2,1100)
    # Take the first 2000 samples for training
    training_data = data
    tmp1 = training_data[0:2000,0:2]
    tmp2 = np.ones((2000,1))
    xhat = np.concatenate((tmp1,tmp2),axis=1)
    target = training_data[0:2000,2:]
    # Train for 100 iterations with step size 0.01
    beta = lr_train(xhat,target,0.01,100)
    print('beta:\n', beta)
    # Test on the remaining 200 samples
    tmp1 = training_data[2000:2200,0:2]
    tmp2 = np.ones((200,1))
    testing_data = np.concatenate((tmp1,tmp2),axis=1)
    target = training_data[2000:2200,2:]
    y1 = classification(testing_data, beta)
    print(np.abs(y1-target).T)

Training on the 2000-sample training set produces the following output:
#0,training loss:2767.7605301149197
#1,training loss:28706.32704095256
#2,training loss:24304.21071966826
#3,training loss:20729.807928831706
#4,training loss:18031.980567667095
#5,training loss:15793.907613945637
#6,training loss:13876.408972848896
#7,training loss:12260.25776604957
#8,training loss:10857.914022333194
#9,training loss:9702.173760769088
#10,training loss:8739.995403737194
#11,training loss:7909.116254144592
#12,training loss:7237.015743718265
#13,training loss:6581.515845960798
#14,training loss:6155.195323818418
#15,training loss:5782.624246205244
#16,training loss:5451.120323877512
#17,training loss:5159.985309063984
#18,training loss:4921.653117909279
#19,training loss:4728.055485820308
#20,training loss:4546.101559000789
#21,training loss:4368.003415240011
#22,training loss:4196.188712568878
#23,training loss:4032.1962049440162
#24,training loss:3876.771728177838
#25,training loss:3694.5715060625985
#26,training loss:3554.8126869561006
#27,training loss:3418.8100373192524
#28,training loss:3321.3029188728215
#29,training loss:3189.8131265721095
#30,training loss:3060.4306284382133
#31,training loss:2932.529506577584
#32,training loss:2807.0843716420854
#33,training loss:2684.4698423911955
#34,training loss:2564.789169175422
#35,training loss:2447.909093126709
#36,training loss:2333.712055985516
#37,training loss:2222.305120198585
#38,training loss:2114.0629245811747
#39,training loss:2009.9696271327145
#40,training loss:1911.4641438101942
#41,training loss:1818.4131000336629
#42,training loss:1731.1576524394175
#43,training loss:1648.321160807572
#44,training loss:1568.548376402433
#45,training loss:1491.2975705058457
#46,training loss:1416.3652001741157
#47,training loss:1343.7359069149327
#48,training loss:1273.1915049002964
#49,training loss:1204.2529637870934
#50,training loss:1136.9266223350025
#51,training loss:1071.3457943359633
#52,training loss:1007.323134162851
#53,training loss:944.9219846916478
#54,training loss:885.1816608689702
#55,training loss:899.9599868299116
#56,training loss:845.0057775193546
#57,training loss:793.5441317445959
#58,training loss:745.7136807432933
#59,training loss:701.6553680843865
#60,training loss:696.1322808438866
#61,training loss:689.6549290071879
#62,training loss:648.194522863791
#63,training loss:608.4750616604265
#64,training loss:570.7085584125251
#65,training loss:535.7643996771293
#66,training loss:504.1081114497143
#67,training loss:475.508191984496
#68,training loss:450.66041076529723
#69,training loss:429.67911785665814
#70,training loss:411.2956082830516
#71,training loss:394.87591024838343
#72,training loss:380.24926997738123
#73,training loss:403.36316867372153
#74,training loss:391.38976162245194
#75,training loss:381.3801916299097
#76,training loss:372.99093649597427
#77,training loss:366.0959147066188
#78,training loss:360.56881602093216
#79,training loss:355.8694556375197
#80,training loss:351.9153236373713
#81,training loss:348.4531107835549
#82,training loss:345.4202325927137
#83,training loss:342.7408477041635
#84,training loss:340.38193613578653
#85,training loss:338.2928279212097
#86,training loss:336.440189142936
#87,training loss:334.7845603199353
#88,training loss:333.29353615870866
#89,training loss:331.9350276518051
#90,training loss:330.68392791191314
#91,training loss:329.51840299766616
#92,training loss:328.4199642611809
#93,training loss:327.3721997842567
#94,training loss:326.36511808367675
#95,training loss:325.3868163444934
#96,training loss:324.43080556455044
#97,training loss:323.49135041237633
#98,training loss:322.5630405032738
#99,training loss:321.64316225672036

beta:
[[ 5.96987205]
[-6.41668657]
[30.84845393]]

This beta value is the estimate of \boldsymbol{\beta}=[\boldsymbol{w}^{T},b]^{T} obtained by gradient descent.

The classification function used on the test set assigns class 1 when y(\boldsymbol x)\ge 0.5 and class 0 otherwise:

def classification(testing_data, beta):
    y = sigmoid(np.dot(testing_data, beta))    
    for i in range(len(y)):
        if y[i,0] < 0.5:
            y[i,0] = 0.0
        else:
            y[i,0] = 1.0
    return y

Classification results: of the 200 test samples, 2 are misclassified
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0.]]
