A Summary of Commonly Used PyTorch Loss Functions (Part 1)

1. L1 Loss / Mean Absolute Error (MAE)

1.1 Overview of L1 Loss / MAE

When the reduction parameter of torch.nn.L1Loss is set to 'sum', the result is the L1 loss, i.e. the sum of the absolute differences; when it is set to 'mean' (the default), the result is the MAE. With 'none', the unreduced per-element absolute differences are returned.
The formulas are:
MAE = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - y_i^p \right|

L1 = \sum_{i=1}^{n} \left| y_i - y_i^p \right|

1.2 Code Example

A small coding example:

import torch
loss = torch.nn.L1Loss(reduction='sum')
pred=torch.tensor([[1.1,2.2],[3.3,4.4]],dtype=torch.float)
target = torch.tensor([[1,2],[3,4]],dtype=torch.float)
output = loss(pred, target)
print(output)

Output:

tensor(1.0000) # i.e. 0.1+0.2+0.3+0.4 = 1.0; with reduction='mean', the result would be 1.0/4 = 0.25
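
As a quick sanity check, the two reductions can be reproduced with plain tensor operations (reusing the pred and target defined above):

diff = torch.abs(pred - target)
print(diff.sum())   # L1 (reduction='sum'): tensor(1.0000)
print(diff.mean())  # MAE (reduction='mean'): tensor(0.2500)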

2. L2 Loss / Mean Squared Error (MSE)

2.1 Overview of L2 Loss / MSE

L2 loss and mean squared error (MSE) are analogous to L1 loss and MAE, except that they are built from the squared differences between predictions and targets.

In torch.nn.MSELoss(), the reduction parameter likewise accepts three values: 'none', 'mean', and 'sum'.

The formulas are:

MSE = \frac{1}{n}\sum_{i=1}^{n} \left( y_i - y_i^p \right)^2

L2 = \sum_{i=1}^{n} \left( y_i - y_i^p \right)^2

2.2 Code Example

A small coding example:

import torch
loss = torch.nn.MSELoss(reduction='sum')
pred=torch.tensor([[1.1,2.2],[3.3,4.4]],dtype=torch.float)
target = torch.tensor([[1,2],[3,4]],dtype=torch.float)
output = loss(pred, target)
print(output)

Output:

tensor(0.3000) # 0.01+0.04+0.09+0.16 = 0.3
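
The same check with plain tensor operations (again reusing the pred and target above):

sq = (pred - target) ** 2
print(sq.sum())   # L2 (reduction='sum'): tensor(0.3000)
print(sq.mean())  # MSE (reduction='mean'): tensor(0.0750)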

3. SmoothL1 Loss

3.1 Overview of SmoothL1 Loss

Smooth L1 is a piecewise function: when the error lies in (-1, 1) it uses an L2-style (squared) term, and elsewhere an L1-style (absolute) term. This fixes the non-differentiability of L1 loss at 0, where its curve is not smooth, while avoiding the exploding gradients that L2 loss produces on large errors.

The Fast R-CNN paper notes that this loss "is less sensitive to outliers than the L2 loss".

The formula is:

SmoothL1Loss = \frac{1}{n}\sum_{i=1}^{n} Z_i

Z_i = 0.5\,(x_i - y_i)^2, \quad \text{if } |x_i - y_i| < 1

Z_i = |x_i - y_i| - 0.5, \quad \text{otherwise}

3.2 Code Example

A small coding example:

import torch
loss = torch.nn.SmoothL1Loss(reduction='mean')  # 'mean' matches the result below; with 'sum' it would be 1.85
pred=torch.tensor([[1.1,2.2],[3.3,4.4]],dtype=torch.float)
target = torch.tensor([[1,1],[3,6]],dtype=torch.float)
output = loss(pred, target)
print(output)

Output:

tensor(0.4625) # (0.5*0.1**2 + (1.2-0.5) + 0.5*0.3**2 + (1.6-0.5))/4 = 0.4625
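
The piecewise rule can be verified directly with torch.where (reusing the pred and target from the snippet above):

diff = torch.abs(pred - target)
z = torch.where(diff < 1, 0.5 * diff ** 2, diff - 0.5)  # quadratic branch vs. linear branch
print(z.mean())  # tensor(0.4625)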

The curves of the three losses above are shown below:

[Figure: curves of L1 loss, L2 loss, and Smooth L1 loss as a function of the prediction error]

(The figure follows https://www.cnblogs.com/wangguchangqing/p/12021638.html)
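
If the image does not render, the three curves can be reproduced with a minimal sketch (assuming matplotlib is available):

import torch
import matplotlib.pyplot as plt

x = torch.linspace(-2, 2, 401)  # prediction error x - y
l1 = x.abs()
l2 = x ** 2
smooth_l1 = torch.where(x.abs() < 1, 0.5 * x ** 2, x.abs() - 0.5)

plt.plot(x.numpy(), l1.numpy(), label='L1')
plt.plot(x.numpy(), l2.numpy(), label='L2')
plt.plot(x.numpy(), smooth_l1.numpy(), label='Smooth L1')
plt.xlabel('prediction error')
plt.ylabel('loss')
plt.legend()
plt.show()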

4. BCELoss and BCEWithLogitsLoss

4.1 Overview of BCELoss / BCEWithLogitsLoss

Unless otherwise stated, the losses below use the default 'mean' reduction.

The formulas of the two are:

BCELoss:

BCELoss = \frac{1}{n}\sum_{i=1}^{n} l_i

l_i = -\left[ y_i \cdot \log x_i + (1 - y_i) \cdot \log(1 - x_i) \right]

When using BCELoss directly, note that each x_i must lie between 0 and 1; otherwise the following error is raised:

RuntimeError: Assertion `x >= 0. && x <= 1.' failed. input value should be between 0~1, but got -0.615788...

Applying sigmoid() to the predictions to squash them into (0, 1) before BCELoss() is therefore a sensible step; combining the two is exactly what BCEWithLogitsLoss() does.

BCEWithLogitsLoss:

BCEWithLogitsLoss = \frac{1}{n}\sum_{i=1}^{n} l_i

l_i = -\left[ y_i \cdot \log \sigma(x_i) + (1 - y_i) \cdot \log(1 - \sigma(x_i)) \right],

where \sigma(x) is the sigmoid function.

For multi-class problems, the predictions (raw fully-connected outputs) can be paired with one-hot style targets and fed to BCEWithLogitsLoss(), as sketched below.
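
A rough sketch of that idea (the shapes and values here are illustrative, not from the original post): the raw fully-connected outputs serve as per-class logits, and the integer labels are converted to one-hot targets.

import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.randn(3, 4)                           # 3 samples, 4 classes: raw FC outputs
labels = torch.tensor([0, 2, 3])                     # integer class labels
one_hot = F.one_hot(labels, num_classes=4).float()   # one-hot targets

loss = nn.BCEWithLogitsLoss()
print(loss(logits, one_hot))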

4.2 Code Example

BCELoss, BCEWithLogitsLoss, and a manual implementation are shown below:

import torch
import torch.nn as nn
input = torch.randn((2,2))
target = torch.empty((2,2)).random_(2)
#input=torch.tensor([[ 2.4480,  0.3336],
#        [-0.8614, -1.2634]])
#target =torch.tensor([[1., 1.],
#        [0., 0.]])
sigmoid = nn.Sigmoid()
loss = nn.BCELoss()
sigmoid_input=sigmoid(input)
# using torch.nn.BCELoss()
print(loss(sigmoid_input, target))
# manual implementation
[m,n]=sigmoid_input.shape
res=-1*torch.sum((target*torch.log(sigmoid_input)+(1-target)*torch.log(1-sigmoid_input)))
# res=0
# for i in range(m):
#     for j in range(n):
#         res+=-1*(target[i,j]*torch.log(sigmoid_input[i,j])+(1-target[i,j])*torch.log(1-sigmoid_input[i,j]))
print(res/(m*n))
# using torch.nn.BCEWithLogitsLoss() directly on the raw input
loss = nn.BCEWithLogitsLoss()
print(loss(input, target))

Output (the exact value depends on the random input, but the three results always agree):

tensor(0.5947)
tensor(0.5947)
tensor(0.5947)

This part draws on https://blog.csdn.net/qq_22210253/article/details/85222093

5. NLL Loss (Negative Log Likelihood Loss) and CrossEntropy Loss

5.1 Overview of NLL Loss / CrossEntropy Loss

Analogous to the relationship between BCELoss and BCEWithLogitsLoss, nn.CrossEntropyLoss() can be viewed as nn.LogSoftmax() followed by nn.NLLLoss().

Note the expected tensor shapes here:

The prediction has shape batch_size * channel * height * width, where channel is the number of classes in the dataset; for the VOC dataset, with background included there are 21 classes, so channel is 21.
The label has shape batch_size * height * width.

This differs from the losses discussed above.

The formulas:

\text{LogSoftmax}(x_i) = \log\left(\frac{\exp(x_i)}{\sum_j \exp(x_j)}\right)

Note that nn.LogSoftmax is applied along dim=1, i.e. the channel/class dimension, corresponding to log_soft = nn.LogSoftmax(dim=1) in the code below.

NLLLoss = -\frac{1}{n}\sum_{i=1}^{n} X_{i,\,Y_i}

The example below walks through the computation in detail.

5.2 Code Example

The same loss computed in three ways: nn.NLLLoss(), a manual implementation, and nn.CrossEntropyLoss():

import torch
import torch.nn as nn
x = torch.Tensor([[[1, 2, 1],
                  [2, 2, 1],
                  [0, 1, 1]],
                  [[0, 1, 3],
                  [2, 3, 1],
                  [0, 0, 1]]])
x = x.view([1, 2, 3, 3])
# using torch.nn.NLLLoss()
log_soft = nn.LogSoftmax(dim=1)
x1=log_soft(x)
#x1:tensor([[[[-0.3133, -0.3133, -2.1269],
#          [-0.6931, -1.3133, -0.6931],
#          [-0.6931, -0.3133, -0.6931]],
#
#         [[-1.3133, -1.3133, -0.1269],
#          [-0.6931, -0.3133, -0.6931],
#          [-0.6931, -1.3133, -0.6931]]]])
y = torch.LongTensor([[1, 0, 1],
                      [0, 0, 1],
                      [1, 1, 1]])
y = y.view([1, 3, 3])
loss = nn.NLLLoss()
print(loss(x1,y))

# manual implementation
mat=torch.zeros(3,3)
for i in range(3):
    for j in range(3):
        mat[i,j]=-x1[0,int(y[0,i,j]),i,j]
        
# In semantic-segmentation terms: for an m*n image with k classes, x1 holds the log_softmax value of every class (k*m*n, ignoring the batch dimension).
# If pixel (i, j) is labeled class k1, its NLLLoss contribution is -log_softmax[k1, i, j].

#mat:
#tensor([[1.3133, 0.3133, 0.1269],
#        [0.6931, 1.3133, 0.6931],
#        [0.6931, 1.3133, 0.6931]])        
print(torch.sum(mat)/(3*3))        
# using torch.nn.CrossEntropyLoss()
loss = nn.CrossEntropyLoss()
print(loss(x,y))

Output:

tensor(0.7947)
tensor(0.7947)
tensor(0.7947)

This part draws on https://blog.csdn.net/zhaowangbo/article/details/88821017
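
The example above uses segmentation-style 4-D input; in the more common classification setting the input has shape [batch_size, num_classes] and the target has shape [batch_size] with integer class indices. A minimal sketch (the numbers are made up):

import torch
import torch.nn as nn

logits = torch.tensor([[1.0, 2.0, 0.5],
                       [0.2, 0.1, 3.0]])   # [batch_size, num_classes]
target = torch.tensor([1, 2])              # class indices, [batch_size]

log_soft = nn.LogSoftmax(dim=1)
print(nn.NLLLoss()(log_soft(logits), target))   # same value as below
print(nn.CrossEntropyLoss()(logits, target))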

6. Why Cross-Entropy Loss Outperforms Quadratic Losses (e.g. MSE)

See Understanding the difficulty of training deep feedforward neural networks (Glorot & Bengio, 2010) and Prof. Hung-yi Lee's 2017 Machine Learning lectures (NTU).

[Figure: training criterion surfaces for cross-entropy vs. quadratic cost, reproduced from the paper]

Figure 5 of that paper plots the training criterion as a function of two weights for a two-layer network (one hidden layer) with hyperbolic tangent units and a random input and target signal; there are clearly more severe plateaus with the quadratic cost.
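
One way to see the effect (a minimal sketch of the saturation argument, not taken from the paper): for a sigmoid output unit with target 1, the MSE gradient with respect to the logit carries a \sigma'(z) factor that vanishes when the unit saturates, while the cross-entropy gradient reduces to \sigma(z) - y and stays large.

import torch
import torch.nn.functional as F

z = torch.tensor([-6.0], requires_grad=True)   # a badly wrong, saturated logit
y = torch.tensor([1.0])

mse = (torch.sigmoid(z) - y) ** 2
mse.backward()
print(z.grad)   # tiny gradient (~ -0.005): the sigma'(z) factor kills it

z.grad = None
bce = F.binary_cross_entropy(torch.sigmoid(z), y)
bce.backward()
print(z.grad)   # ~ sigma(z) - y ~ -1.0: still a strong learning signal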

7. Further Understanding of BCE and Cross-Entropy Loss

See the Zhihu article 损失函数 - 交叉熵损失函数 (Loss Functions: Cross-Entropy Loss).

References

[1] https://pytorch.org/docs/stable/nn.html#loss-functions
[2] https://www.cnblogs.com/wangguchangqing/p/12021638.html
[3] https://blog.csdn.net/qq_22210253/article/details/85222093
[4] https://blog.csdn.net/zhaowangbo/article/details/88821017
[5] https://zhuanlan.zhihu.com/p/35709485
