Common Loss Functions in PyTorch
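1. L1 Loss / Mean Absolute Error (MAE)
1.1 Introduction to L1 Loss / Mean Absolute Error (MAE)
With reduction='sum', torch.nn.L1Loss computes the L1 loss (the sum of the absolute errors);
with reduction='mean' (the default), it computes the MAE. With reduction='none', it returns the unreduced element-wise absolute errors.
The formulas are as follows, where $y_i$ is the target value and $y_i^p$ is the predicted value: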
$$MAE = \cfrac{1}{n} \cdot \sum\limits_{i=1}^{n} \left| y_i - y_i^p \right|$$
$$L1 = \sum\limits_{i=1}^{n} \left| y_i - y_i^p \right|$$
1.2 Implementation
A small coding example:
import torch
loss = torch.nn.L1Loss(reduction='sum')
pred=torch.tensor([[1.1,2.2],[3.3,4.4]],dtype=torch.float)
target = torch.tensor([[1,2],[3,4]],dtype=torch.float)
output = loss(pred, target)
print(output)
Output:
tensor(1.0000)  # i.e. 0.1 + 0.2 + 0.3 + 0.4 = 1.0; with reduction='mean' the result would be 1.0 / 4 = 0.25
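To make the effect of the reduction parameter concrete, here is a minimal sketch that runs the same pred/target pair through all three modes. The variable names (per_element, l1_sum, mae) are illustrative only.

```python
import torch

pred = torch.tensor([[1.1, 2.2], [3.3, 4.4]])
target = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

# 'none' keeps the per-element absolute errors (same shape as the inputs)
per_element = torch.nn.L1Loss(reduction='none')(pred, target)
# 'sum' adds them up (L1 loss); 'mean' averages them (MAE)
l1_sum = torch.nn.L1Loss(reduction='sum')(pred, target)
mae = torch.nn.L1Loss(reduction='mean')(pred, target)

print(per_element.shape)  # torch.Size([2, 2])
print(l1_sum, mae)
```

The 'sum' and 'mean' results are simply the sum and the mean of the 'none' result.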
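2. L2 Loss / Mean Squared Error (MSE)
2.1 Introduction to L2 Loss / Mean Squared Error (MSE)
L2 loss and mean squared error (MSE) are analogous to L1 loss and MAE, except that they use the squared difference between the predicted and target values instead of the absolute difference.
The reduction parameter of torch.nn.MSELoss() likewise accepts three values: 'none', 'mean', and 'sum'.
The formulas are as follows: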
$$MSE = \cfrac{1}{n} \cdot \sum\limits_{i=1}^{n} (y_i - y_i^p)^2$$
$$L2 = \sum\limits_{i=1}^{n} (y_i - y_i^p)^2$$
2.2 Implementation
A small coding example:
import torch
loss = torch.nn.MSELoss(reduction='sum')
pred=torch.tensor([[1.1,2.2],[3.3,4.4]],dtype=torch.float)
target = torch.tensor([[1,2],[3,4]],dtype=torch.float)
output = loss(pred, target)
print(output)
Output:
tensor(0.3000)  # 0.01 + 0.04 + 0.09 + 0.16 = 0.3
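As a cross-check of the two formulas, the sketch below computes the squared errors by hand and compares them with torch.nn.MSELoss. The names manual_sum and manual_mean are illustrative.

```python
import torch

pred = torch.tensor([[1.1, 2.2], [3.3, 4.4]])
target = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

squared_error = (pred - target) ** 2
manual_sum = squared_error.sum()    # L2 formula
manual_mean = squared_error.mean()  # MSE formula

builtin_sum = torch.nn.MSELoss(reduction='sum')(pred, target)
builtin_mean = torch.nn.MSELoss()(pred, target)  # default reduction is 'mean'

print(manual_sum, builtin_sum)
print(manual_mean, builtin_mean)
```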
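3. SmoothL1 Loss
3.1 Introduction to SmoothL1 Loss
This is a piecewise function: it uses a (scaled) L2 loss where the absolute error is below 1 and an L1 loss elsewhere. It therefore avoids both the non-differentiability of L1 loss at 0 (where its curve has a kink) and the exploding gradients that L2 loss can produce for large errors.
The Fast R-CNN paper notes that this loss "is less sensitive to outliers than the L2 loss".
Its formula is: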
$$SmoothL1Loss = \cfrac{1}{n} \cdot \sum\limits_{i=1}^{n} Z_i$$
$$Z_i = \begin{cases} 0.5 (x_i - y_i)^2, & \text{if } \left| x_i - y_i \right| < 1 \\ \left| x_i - y_i \right| - 0.5, & \text{otherwise} \end{cases}$$
3.2 Implementation
A small coding example:
import torch
loss = torch.nn.SmoothL1Loss(reduction='mean')
pred = torch.tensor([[1.1, 2.2], [3.3, 4.4]], dtype=torch.float)
target = torch.tensor([[1, 1], [3, 6]], dtype=torch.float)
output = loss(pred, target)
print(output)
Result:
tensor(0.4625)  # (0.5*0.1**2 + (1.2-0.5) + 0.5*0.3**2 + (1.6-0.5)) / 4 = 0.4625
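The piecewise definition can also be written out directly. The following is a minimal sketch, assuming the default threshold of 1 and the 'mean' reduction; smooth_l1_manual is a hypothetical helper name, not a PyTorch function.

```python
import torch

def smooth_l1_manual(pred, target):
    """Hand-written Smooth L1 with threshold 1 and mean reduction (illustrative helper)."""
    diff = torch.abs(pred - target)
    # Quadratic branch for small errors, linear branch for large ones
    z = torch.where(diff < 1, 0.5 * diff ** 2, diff - 0.5)
    return z.mean()

pred = torch.tensor([[1.1, 2.2], [3.3, 4.4]])
target = torch.tensor([[1.0, 1.0], [3.0, 6.0]])

manual = smooth_l1_manual(pred, target)
builtin = torch.nn.SmoothL1Loss()(pred, target)  # default reduction is 'mean'
print(manual, builtin)
```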
[Figure: curves of the three loss functions above (L1, L2, and Smooth L1); adapted from https://www.cnblogs.com/wangguchangqing/p/12021638.html]
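4. BCELoss and BCEWithLogitsLoss
4.1 Introduction to BCELoss / BCEWithLogitsLoss
Unless stated otherwise, the losses in this article use the default 'mean' reduction.
The formulas of the two losses are as follows:
BCELoss: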
$$BCELoss = \cfrac{1}{n} \cdot \sum\limits_{i=1}^{n} l_i$$
$$l_i = -\left[ y_i \cdot \log x_i + (1 - y_i) \cdot \log (1 - x_i) \right]$$
When using BCELoss directly, note that $x_i$ must lie between 0 and 1; otherwise the following error is raised:
RuntimeError: Assertion `x >= 0. && x <= 1.' failed. input value should be between 0~1, but got -0.615788...
It is therefore convenient to apply sigmoid() to the predictions before BCELoss(), which restricts them to (0, 1); this combination is exactly BCEWithLogitsLoss().
BCEWithLogitsLoss:
$$BCEWithLogitsLoss = \cfrac{1}{n} \cdot \sum\limits_{i=1}^{n} l_i$$
$$l_i = -\left[ y_i \cdot \log \sigma(x_i) + (1 - y_i) \cdot \log (1 - \sigma(x_i)) \right]$$
where $\sigma(x)$ is the sigmoid function.
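For multi-class problems, one option is to one-hot encode the class labels so that they match the shape of the fully connected layer's output, and then compute BCEWithLogitsLoss() on the two.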
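The one-hot approach might look like the sketch below. The logits and labels are made-up values standing in for the output of a fully connected layer; torch.nn.functional.one_hot builds the target, and the hand-written line only cross-checks the BCEWithLogitsLoss formula above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes = 3
# Hypothetical raw outputs of a fully connected layer: 2 samples, 3 classes
logits = torch.tensor([[2.0, -1.0, 0.5],
                       [0.1, 1.5, -0.3]])
labels = torch.tensor([0, 1])  # integer class indices

# One-hot encode the labels so they have the same shape as the logits
one_hot = F.one_hot(labels, num_classes=num_classes).float()

loss = nn.BCEWithLogitsLoss()(logits, one_hot)

# Cross-check against the formula: log(sigmoid(x)) and log(1 - sigmoid(x)) = log(sigmoid(-x))
manual = -(one_hot * F.logsigmoid(logits) + (1 - one_hot) * F.logsigmoid(-logits)).mean()
print(loss, manual)
```

Each class is treated as an independent binary problem here, which is different from the softmax-based CrossEntropyLoss.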
4.2 Implementation
BCELoss, BCEWithLogitsLoss, and a hand-written implementation are shown below:
import torch
import torch.nn as nn
input = torch.randn((2,2))
target = torch.empty((2,2)).random_(2)
#input=torch.tensor([[ 2.4480, 0.3336],
# [-0.8614, -1.2634]])
#target =torch.tensor([[1., 1.],
# [0., 0.]])
sigmoid = nn.Sigmoid()
loss = nn.BCELoss()
sigmoid_input=sigmoid(input)
# Implementation with torch.nn.BCELoss()
print(loss(sigmoid_input, target))
# Hand-written implementation
m, n = sigmoid_input.shape
res = -1 * torch.sum(target * torch.log(sigmoid_input) + (1 - target) * torch.log(1 - sigmoid_input))
# Equivalent explicit loop:
# res = 0
# for i in range(m):
#     for j in range(n):
#         res += -1 * (target[i, j] * torch.log(sigmoid_input[i, j]) + (1 - target[i, j]) * torch.log(1 - sigmoid_input[i, j]))
print(res / (m * n))
# Direct implementation with torch.nn.BCEWithLogitsLoss()
loss = nn.BCEWithLogitsLoss()
print(loss(input, target))
Result (the value depends on the random input, but the three implementations agree):
tensor(0.5947)
tensor(0.5947)
tensor(0.5947)
This part draws on https://blog.csdn.net/qq_22210253/article/details/85222093
5. NLL Loss (negative log likelihood loss) and CrossEntropy Loss
5.1 Introduction to NLL Loss / CrossEntropy Loss
Analogous to the relationship between BCELoss and BCEWithLogitsLoss, nn.CrossEntropyLoss() can be viewed as nn.LogSoftmax() followed by nn.NLLLoss().
Note the expected data shapes:
pred has shape $\mathrm{batch\_size} \times \mathrm{channel} \times \mathrm{height} \times \mathrm{width}$,
where channel is the number of classes in the dataset; for example, VOC has 21 classes including the background, so channel is 21.
label has shape $\mathrm{batch\_size} \times \mathrm{height} \times \mathrm{width}$.
This differs from the other losses above, where pred and target have the same shape.
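A short sketch of these shape conventions, using random tensors with made-up sizes (only the 21 classes come from the VOC example; the batch size, height, and width are arbitrary):

```python
import torch
import torch.nn as nn

batch_size, num_classes, height, width = 2, 21, 4, 5  # illustrative sizes

# pred: one float score per class and pixel
pred = torch.randn(batch_size, num_classes, height, width)
# label: one integer class index per pixel (no channel dimension)
label = torch.randint(0, num_classes, (batch_size, height, width))

loss = nn.CrossEntropyLoss()(pred, label)
print(pred.shape, label.shape, label.dtype, loss.shape)
```

The label tensor holds int64 class indices rather than one-hot vectors, which is why it has one dimension fewer than pred.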
Formulas:
$$\text{LogSoftmax}(x_i) = \log\left( \cfrac{\exp(x_i)}{\sum_j \exp(x_j)} \right)$$
Note that nn.LogSoftmax is applied along dim=1, i.e. the channel/class dimension, which corresponds to:
log_soft = nn.LogSoftmax(dim=1)
$$NLLLoss = -\cfrac{1}{n} \cdot \sum\limits_{i=1}^{n} X_{i,Y_i}$$
where $X_{i,Y_i}$ is the LogSoftmax output of element $i$ at its target class $Y_i$.
The example below walks through the computation in detail.
5.2 Implementation
The example below implements the loss in three ways: with nn.NLLLoss(), by hand, and with nn.CrossEntropyLoss():
import torch
import torch.nn as nn
x = torch.Tensor([[[1, 2, 1],
[2, 2, 1],
[0, 1, 1]],
[[0, 1, 3],
[2, 3, 1],
[0, 0, 1]]])
x = x.view([1, 2, 3, 3])
# Implementation with torch.nn.NLLLoss()
log_soft = nn.LogSoftmax(dim=1)
x1=log_soft(x)
#x1:tensor([[[[-0.3133, -0.3133, -2.1269],
# [-0.6931, -1.3133, -0.6931],
# [-0.6931, -0.3133, -0.6931]],
#
# [[-1.3133, -1.3133, -0.1269],
# [-0.6931, -0.3133, -0.6931],
# [-0.6931, -1.3133, -0.6931]]]])
y = torch.LongTensor([[1, 0, 1],
[0, 0, 1],
[1, 1, 1]])
y = y.view([1, 3, 3])
loss = nn.NLLLoss()
print(loss(x1,y))
# Hand-written implementation
mat = torch.zeros(3, 3)
for i in range(3):
    for j in range(3):
        mat[i, j] = -x1[0, int(y[0, i, j]), i, j]
# In semantic segmentation terms: for an m*n image with k classes, x1 holds the log_softmax value of every class (k*m*n, ignoring the batch dimension).
# If a pixel is labelled as class k1, its NLL loss is the negated log_softmax value at [k1, i, j].
# mat:
# tensor([[1.3133, 0.3133, 0.1269],
#         [0.6931, 1.3133, 0.6931],
#         [0.6931, 1.3133, 0.6931]])
print(torch.sum(mat) / (3 * 3))
# Implementation with torch.nn.CrossEntropyLoss()
loss = nn.CrossEntropyLoss()
print(loss(x, y))
Result:
tensor(0.7947)
tensor(0.7947)
tensor(0.7947)
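The double loop in the hand-written version can also be expressed without Python loops. The sketch below is one possible vectorized variant using Tensor.gather on the same x and y; it is an alternative formulation, not the code from the example above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.tensor([[[1., 2., 1.], [2., 2., 1.], [0., 1., 1.]],
                  [[0., 1., 3.], [2., 3., 1.], [0., 0., 1.]]]).view(1, 2, 3, 3)
y = torch.tensor([[1, 0, 1], [0, 0, 1], [1, 1, 1]]).view(1, 3, 3)

log_prob = F.log_softmax(x, dim=1)           # shape (1, 2, 3, 3)
# For every pixel, pick the log-probability of its labelled class
picked = log_prob.gather(1, y.unsqueeze(1))  # shape (1, 1, 3, 3)
manual = -picked.mean()

reference = nn.CrossEntropyLoss()(x, y)
print(manual, reference)
```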
This part draws on https://blog.csdn.net/zhaowangbo/article/details/88821017.
6. Why Cross-Entropy Loss Performs Better Than Quadratic Losses (such as MSE)
See "Understanding the difficulty of training deep feedforward neural networks" (Glorot and Bengio, 2010)
and Prof. Hung-yi Lee's (National Taiwan University) "Machine Learning" video lectures, 2017.
The paper refers to its "Figure 5, which plots the training criterion as a function of two weights for a two-layer network (one hidden layer) with hyperbolic tangent units, and a random input and target signal. There are clearly more severe plateau with the quadratic cost."
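One way to see such a plateau in miniature is to compare gradients at a single sigmoid output unit. The sketch below is not taken from the paper; it assumes one logit that is confidently wrong (large negative value, target 1) and uses autograd to compare the gradient of a quadratic cost with that of binary cross-entropy.

```python
import torch
import torch.nn.functional as F

# A confidently wrong prediction: large negative logit, but the target is 1
z_mse = torch.tensor(-8.0, requires_grad=True)
z_bce = torch.tensor(-8.0, requires_grad=True)
y = torch.tensor(1.0)

# Quadratic cost on the sigmoid output: gradient is 2 * (s - y) * s * (1 - s)
mse = (torch.sigmoid(z_mse) - y) ** 2
mse.backward()

# Cross-entropy on the logit: gradient is s - y
bce = F.binary_cross_entropy_with_logits(z_bce, y)
bce.backward()

print(z_mse.grad, z_bce.grad)
```

With the quadratic cost the factor s * (1 - s) shrinks the gradient toward zero when the sigmoid saturates, whereas the cross-entropy gradient stays close to -1, so the badly wrong prediction still gets a strong update.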
7. Further Understanding of BCE and Cross Entropy Loss
References
[1] https://pytorch.org/docs/stable/nn.html#loss-functions
[2] https://www.cnblogs.com/wangguchangqing/p/12021638.html
[3] https://blog.csdn.net/qq_22210253/article/details/85222093
[4] https://blog.csdn.net/zhaowangbo/article/details/88821017
[5] https://zhuanlan.zhihu.com/p/35709485