Common PyTorch Loss Functions

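1. L1 Loss / Mean Absolute Error (MAE)
- 1.1 Introduction to L1 Loss / Mean Absolute Error (MAE)
- 1.2 Implementation
2. L2 Loss / Mean Squared Error (MSE)
- 2.1 Introduction to L2 Loss / Mean Squared Error (MSE)
- 2.2 Implementation
3. SmoothL1 Loss
- 3.1 Introduction to SmoothL1 Loss
- 3.2 Implementation
4. BCELoss and BCEWithLogitsLoss
- 4.1 Introduction to BCELoss / BCEWithLogitsLoss
- 4.2 Implementation
5. NLL Loss (negative log likelihood loss) and CrossEntropy Loss
- 5.1 Introduction to NLL Loss / CrossEntropy Loss
- 5.2 Implementation
6. Why cross-entropy loss performs better than quadratic losses such as MSE
7. Further understanding of BCE and Cross Entropy Loss
References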

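1. L1 Loss / Mean Absolute Error (MAE)

1.1 Introduction to L1 Loss / Mean Absolute Error (MAE)

The reduction argument of torch.nn.L1Loss selects the variant: 'sum' gives the L1 loss (the sum of absolute errors), and 'mean' (the default) gives the MAE.
With 'none', no reduction is applied and the per-element absolute errors are returned.
The formulas for MAE and L1 loss, with $y_i$ the target and $y_i^p$ the prediction, are:
$\cfrac{1}{n} \cdot \sum\limits_{i = 1}^n {| { {y_i} - y_i^p} |}$
$\sum\limits_{i = 1}^n {| { {y_i} - y_i^p} |}$

1.2 Implementation

A small coding example: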

import torch
loss = torch.nn.L1Loss(reduction='sum')
pred=torch.tensor([[1.1,2.2],[3.3,4.4]],dtype=torch.float)
target = torch.tensor([[1,2],[3,4]],dtype=torch.float)
output = loss(pred, target)
print(output)

Output:

tensor(1.0000) # i.e. 0.1+0.2+0.3+0.4=1.0; with 'mean' the result is 1.0/4=0.25
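To make the effect of the reduction argument concrete, here is a minimal sketch that runs the same pred and target through all three modes. The variable names per_element, total and mean are illustrative only.

```python
import torch

pred = torch.tensor([[1.1, 2.2], [3.3, 4.4]])
target = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

# reduction='none' keeps one absolute error per element (no summing or averaging)
per_element = torch.nn.L1Loss(reduction='none')(pred, target)
# reduction='sum' adds the per-element errors; reduction='mean' averages them
total = torch.nn.L1Loss(reduction='sum')(pred, target)
mean = torch.nn.L1Loss(reduction='mean')(pred, target)

print(per_element.shape)  # same shape as pred
print(total, mean)
```

With 'none' the result is a tensor of the same shape as the input, which is useful when the per-element errors need to be weighted or masked before being reduced by hand.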

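2. L2 Loss / Mean Squared Error (MSE)

2.1 Introduction to L2 Loss / Mean Squared Error (MSE)

L2 loss and mean squared error (MSE) are analogous to L1 loss and mean absolute error (MAE), except that they use the squared differences between predictions and targets.

The reduction argument of torch.nn.MSELoss() likewise accepts three values: 'none', 'mean' and 'sum'.

The formulas for MSE ('mean') and L2 loss ('sum') are:

$\cfrac{1}{n} \cdot \sum\limits_{i = 1}^n { { {({y_i} - y_i^p)}^2}}$
$\sum\limits_{i = 1}^n { { {({y_i} - y_i^p)}^2}}$

2.2 Implementation

A small coding example: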

import torch
loss = torch.nn.MSELoss(reduction='sum')
pred=torch.tensor([[1.1,2.2],[3.3,4.4]],dtype=torch.float)
target = torch.tensor([[1,2],[3,4]],dtype=torch.float)
output = loss(pred, target)
print(output)

Output:

tensor(0.3000) # 0.01+0.04+0.09+0.16=0.3
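As a cross-check of the two formulas, the following sketch computes the squared errors by hand and compares them with torch.nn.MSELoss for both 'sum' and 'mean'. The names squared_error, manual_sum and manual_mean are illustrative.

```python
import torch

pred = torch.tensor([[1.1, 2.2], [3.3, 4.4]])
target = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

# element-wise squared differences, then reduce by hand
squared_error = (pred - target) ** 2
manual_sum = squared_error.sum()
manual_mean = squared_error.mean()

# the same quantities from the built-in loss
builtin_sum = torch.nn.MSELoss(reduction='sum')(pred, target)
builtin_mean = torch.nn.MSELoss(reduction='mean')(pred, target)

print(manual_sum, builtin_sum)
print(manual_mean, builtin_mean)
```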

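3. SmoothL1 Loss

3.1 Introduction to SmoothL1 Loss

This is a piecewise function: it behaves like L2 loss where the absolute error is below 1 and like L1 loss elsewhere. It thereby avoids both the non-differentiability of L1 loss at 0, where the L1 curve is not smooth, and the exploding gradients that L2 loss can produce for large errors.

The Fast R-CNN paper notes that this loss "is less sensitive to outliers than the L2 loss".

Its formula is:

$\cfrac{1}{n}\cdot\sum\limits_{i = 1}^n { {Z_i}}$

if $\left| { {x_i} - {y_i}} \right| < 1$, then ${Z_i} = 0.5{ {({x_i} - {y_i})}^2}$

otherwise, ${Z_i} = \left| { {x_i} - {y_i}} \right| - 0.5$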
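The robustness to outliers can be seen in the gradients. The sketch below is an illustration, not taken from the Fast R-CNN paper: it compares the gradient of a 0.5-scaled squared error with that of torch.nn.SmoothL1Loss (default settings assumed) for one small error and two large ones.

```python
import torch

target = torch.zeros(3)
# errors of 0.5 (inside the quadratic zone), 2 and 10 (outliers)
pred_l2 = torch.tensor([0.5, 2.0, 10.0], requires_grad=True)
pred_sl1 = torch.tensor([0.5, 2.0, 10.0], requires_grad=True)

# 0.5 * squared error, scaled to match the quadratic branch of SmoothL1
(0.5 * (pred_l2 - target) ** 2).sum().backward()
torch.nn.SmoothL1Loss(reduction='sum')(pred_sl1, target).backward()

print(pred_l2.grad)   # gradient equals the error itself: 0.5, 2, 10
print(pred_sl1.grad)  # gradient is capped at 1 once |error| >= 1: 0.5, 1, 1
```

The quadratic gradient grows linearly with the error, so a single outlier can dominate an update, whereas the SmoothL1 gradient stays bounded.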

3.2 Implementation

A small coding example:

import torch
loss = torch.nn.SmoothL1Loss(reduction='mean')
pred=torch.tensor([[1.1,2.2],[3.3,4.4]],dtype=torch.float)
target = torch.tensor([[1,1],[3,6]],dtype=torch.float)
output = loss(pred, target)
print(output)

Output:

tensor(0.4625) # (0.5*0.1**2 + (1.2-0.5) + 0.5*0.3**2 + (1.6-0.5))/4 = 0.4625
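The piecewise definition can also be written out directly. This is a minimal sketch using torch.where on the same data; it reproduces the mean-reduced value of 0.4625. The names abs_diff, z and manual_mean are illustrative.

```python
import torch

pred = torch.tensor([[1.1, 2.2], [3.3, 4.4]])
target = torch.tensor([[1.0, 1.0], [3.0, 6.0]])

abs_diff = (pred - target).abs()
# quadratic branch where |x - y| < 1, linear branch elsewhere
z = torch.where(abs_diff < 1, 0.5 * abs_diff ** 2, abs_diff - 0.5)
manual_mean = z.mean()

builtin_mean = torch.nn.SmoothL1Loss(reduction='mean')(pred, target)
print(manual_mean, builtin_mean)
```

Summing z instead of averaging it gives 1.85, which is what reduction='sum' returns for this data.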

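The curves of the three loss functions above are shown below:
[Figure: curves of the L1, L2 and SmoothL1 loss functions]
The figure is adapted from https://www.cnblogs.com/wangguchangqing/p/12021638.html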

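4. BCELoss and BCEWithLogitsLoss

4.1 Introduction to BCELoss / BCEWithLogitsLoss

Unless stated otherwise, the losses in this article use the default 'mean' reduction.

The formulas for the two losses are as follows.

BCELoss:

$\cfrac{1}{n}\cdot\sum\limits_{i = 1}^n { {l_i}}$

${l_i} = - [{y_{i}} \cdot \log {x_i} + (1 - {y_i}) \cdot \log (1 - {x_i})]$

When using BCELoss directly, note that ${x_i}$ must lie between 0 and 1. Otherwise an error like the following is raised:

RuntimeError: Assertion `x >= 0. && x <= 1.' failed. input value should be between 0~1, but got -0.615788...

It is therefore sensible to apply sigmoid() to the predictions before BCELoss() so that they are constrained to (0,1); this combination is exactly BCEWithLogitsLoss().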

BCEWithLogitsLoss:

$\cfrac{1}{n}\cdot \sum\limits_{i = 1}^n { {l_i}}$

${l_i} = - [{y_{i}} \cdot \log \sigma ({x_i}) + (1 - {y_i}) \cdot \log (1 - \sigma ({x_i}))]$,

where $\sigma (x)$ is the sigmoid() function.
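Besides convenience, fusing the sigmoid into the loss helps numerical stability. The sketch below uses an algebraically equivalent rewrite of $l_i$, namely $\max(x_i, 0) - x_i y_i + \log(1 + e^{-|x_i|})$, which never takes the log of a value that has rounded to 0. This rewrite is an illustration and is not claimed to be PyTorch's internal implementation.

```python
import torch

x = torch.tensor([-100.0, -2.0, 0.0, 3.0, 100.0])  # logits, including extreme values
y = torch.tensor([0.0, 1.0, 1.0, 0.0, 1.0])        # binary targets

# algebraically equal to -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))]
stable = torch.clamp(x, min=0) - x * y + torch.log1p(torch.exp(-x.abs()))
builtin = torch.nn.BCEWithLogitsLoss(reduction='none')(x, y)

print(stable)
print(builtin)
```

Both tensors stay finite even for logits of magnitude 100, where a separately computed sigmoid would saturate to exactly 0 or 1 in float32.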

For a multi-class loss, one option is to pair the outputs of the fully connected layer with one-hot encoded class labels and then compute BCEWithLogitsLoss().
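A minimal sketch of that multi-class usage follows. The class count, the random logits standing in for a fully connected layer's output, and the labels are all made-up values for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_classes = 3                       # hypothetical number of classes
logits = torch.randn(4, num_classes)  # stand-in for the fully connected layer output
labels = torch.tensor([0, 2, 1, 2])   # integer class labels

# one-hot encode so the target has the same shape as the logits
one_hot = F.one_hot(labels, num_classes=num_classes).float()
loss = torch.nn.BCEWithLogitsLoss()(logits, one_hot)

print(one_hot)
print(loss)
```

Note that this treats each class as an independent binary problem, which is not the same objective as the softmax-based CrossEntropyLoss.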

4.2 Implementation

BCELoss, BCEWithLogitsLoss and a hand-written version are implemented as follows:

import torch
import torch.nn as nn
input = torch.randn((2,2))
target = torch.empty((2,2)).random_(2)
#input=torch.tensor([[ 2.4480,  0.3336],
#        [-0.8614, -1.2634]])
#target =torch.tensor([[1., 1.],
#        [0., 0.]])
sigmoid = nn.Sigmoid()
loss = nn.BCELoss()
sigmoid_input=sigmoid(input)
# computed with torch.nn.BCELoss()
print(loss(sigmoid_input, target))
# computed by hand
[m,n]=sigmoid_input.shape
res=-1*torch.sum((target*torch.log(sigmoid_input)+(1-target)*torch.log(1-sigmoid_input)))
# equivalent explicit loop:
# res=0
# for i in range(m):
#     for j in range(n):
#         res+=-1*(target[i,j]*torch.log(sigmoid_input[i,j])+(1-target[i,j])*torch.log(1-sigmoid_input[i,j]))
print(res/(m*n))
# computed directly with torch.nn.BCEWithLogitsLoss()
loss = nn.BCEWithLogitsLoss()
print(loss(input, target))

Output (the exact value depends on the random input, but the three results agree):

tensor(0.5947)
tensor(0.5947)
tensor(0.5947)

This part draws on https://blog.csdn.net/qq_22210253/article/details/85222093

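5. NLL Loss (negative log likelihood loss) and CrossEntropy Loss

5.1 Introduction to NLL Loss / CrossEntropy Loss

Analogous to the relationship between BCELoss and BCEWithLogitsLoss, nn.CrossEntropyLoss() can be viewed as nn.LogSoftmax() followed by nn.NLLLoss().

Note the expected data shapes:

pred has shape ${\rm{batch\_size*channel*height*width}}$,
where channel is the number of classes in the dataset; VOC, for example, has 21 classes including background, so channel is 21.
label has shape ${\rm{batch\_size*height*width}}$.

This is a point of difference from the other losses above.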
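The shape convention can be checked with a short sketch. The batch size and image size below are arbitrary illustrative values; only the 21 classes come from the VOC example above.

```python
import torch

batch_size, num_classes, height, width = 2, 21, 4, 5   # 21 classes as in the VOC example
pred = torch.randn(batch_size, num_classes, height, width)          # float scores per class
label = torch.randint(0, num_classes, (batch_size, height, width))  # int64 class indices

loss = torch.nn.CrossEntropyLoss()(pred, label)
print(pred.shape, label.shape, loss.shape)
```

The label tensor has no channel dimension and holds integer class indices rather than one-hot vectors or probabilities.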

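Formulas:
$\text{LogSoftmax}(x_{i}) = \log(\cfrac{\exp(x_i) }{ \sum_j \exp(x_j)} )$

Note that nn.LogSoftmax is applied along dim=1, the channel/class dimension, which corresponds to this line of the program:
log_soft = nn.LogSoftmax(dim=1)
NLLLoss then averages the negated log-probabilities of the target classes, where $X$ is the LogSoftmax output and $Y_i$ is the label of element $i$:
$-\cfrac{1}{n} \cdot \sum\limits_{i = 1}^n { {X_{i,{Y_i}}}}$

The example in 5.2 makes the computation concrete.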

5.2 Implementation

The following example computes the loss in three ways: with nn.NLLLoss(), by hand, and with nn.CrossEntropyLoss():

import torch
import torch.nn as nn
x = torch.Tensor([[[1, 2, 1],
                  [2, 2, 1],
                  [0, 1, 1]],
                  [[0, 1, 3],
                  [2, 3, 1],
                  [0, 0, 1]]])
x = x.view([1, 2, 3, 3])
# computed with torch.nn.NLLLoss()
log_soft = nn.LogSoftmax(dim=1)
x1=log_soft(x)
#x1:tensor([[[[-0.3133, -0.3133, -2.1269],
#          [-0.6931, -1.3133, -0.6931],
#          [-0.6931, -0.3133, -0.6931]],
#
#         [[-1.3133, -1.3133, -0.1269],
#          [-0.6931, -0.3133, -0.6931],
#          [-0.6931, -1.3133, -0.6931]]]])
y = torch.LongTensor([[1, 0, 1],
                      [0, 0, 1],
                      [1, 1, 1]])
y = y.view([1, 3, 3])
loss = nn.NLLLoss()
print(loss(x1,y))

# computed by hand
mat=torch.zeros(3,3)
for i in range(3):
    for j in range(3):
        mat[i,j]=-x1[0,int(y[0,i,j]),i,j]

# In semantic segmentation terms: for an m*n image with k classes, X holds the
# log_softmax value of every class at every pixel (k*m*n, ignoring the batch).
# If the label of pixel (i,j) is class k1, its NLL loss is the negative of the
# log_softmax value at [k1,i,j].

#mat:
#tensor([[1.3133, 0.3133, 0.1269],
#        [0.6931, 1.3133, 0.6931],
#        [0.6931, 1.3133, 0.6931]])
print(torch.sum(mat)/(3*3))
# computed with torch.nn.CrossEntropyLoss()
loss = nn.CrossEntropyLoss()
print(loss(x,y))

Output:

tensor(0.7947)
tensor(0.7947)
tensor(0.7947)

This part draws on https://blog.csdn.net/zhaowangbo/article/details/88821017.
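The hand-written double loop can also be expressed without Python loops. The sketch below uses Tensor.gather to pick each pixel's log-probability for its labelled class; the random x and y are illustrative inputs, not the data from the example above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(1, 2, 3, 3)           # (batch, class, height, width) raw scores
y = torch.randint(0, 2, (1, 3, 3))    # (batch, height, width) class indices

log_prob = nn.LogSoftmax(dim=1)(x)
# pick, for every pixel, the log-probability of its labelled class
picked = log_prob.gather(1, y.unsqueeze(1)).squeeze(1)
manual = -picked.mean()

nll = nn.NLLLoss()(log_prob, y)
ce = nn.CrossEntropyLoss()(x, y)
print(manual, nll, ce)
```

All three printed values should agree, mirroring the three-way comparison in the example above.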

6. Why cross-entropy loss performs better than quadratic losses such as MSE

See the paper "Understanding the difficulty of training deep feedforward neural networks"
and the 2017 "Machine Learning" video lectures by Professor Hung-yi Lee (李宏毅) of National Taiwan University.
[Figure: Figure 5 of the paper]
Figure 5, which plots the training criterion as a function of two weights for a two-layer network (one hidden layer) with hyperbolic tangent units, and a random input and target signal. There are clearly more severe plateau with the quadratic cost.
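One commonly given intuition for those plateaus can be sketched with a single sigmoid output. This is an illustration under that simplifying assumption, not a reproduction of the paper's experiment: when the logit is badly wrong and saturated, the quadratic cost's gradient carries an extra factor $p(1-p)$ and nearly vanishes, while the cross-entropy gradient is simply $p - y$.

```python
import torch

y = torch.tensor(1.0)                           # target
z_mse = torch.tensor(-8.0, requires_grad=True)  # a badly wrong, saturated logit
z_ce = torch.tensor(-8.0, requires_grad=True)

# quadratic cost on the sigmoid output
((torch.sigmoid(z_mse) - y) ** 2).backward()
# cross-entropy on the same logit
torch.nn.functional.binary_cross_entropy_with_logits(z_ce, y).backward()

print(z_mse.grad)  # close to 0: the gradient carries the factor p * (1 - p)
print(z_ce.grad)   # close to -1: the gradient is p - y
```

With the quadratic cost the model receives almost no learning signal exactly when it is most wrong, which is consistent with the flat regions described above.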

7. Further understanding of BCE and Cross Entropy Loss

See the Zhihu article "损失函数 - 交叉熵损失函数" (Loss functions: the cross-entropy loss function).
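One concrete way to see the connection between the two losses: binary cross-entropy on a single logit z equals two-class cross-entropy on the logit pair [0, z], because softmax over [0, z] assigns class 1 the probability sigmoid(z). The sketch below checks this numerically with made-up logits and labels.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
z = torch.randn(5)                      # one logit per sample
labels = torch.tensor([0, 1, 1, 0, 1])  # binary labels

bce = F.binary_cross_entropy_with_logits(z, labels.float())
# two-class logits [0, z]: softmax gives P(class 1) = sigmoid(z)
two_class_logits = torch.stack([torch.zeros_like(z), z], dim=1)
ce = F.cross_entropy(two_class_logits, labels)

print(bce, ce)
```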

References

[1] https://pytorch.org/docs/stable/nn.html#loss-functions
[2] https://www.cnblogs.com/wangguchangqing/p/12021638.html
[3] https://blog.csdn.net/qq_22210253/article/details/85222093
[4] https://blog.csdn.net/zhaowangbo/article/details/88821017
[5] https://zhuanlan.zhihu.com/p/35709485
