On Stack Overflow, a user named Ismail_Elezi asked a question about writing a custom loss function. His code had three problems: 1) backpropagation raised errors, 2) the matrix operations were poorly written, and 3) the computation was numerically unstable. Starting from his example (which clearly predates torch 0.4.0, since his variables are still wrapped in Variable), I will walk through how to build a custom loss function in the new version (torch 0.4.0).
1. What the loss function should do
1. Build a CNN that on the final layer has 2 neurons (i.e. each sample yields 2 output units).
2. Transform the output of those 2 neurons (a tensor of shape n x 2) into a similarity matrix. Call it X.
3. Transform the labels y into an n x n tensor (where the ij-th element is a constant num0 if i and j belong to the cluster num0, and 0 otherwise). Call it Y. Here n is the batch size.
4. Do an element-wise multiplication of X and Y (i.e. multiply corresponding entries).
5. Do the extra stuff: sums, some subtraction, etc.
2. Analysis
2.1 CNN structure (2 output units)
I will not spell out the network structure here. Suppose the network's output is X and the corresponding labels are y, as follows:
import torch

# network output for a batch of n = 3 samples, 2 units each
X = torch.Tensor([[0.6946, 0.1328],
                  [0.6563, 0.6873],
                  [0.8184, 0.8047]])
X.requires_grad = True
# labels; note they do NOT need requires_grad: gradients should not flow
# into the targets (and a leaf tensor that requires grad would also make
# the in-place unsqueeze_ in convert_y below raise a RuntimeError)
y = torch.Tensor([1, 3, 2])
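Since the network itself is skipped here, readers who want something concrete can picture a minimal CNN whose final layer has 2 neurons. Everything in this sketch (MNIST-sized inputs, the layer sizes, the name TinyNet) is my own assumption, not from the original question:

import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    """Hypothetical CNN; the only requirement is the 2-unit final layer."""
    def __init__(self):
        super(TinyNet, self).__init__()
        self.conv = nn.Conv2d(1, 8, kernel_size=3, padding=1)  # assumes 1x28x28 input
        self.fc = nn.Linear(8 * 14 * 14, 2)                    # final layer: 2 neurons

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv(x)), 2)  # -> (n, 8, 14, 14)
        x = x.view(x.size(0), -1)                  # flatten to (n, 1568)
        return self.fc(x)                          # -> (n, 2), this is our X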
2.2 Computing the similarity matrix
def similarity_matrix(mat):
    # r = mat @ mat.T, so r[i][j] is the dot product <x_i, x_j>
    r = torch.mm(mat, mat.t())
    # the diagonal holds the squared norms <x_i, x_i>
    diag = r.diag().unsqueeze(0)
    diag = diag.expand_as(r)
    # squared distances: D[i][j] = <x_i, x_i> + <x_j, x_j> - 2 <x_i, x_j>
    D = diag + diag.t() - 2 * r
    return D.sqrt()
In detail: with r = X·Xᵀ, the entry r[i][j] is the inner product between samples i and j, so D[i][j] = <x_i, x_i> + <x_j, x_j> - 2<x_i, x_j> = ||x_i - x_j||², and taking the square root yields the pairwise Euclidean distance matrix. Note that the diagonal of D is exactly zero, which will matter for the gradient later.
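That final D.sqrt() is exactly where the numerical trouble lives: the derivative of sqrt is infinite at 0, and the diagonal of D is 0. A minimal sketch of a safer variant (the clamp and the epsilon are my own additions, not part of the original question):

def similarity_matrix_stable(mat, eps=1e-8):
    r = torch.mm(mat, mat.t())
    diag = r.diag().unsqueeze(0).expand_as(r)
    D = diag + diag.t() - 2 * r
    # clamp removes tiny negative values caused by floating-point round-off;
    # eps keeps sqrt away from 0, where its gradient blows up
    return (torch.clamp(D, min=0.0) + eps).sqrt()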
2.3 Converting the label y
def convert_y(y):
    # .size(0) and .shape[0] do the same thing
    size = y.size(0)
    # size = y.shape[0]
    # unsqueeze_(0) is an in-place operation; the torch.Tensor chapter of
    # the PyTorch docs explains in-place operations in detail
    y = y.unsqueeze_(0).expand(size, size)
    print(y)
    return y
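For the labels from section 2.1 this simply tiles y row-wise. Note that, following the original question's code, it does not literally build the 0/num0 association matrix described in step 3 of the spec:

y = torch.Tensor([1, 3, 2])
association = convert_y(y)
# the print inside convert_y shows:
# tensor([[ 1.,  3.,  2.],
#         [ 1.,  3.,  2.],
#         [ 1.,  3.,  2.]])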
2.4 Element-wise multiplication plus some sums and subtractions
similarity = similarity_matrix(X)
association = convert_y(y)
# numerator: distances weighted by the association matrix
loss_num = torch.sum(torch.mul(similarity, association))
# denominator: the remaining, unweighted distance mass
loss_all = torch.sum(similarity)
loss_denum = loss_all - loss_num
loss = loss_num / loss_denum
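One more numerical hedge worth considering (my own addition, not in the original code): if the weighted sum ever equals the total sum, loss_denum is 0 and the division yields inf. Clamping the denominator avoids that:

# guard against a zero denominator; 1e-8 is an arbitrary small constant
loss = loss_num / loss_denum.clamp(min=1e-8)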
2.5 Putting it all together
X = torch.Tensor([[0.6946, 0.1328],
                  [0.6563, 0.6873],
                  [0.8184, 0.8047]])
X.requires_grad = True
y = torch.Tensor([1, 3, 2])   # labels: no requires_grad needed

def customized_loss(X, y):
    # (x - y)^2 = x^2 - 2*x*y + y^2
    def similarity_matrix(mat):
        # r[i][j] = <x_i, x_j>
        r = torch.mm(mat, mat.t())
        # diagonal: squared norms <x_i, x_i>
        diag = r.diag().unsqueeze(0)
        diag = diag.expand_as(r)
        # squared distance matrix
        D = diag + diag.t() - 2 * r
        return D.sqrt()

    def convert_y(y):
        # .size(0) is the same as .shape[0]
        size = y.size(0)
        # size = y.shape[0]
        y = y.unsqueeze(0).expand(size, size)
        print(y)
        return y

    X_similarity = similarity_matrix(X)
    association = convert_y(y)
    loss_num = torch.sum(torch.mul(X_similarity, association))
    loss_all = torch.sum(X_similarity)
    loss_denum = loss_all - loss_num
    loss = loss_num / loss_denum
    # what does register_hook do? See section 2.6 below
    loss.register_hook(lambda g: print(g))
    return loss

loss = customized_loss(X, y)
With the pieces combined, we can now run the whole computation and backpropagate:
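Continuing with the tensors defined above (a sketch; the hook fires during backward and prints the incoming gradient of loss, which for a scalar loss is just 1):

loss.backward()   # the hook prints tensor(1.)
print(X.grad)     # comes out as nan -- see the explanation below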
The gradient comes out as nan here not by accident: the diagonal of D is exactly 0, the derivative of sqrt is infinite at 0, and those infinities turn into nan as they propagate backward. This is precisely the numerical-stability problem mentioned at the top; the clamp-and-epsilon variant sketched in section 2.2 is one way to avoid it. Interested readers can head over to the original thread, Build your own loss function in PyTorch.
2.6 What is register_hook for?
According to PyTorch developer apaszke, a hook registered this way prints the gradient of the tensor you are interested in on every backward pass (in his example that tensor is called z; in our code it is loss). In our run, the hook prints the gradient flowing into loss, which for a scalar loss is simply tensor(1.).
The official documentation puts it this way: the hook will be called every time a gradient with respect to the tensor is computed. register_hook returns a handle, and calling handle.remove() removes the hook from the tensor.
>>> v = torch.tensor([0., 0., 0.], requires_grad=True)
>>> h = v.register_hook(lambda grad: grad * 2) # double the gradient
>>> v.backward(torch.tensor([1., 2., 3.]))
>>> v.grad
 2
 4
 6
[torch.FloatTensor of size (3,)]
>>> h.remove() # removes the hook
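One last note: hooks are especially useful for intermediate (non-leaf) tensors, whose .grad attribute is not populated by backward(). A minimal sketch of my own, not from the thread:

x = torch.ones(2, requires_grad=True)
z = x * 3                                  # intermediate (non-leaf) tensor
h = z.register_hook(lambda g: print(g))    # prints d(loss)/dz during backward
z.sum().backward()                         # hook prints tensor([ 1.,  1.])
h.remove()                                 # detach the hook once done
print(x.grad)                              # tensor([ 3.,  3.])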