On Stack Overflow, a user named Ismail_Elezi asked a question about writing a custom loss function. His code had three problems: 1) backpropagation raised errors, 2) the matrix operations were poorly written, and 3) the computation was numerically unstable. Starting from his example (which clearly predates torch 0.4.0, since his variables are still wrapped in Variable), I will walk through how to build a custom loss function in the new version (torch 0.4.0).
1. What the loss function should do
1. Build a CNN that on the final layer has 2 neurons (i.e. each sample yields 2 output units).
2. Transform the output of those 2 neurons (a tensor of shape n x 2) into a similarity matrix. Call it X.
3. Transform the labels y into an n x n tensor (where the ij-th element is a constant num0 if i and j belong to the cluster num0, and 0 otherwise). Call it Y. Here n is the batch size.
4. Do an element-wise multiplication of X and Y (i.e. multiply corresponding entries).
5. Do the extra stuff: sums, some subtraction, etc.
2. Analysis
2.1 CNN structure (2 output units)
I will not spell out the network structure here. Suppose the network's output is X and the corresponding labels are y, as follows:
import torch

# network output for a batch of n = 3 samples, 2 units each
X = torch.Tensor([[0.6946, 0.1328],
                  [0.6563, 0.6873],
                  [0.8184, 0.8047]])
X.requires_grad = True
# labels; note they do NOT need requires_grad: gradients should not flow
# into the targets (and a leaf tensor that requires grad would also make
# the in-place unsqueeze_ in convert_y below raise a RuntimeError)
y = torch.Tensor([1, 3, 2])
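Since the network itself is skipped here, readers who want something concrete can picture a minimal CNN whose final layer has 2 neurons. Everything in this sketch (MNIST-sized inputs, the layer sizes, the name TinyNet) is my own assumption, not from the original question:

import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    """Hypothetical CNN; the only requirement is the 2-unit final layer."""
    def __init__(self):
        super(TinyNet, self).__init__()
        self.conv = nn.Conv2d(1, 8, kernel_size=3, padding=1)  # assumes 1x28x28 input
        self.fc = nn.Linear(8 * 14 * 14, 2)                    # final layer: 2 neurons

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv(x)), 2)  # -> (n, 8, 14, 14)
        x = x.view(x.size(0), -1)                  # flatten to (n, 1568)
        return self.fc(x)                          # -> (n, 2), this is our X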
2.2 Computing the similarity matrix
def similarity_matrix(mat):
    # r = mat @ mat.T, so r[i][j] is the dot product <x_i, x_j>
    r = torch.mm(mat, mat.t())
    # the diagonal holds the squared norms <x_i, x_i>
    diag = r.diag().unsqueeze(0)
    diag = diag.expand_as(r)
    # squared distances: D[i][j] = <x_i, x_i> + <x_j, x_j> - 2 <x_i, x_j>
    D = diag + diag.t() - 2 * r
    return D.sqrt()
In detail: with r = X·Xᵀ, the entry r[i][j] is the inner product between samples i and j, so D[i][j] = <x_i, x_i> + <x_j, x_j> - 2<x_i, x_j> = ||x_i - x_j||², and taking the square root yields the pairwise Euclidean distance matrix. Note that the diagonal of D is exactly zero, which will matter for the gradient later.
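That final D.sqrt() is exactly where the numerical trouble lives: the derivative of sqrt is infinite at 0, and the diagonal of D is 0. A minimal sketch of a safer variant (the clamp and the epsilon are my own additions, not part of the original question):

def similarity_matrix_stable(mat, eps=1e-8):
    r = torch.mm(mat, mat.t())
    diag = r.diag().unsqueeze(0).expand_as(r)
    D = diag + diag.t() - 2 * r
    # clamp removes tiny negative values caused by floating-point round-off;
    # eps keeps sqrt away from 0, where its gradient blows up
    return (torch.clamp(D, min=0.0) + eps).sqrt()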
2.3 Converting the label y
def convert_y(y):
    # .size(0) and .shape[0] do the same thing
    size = y.size(0)
    # size = y.shape[0]
    # unsqueeze_(0) is an in-place operation; the torch.Tensor chapter of
    # the PyTorch docs explains in-place operations in detail
    y = y.unsqueeze_(0).expand(size, size)
    print(y)
    return y
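For the labels from section 2.1 this simply tiles y row-wise. Note that, following the original question's code, it does not literally build the 0/num0 association matrix described in step 3 of the spec:

y = torch.Tensor([1, 3, 2])
association = convert_y(y)
# the print inside convert_y shows:
# tensor([[ 1.,  3.,  2.],
#         [ 1.,  3.,  2.],
#         [ 1.,  3.,  2.]])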
2.4 Element-wise multiplication plus some sums and subtractions
similarity = similarity_matrix(X)
association = convert_y(y)
# numerator: distances weighted by the association matrix
loss_num = torch.sum(torch.mul(similarity, association))
# denominator: the remaining, unweighted distance mass
loss_all = torch.sum(similarity)
loss_denum = loss_all - loss_num
loss = loss_num / loss_denum
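One more numerical hedge worth considering (my own addition, not in the original code): if the weighted sum ever equals the total sum, loss_denum is 0 and the division yields inf. Clamping the denominator avoids that:

# guard against a zero denominator; 1e-8 is an arbitrary small constant
loss = loss_num / loss_denum.clamp(min=1e-8)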
2.5 Putting it all together
X = torch.Tensor([[0.6946, 0.1328],
                  [0.6563, 0.6873],
                  [0.8184, 0.8047]])
X.requires_grad = True
y = torch.Tensor([1, 3, 2])   # labels: no requires_grad needed

def customized_loss(X, y):
    # (x - y)^2 = x^2 - 2*x*y + y^2
    def similarity_matrix(mat):
        # r[i][j] = <x_i, x_j>
        r = torch.mm(mat, mat.t())
        # diagonal: squared norms <x_i, x_i>
        diag = r.diag().unsqueeze(0)
        diag = diag.expand_as(r)
        # squared distance matrix
        D = diag + diag.t() - 2 * r
        return D.sqrt()

    def convert_y(y):
        # .size(0) is the same as .shape[0]
        size = y.size(0)
        # size = y.shape[0]
        y = y.unsqueeze(0).expand(size, size)
        print(y)
        return y

    X_similarity = similarity_matrix(X)
    association = convert_y(y)
    loss_num = torch.sum(torch.mul(X_similarity, association))
    loss_all = torch.sum(X_similarity)
    loss_denum = loss_all - loss_num
    loss = loss_num / loss_denum
    # what does register_hook do? See section 2.6 below
    loss.register_hook(lambda g: print(g))
    return loss

loss = customized_loss(X, y)
With the pieces combined, we can now run the whole computation and backpropagate:
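Continuing with the tensors defined above (a sketch; the hook fires during backward and prints the incoming gradient of loss, which for a scalar loss is just 1):

loss.backward()   # the hook prints tensor(1.)
print(X.grad)     # comes out as nan -- see the explanation below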
The gradient comes out as nan here not by accident: the diagonal of D is exactly 0, the derivative of sqrt is infinite at 0, and those infinities turn into nan as they propagate backward. This is precisely the numerical-stability problem mentioned at the top; the clamp-and-epsilon variant sketched in section 2.2 is one way to avoid it. Interested readers can head over to the original thread, Build your own loss function in PyTorch.
2.6 What is register_hook for?
According to PyTorch developer apaszke, a hook registered this way prints the gradient of the tensor you are interested in on every backward pass (in his example that tensor is called z; in our code it is loss). In our run, the hook prints the gradient flowing into loss, which for a scalar loss is simply tensor(1.).
The official documentation puts it this way: the hook will be called every time a gradient with respect to the tensor is computed. register_hook returns a handle, and calling handle.remove() removes the hook from the tensor.
>>> v = torch.tensor([0., 0., 0.], requires_grad=True)
>>> h = v.register_hook(lambda grad: grad * 2) # double the gradient
>>> v.backward(torch.tensor([1., 2., 3.]))
>>> v.grad
 2
 4
 6
[torch.FloatTensor of size (3,)]
>>> h.remove() # removes the hook
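One last note: hooks are especially useful for intermediate (non-leaf) tensors, whose .grad attribute is not populated by backward(). A minimal sketch of my own, not from the thread:

x = torch.ones(2, requires_grad=True)
z = x * 3                                  # intermediate (non-leaf) tensor
h = z.register_hook(lambda g: print(g))    # prints d(loss)/dz during backward
z.sum().backward()                         # hook prints tensor([ 1.,  1.])
h.remove()                                 # detach the hook once done
print(x.grad)                              # tensor([ 3.,  3.])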