PyTorch: Custom Network Layers

Custom Autograd Functions

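For shallow networks we can write the forward and backward passes by hand. As networks grow, especially in deep learning, their structure becomes complex, and so do the forward and backward passes, which makes writing them manually very difficult. Fortunately, PyTorch provides an automatic differentiation package (autograd) that solves this problem. With autograd, the forward pass of the network defines a computational graph: the nodes of the graph are tensors, and the edges are the functions that produce output tensors from input tensors. Backpropagating through this graph then computes gradients automatically. For example, if x is a tensor with x.requires_grad = True, then after backpropagation x.grad is another tensor that holds the gradient of some scalar value (such as the loss) with respect to x.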
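A minimal sketch of the mechanism just described, using a toy tensor chosen only for illustration: the forward pass records a graph (each result carries a grad_fn node), and backward() walks that graph to fill x.grad.

```python
import torch

# x is a leaf tensor; requires_grad=True asks autograd to record operations on it
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Forward pass: every operation adds a node (grad_fn) to the computational graph
y = x * x          # y_i = x_i^2
z = y.sum()        # z is a scalar, so backward() needs no explicit gradient argument

print(z.grad_fn)   # the node for the last operation in the graph

# Backward pass: walk the graph from z back to x and store dz/dx in x.grad
z.backward()
print(x.grad)      # dz/dx_i = 2 * x_i, i.e. [2., 4., 6.]
```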

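Under the hood, each primitive autograd operation is really two functions that operate on tensors. The forward function computes output tensors from input tensors; the backward function receives the gradient of the output tensors with respect to some scalar value and computes the gradient of the input tensors with respect to that same scalar. In PyTorch we can easily define our own autograd operation by subclassing torch.autograd.Function and implementing forward and backward: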

forward(): the forward computation. It can take any number of arguments, and the arguments can be arbitrary Python objects.

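backward(): the backward computation (the gradient formula). It must return as many gradients as forward received inputs, in the same order; for inputs that are not tensors or do not need a gradient, it returns None.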
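As a minimal sketch of this contract, the hypothetical Function below (Square, written for illustration and not part of PyTorch) computes y = x² and hand-writes its gradient 2x. torch.autograd.gradcheck compares a hand-written backward against finite differences and expects double-precision inputs, which makes it a convenient way to check such a formula.

```python
import torch
from torch.autograd import Function

class Square(Function):
    """Hypothetical example Function: y = x * x."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)   # keep the input for the gradient formula
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # chain rule: dL/dx = dL/dy * dy/dx, with dy/dx = 2x
        return grad_output * 2 * x   # one gradient for the single forward input

x = torch.randn(5, dtype=torch.double, requires_grad=True)
# gradcheck returns True if backward() agrees with numerical differentiation
ok = torch.autograd.gradcheck(Square.apply, (x,))
print(ok)
```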

import torch
from torch.autograd import Function

# Inherit from Function
class LinearFunction(Function):

    # Note that both forward and backward are @staticmethods
    @staticmethod
    # bias is an optional argument
    def forward(ctx, input, weight, bias=None):
        # ctx plays a role similar to self: whatever is stored on ctx here is available in backward
        ctx.save_for_backward(input, weight, bias)
        output = input.mm(weight.t())
        if bias is not None:
            output += bias.unsqueeze(0).expand_as(output)
        return output

    # This function has only a single output, so it gets only one gradient
    @staticmethod
    def backward(ctx, grad_output):
        # This is a pattern that is very convenient - at the top of backward
        # unpack saved_tensors and initialize all gradients w.r.t. inputs to
        # None. Thanks to the fact that additional trailing Nones are
        # ignored, the return statement is simple even when the function has
        # optional inputs.
        input, weight, bias = ctx.saved_tensors
        grad_input = grad_weight = grad_bias = None

        # These needs_input_grad checks are optional and there only to
        # improve efficiency. If you want to make your code simpler, you can
        # skip them. Returning gradients for inputs that don't require it is
        # not an error.
        if ctx.needs_input_grad[0]:
            grad_input = grad_output.mm(weight)
        if ctx.needs_input_grad[1]:
            grad_weight = grad_output.t().mm(input)
        if bias is not None and ctx.needs_input_grad[2]:
            grad_bias = grad_output.sum(0).squeeze(0)

        return grad_input, grad_weight, grad_bias

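# Call the custom autograd function through .apply (an alias is convenient)
linear = LinearFunction.apply
x = torch.randn(4, 3, requires_grad=True)  # batch of 4 samples, 3 input features
w = torch.randn(2, 3, requires_grad=True)  # 2 output features
b = torch.randn(2, requires_grad=True)
out = linear(x, w, b)  # forward pass
out.sum().backward()   # backward pass: runs LinearFunction.backward, fills x.grad, w.grad, b.grad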

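For an operation whose extra argument is a constant rather than a learnable tensor (it needs no gradient and is never updated), the Function can be defined as follows: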

class MulConstant(Function):
    @staticmethod
    def forward(ctx, tensor, constant):
        # ctx is a context object that can be used to stash information
        # for backward computation
        ctx.constant = constant
        return tensor * constant

    @staticmethod
    def backward(ctx, grad_output):
        # We return as many input gradients as there were arguments.
        # Gradients of non-Tensor arguments to forward must be None.
        return grad_output * ctx.constant, None
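A further sketch of the rule that non-tensor arguments receive None: the hypothetical ThresholdReLU below (an illustration, not a PyTorch API) takes a plain Python float threshold, lets gradients through only where the input exceeds it, and returns None as the gradient of the threshold.

```python
import torch
from torch.autograd import Function

class ThresholdReLU(Function):
    """Hypothetical op: y = x where x > threshold, else 0."""

    @staticmethod
    def forward(ctx, x, threshold):
        # threshold is a plain Python float; backward only needs the mask
        mask = x > threshold
        ctx.save_for_backward(mask)
        return x * mask

    @staticmethod
    def backward(ctx, grad_output):
        (mask,) = ctx.saved_tensors
        # gradient flows only where the input passed the threshold;
        # the non-tensor argument `threshold` gets None
        return grad_output * mask, None

x = torch.tensor([-1.0, 0.2, 0.8, 2.0], requires_grad=True)
y = ThresholdReLU.apply(x, 0.5)
y.sum().backward()
print(y)       # values [0, 0, 0.8, 2.0]
print(x.grad)  # [0., 0., 1., 1.]
```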

Higher-order derivatives

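import torch as t
x = t.tensor(3.0, requires_grad=True)
y = x ** 3
grad_x = t.autograd.grad(y, x, create_graph=True)  # first derivative 3x^2; create_graph=True keeps it differentiable
grad_grad_x = t.autograd.grad(grad_x[0], x)        # second derivative 6x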
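To see why create_graph=True matters, a minimal sketch (using sin purely as an example function) compares the default call with the create_graph=True call: by default the returned gradient is detached from the graph and cannot be differentiated again.

```python
import torch

x = torch.tensor(0.5, requires_grad=True)

# Default (create_graph=False): the gradient is a plain tensor outside the graph
(g,) = torch.autograd.grad(torch.sin(x), x)
print(g.requires_grad)        # False: it cannot be differentiated again

# create_graph=True: the backward computation itself is recorded in the graph
(g2,) = torch.autograd.grad(torch.sin(x), x, create_graph=True)
print(g2.requires_grad)       # True
(h,) = torch.autograd.grad(g2, x)   # d/dx cos(x) = -sin(x)
print(torch.allclose(h, -torch.sin(x.detach())))  # True
```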

Custom Modules

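Computational graphs and autograd are very handy for defining complex models and computing gradients, but for large networks they are still rather low level. When building a network we usually want to organize the computation into layers, each holding its own learnable parameters that are updated during training. Other deep learning frameworks such as TensorFlow provide such high-level abstractions, and PyTorch offers the equivalent in the nn package, which defines a set of Modules that roughly correspond to layers. A Module receives input tensors and computes output tensors, and it may also hold learnable parameters.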

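Sometimes we need a Module that does not exist in the nn package. In that case we define our own: subclass nn.Module and implement forward, which takes input tensors and produces output tensors using other modules or other autograd operations. There is no need to write backward, because nn is built on autograd: the gradient is derived from the operations used in forward. This also means that every operation used in a custom Module's forward must be differentiable by autograd, either as a built-in operation or as a custom Function that provides its own backward.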
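A minimal sketch of a Module that needs no backward: the hypothetical ScaledTanh layer below (not part of torch.nn) computes y = alpha * tanh(x) with a learnable alpha, using only built-in operations, so autograd derives the gradient on its own.

```python
import torch
import torch.nn as nn

class ScaledTanh(nn.Module):
    """Hypothetical layer: y = alpha * tanh(x), with a learnable alpha."""

    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(1))  # registered as a parameter on assignment

    def forward(self, x):
        # only built-in differentiable ops, so no backward() has to be written
        return self.alpha * torch.tanh(x)

layer = ScaledTanh()
x = torch.randn(4, 3)
loss = layer(x).sum()
loss.backward()
names = [name for name, _ in layer.named_parameters()]
print(names)             # ['alpha']
print(layer.alpha.grad)  # d loss / d alpha = sum of tanh(x)
```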

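import torch
import torch.nn as nn

# LinearFunction is the custom autograd Function defined earlier in this post
class Linear(nn.Module):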
    def __init__(self, input_features, output_features, bias=True):
        super(Linear, self).__init__()
        self.input_features = input_features
        self.output_features = output_features

        # nn.Parameter is a special kind of Tensor, that will get
        # automatically registered as Module's parameter once it's assigned
        # as an attribute. Parameters and buffers need to be registered, or
        # they won't appear in .parameters() (doesn't apply to buffers), and
        # won't be converted when e.g. .cuda() is called. You can use
        # .register_buffer() to register buffers.
        # (Very important: parameters must require gradients!) nn.Parameters require gradients by default.
        self.weight = nn.Parameter(torch.Tensor(output_features, input_features))
        if bias:
            self.bias = nn.Parameter(torch.Tensor(output_features))
        else:
            # You should always register all possible parameters, but the
            # optional ones can be None if you want.
            self.register_parameter('bias', None)

        # Not a very smart way to initialize weights
        self.weight.data.uniform_(-0.1, 0.1)
        # check the registered attribute: the `bias` argument is a bool and is never None
        if self.bias is not None:
            self.bias.data.uniform_(-0.1, 0.1)

    def forward(self, input):
        # See the autograd section for explanation of what happens here.
        return LinearFunction.apply(input, self.weight, self.bias)

    def extra_repr(self):
        # (Optional) Set the extra information about this module. You can test
        # it by printing an object of this class.
        return 'input_features={}, output_features={}, bias={}'.format(
            self.input_features, self.output_features, self.bias is not None
        )
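The comments in the class above distinguish parameters from buffers. A small hypothetical module (RunningMean, written for illustration and not a PyTorch class) sketches the difference: a learnable nn.Parameter is returned by .parameters(), while state registered with register_buffer() is saved in state_dict() and moved by .to()/.cuda() but is not returned by .parameters(), so an optimizer never updates it.

```python
import torch
import torch.nn as nn

class RunningMean(nn.Module):
    """Hypothetical layer: subtracts a running mean kept as a buffer, then scales."""

    def __init__(self, num_features, momentum=0.1):
        super().__init__()
        self.momentum = momentum
        # learnable state: an nn.Parameter (appears in .parameters(), receives .grad)
        self.scale = nn.Parameter(torch.ones(num_features))
        # non-learnable state: a buffer (appears in state_dict, not in .parameters())
        self.register_buffer("running_mean", torch.zeros(num_features))

    def forward(self, x):
        if self.training:
            with torch.no_grad():  # buffer updates are not part of the graph
                m = self.momentum
                self.running_mean.mul_(1 - m).add_(m * x.mean(0))
        return (x - self.running_mean) * self.scale

layer = RunningMean(3)
out = layer(torch.ones(2, 3))
param_names = [n for n, _ in layer.named_parameters()]
state_keys = list(layer.state_dict().keys())
print(param_names)  # ['scale']
print(state_keys)   # ['scale', 'running_mean']
```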

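Similarities and differences between Function and Module

Both Function and Module can be used to extend PyTorch to fit the needs of a network, but there are some important differences between them: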

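A Function usually defines a single operation. Because it cannot hold parameters, it suits activation functions, pooling and similar operations. A Module holds parameters, so it suits defining a layer (such as a linear or convolutional layer) or even a whole network.
A Function must implement forward and backward as static methods, and the derivative formula in backward is written by hand. A Module only needs __init__ and forward; its backward pass is derived by autograd.
Loosely speaking, a Module is made up of a series of Functions: during forward, these Functions and the tensors they consume form the computational graph, and during backward autograd calls each Function's backward to obtain the result. That is why a Module does not need to define backward.
Besides Functions, a Module also contains the corresponding parameters and other methods and attributes, none of which a Function has.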
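The Module is the basic way PyTorch organizes neural networks: it contains the model's parameters and its computation logic. A Function carries the actual operation and defines its forward and backward computation.
Module is the base class of every neural network; all models in PyTorch must be subclasses of Module. Modules can be nested into a tree: a Module nests other Modules by holding them as attributes.
Function is the core class of PyTorch's autograd mechanism. A Function has no parameters, i.e. it is stateless: in the forward direction it takes inputs and returns the corresponding outputs; in the backward direction it receives the gradients of its outputs and returns the gradients of its inputs.
When loss.backward() is called, autograd traverses the graph from the loss and calls the backward() of every Function recorded in it, including the backward() defined in custom Function subclasses.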
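The division of labor above can be sketched with two hypothetical classes (neither is a PyTorch API): MulFunction is a stateless Function computing y = x * w with a hand-written backward, and ElementwiseScale is a Module that owns the learnable weight and delegates the computation to the Function. Calling backward() on the loss ends up in MulFunction.backward, and the result is checked against the same computation written with built-in operations.

```python
import torch
import torch.nn as nn
from torch.autograd import Function

class MulFunction(Function):
    """Hypothetical stateless op: y = x * w (elementwise, w broadcast over the batch)."""

    @staticmethod
    def forward(ctx, x, w):
        ctx.save_for_backward(x, w)
        return x * w

    @staticmethod
    def backward(ctx, grad_output):
        x, w = ctx.saved_tensors
        grad_x = grad_output * w
        # w was broadcast over the batch dimension, so sum that dimension back out
        grad_w = (grad_output * x).sum(0)
        return grad_x, grad_w

class ElementwiseScale(nn.Module):
    """Hypothetical Module: owns the parameter, delegates the math to MulFunction."""

    def __init__(self, num_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_features))

    def forward(self, x):
        return MulFunction.apply(x, self.weight)

torch.manual_seed(0)
layer = ElementwiseScale(3)
x = torch.randn(4, 3, requires_grad=True)

layer(x).sum().backward()          # autograd calls MulFunction.backward here
custom_grad_w = layer.weight.grad.clone()

# Reference: the same computation with built-in ops
layer.weight.grad = None
(x * layer.weight).sum().backward()
print(torch.allclose(custom_grad_w, layer.weight.grad))  # True
```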
