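I. Anatomy of a Layer

First, let us look at the official source code to understand what a layer is, what structure it has, and what its members are.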

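import torch
import torch.nn as nn


class Linear(nn.Module):
    def __init__(self, input_features, output_features, bias=True):
        super(Linear, self).__init__()
        self.input_features = input_features
        self.output_features = output_features
        # Learnable weight of shape (output_features, input_features).
        self.weight = nn.Parameter(torch.empty(output_features, input_features))
        if bias:
            self.bias = nn.Parameter(torch.empty(output_features))
        else:
            # Register 'bias' as None so that self.bias still exists.
            self.register_parameter('bias', None)
        # Default initialisation of the parameters.
        self.weight.data.uniform_(-0.1, 0.1)
        # Check self.bias, not the boolean flag `bias`, which is never None.
        if self.bias is not None:
            self.bias.data.uniform_(-0.1, 0.1)

    def forward(self, input):
        # LinearFunction is defined outside this snippet; the .apply call
        # indicates a custom torch.autograd.Function.
        return LinearFunction.apply(input, self.weight, self.bias)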

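The code above is the official implementation of a linear layer.

Meaning of the parameters:

input_features is the length of the input vector, and output_features is the length of the output vector.
input is the input passed in when the layer is called.

The Linear layer has two internal parameters, weight and bias, which are what we usually call the layer's weights. It also has two methods: the constructor __init__ and the forward-pass function forward.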

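From this we can draw the following conclusions:

A PyTorch layer inherits from the nn.Module class.
A layer has at least two methods, __init__ and the forward-pass function forward. If the custom operation cannot be differentiated automatically, you also need to implement backward for the backward pass (in a torch.autograd.Function).
If the layer has weights, they must be of type nn.Parameter. For the differences between Tensor, Variable (versions up to 0.3) and Parameter, see the earlier blog post. In short, a Parameter requires gradients by default, while the other two do not.
Where possible, give a new layer a default parameter initialisation, so that nothing breaks if the user forgets to initialise it.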
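The point about nn.Parameter can be checked with a small experiment. The sketch below is only an illustration: the Demo module and its attribute names are hypothetical and do not come from the code above. It shows that a tensor wrapped in nn.Parameter is registered with the module and requires gradients by default, while a plain tensor attribute is neither.

```python
import torch
import torch.nn as nn


class Demo(nn.Module):  # hypothetical module, for illustration only
    def __init__(self):
        super().__init__()
        # Wrapped in nn.Parameter: registered with the module, trainable by default.
        self.weight = nn.Parameter(torch.zeros(3))
        # Plain tensor attribute: not registered as a parameter.
        self.scale = torch.ones(3)


demo = Demo()
param_names = [name for name, _ in demo.named_parameters()]
print(param_names)                # ['weight']
print(demo.weight.requires_grad)  # True
print(demo.scale.requires_grad)   # False
```

Only registered parameters are returned by parameters(), so an optimizer built from demo.parameters() would update weight but never scale.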

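II. Example

Below we implement a simple layer that takes [x, y] as input and outputs z, computing z = a*x + b*y, and let the network learn the parameters a and b automatically.

Let us first analyse the function we want, z = a*x + b*y. It has two parameters to learn, a and b. Assume the input is a 1*2 vector. To use PyTorch's element-wise multiplication, we combine a and b into a single tensor of shape [1,2], of type Parameter (the type used for the parameters of every layer that has weights). To handle the more general task $z=\sum_{i} w_{i} x_{i}$, the shape of the parameter is specified when the layer is constructed.

Layer definition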
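To make the multiply-then-sum idea concrete, here is a minimal sketch with fixed numbers. The values of w and x are arbitrary and chosen only for illustration. A weight of shape [1, 2] broadcasts over every row of the input, and summing over the last dimension yields one z per input.

```python
import torch

# Plays the role of [a, b]; shape [1, 2]. Values are arbitrary.
w = torch.tensor([[2.0, 3.0]])
# Two inputs [x, y]; shape [2, 2].
x = torch.tensor([[1.0, 2.0],
                  [4.0, 5.0]])

# Broadcast multiply, then sum over the feature (last) dimension.
z = (x * w).sum(dim=-1, keepdim=True)  # shape [2, 1]
print(z)  # 2*1 + 3*2 = 8 and 2*4 + 3*5 = 23
```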

##################################################################
#### in_features -> shape of the layer's weight, e.g. (1,2) for parameters a, b; (1,3) for a, b, c
#### reset_parameters() -> default weight initialisation
#### forward -> the custom operation
#### input -> the input passed when the layer is called, shape -> [n,1,2]
#################################################################


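import math

import torch
import torch.nn as nn


class weight_pool(nn.Module):
    def __init__(self, in_features):
        super(weight_pool, self).__init__()
        # in_features is the weight shape, e.g. (1, 2) for the parameters a, b.
        self.in_features = in_features
        # torch.empty allocates by shape; torch.Tensor((1, 2)) would instead
        # build a tensor holding the values 1 and 2.
        self.weight = nn.Parameter(torch.empty(self.in_features))
        self.reset_parameters()

    def reset_parameters(self):
        # Default initialisation: uniform in [-1/sqrt(k), 1/sqrt(k)],
        # where k is the number of weights along the last dimension.
        stdv = 1. / math.sqrt(self.weight.size(-1))
        self.weight.data.uniform_(-stdv, stdv)

    def forward(self, input):
        # Element-wise product with the weights, then sum over the feature
        # (last) dimension so that z = a*x + b*y.
        x = input * self.weight
        x = x.sum(dim=-1, keepdim=True)
        return x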

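This completes the definition of our new layer. Next, we run a few tests to check whether it can actually learn.

1. Task 1: learning z = x + y

Since there are only two parameters to learn, two (linearly independent) samples are enough in theory. To be more general, we use five training samples: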

x,y = [1.0,2.0], [1.0,3.0], [2.0,3.0], [3.0,4.0], [9.0,10.0]
z = [3.0], [4.0], [5.0], [7.0], [19.0]
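The claim that two samples suffice can be checked directly: two linearly independent samples give a 2x2 linear system in a and b. The sketch below takes the first two samples from the data above and solves the system with torch.linalg.solve. It is a side calculation for illustration, not part of the training procedure.

```python
import torch

# First two training samples as rows [x, y], with their targets z.
A = torch.tensor([[1.0, 2.0],
                  [1.0, 3.0]])
t = torch.tensor([3.0, 4.0])

# Solve A @ [a, b] = t for the two unknown weights.
ab = torch.linalg.solve(A, t)
print(ab)  # a and b both come out as 1 (up to rounding)
```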

We use MSELoss as the loss function and SGD with a learning rate of 0.01, and train for 10 epochs.

The network is defined as follows:

class MyNet(nn.Module):

    def __init__(self):
        super(MyNet, self).__init__()
        self.wpool = weight_pool((1,2))
    def forward(self, x):
        x = self.wpool(x)
        return x
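The training script itself is not shown in this post, so the following is only a sketch of what such a loop might look like under the stated settings (MSELoss, SGD, learning rate 0.01, 10 epochs). Several things here are assumptions: WeightedSum is a hypothetical self-contained stand-in for the custom layer, all five samples are fed as one batch per epoch, and the seed is fixed only for repeatability. The learned weights will therefore not match the numbers reported in this post.

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)  # assumption: fixed seed, only for repeatability


class WeightedSum(nn.Module):  # hypothetical stand-in for the custom layer
    def __init__(self, n_inputs):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(1, n_inputs))
        bound = 1.0 / math.sqrt(n_inputs)
        nn.init.uniform_(self.weight, -bound, bound)

    def forward(self, x):
        # z = sum_i w_i * x_i over the last dimension.
        return (x * self.weight).sum(dim=-1, keepdim=True)


# The five training samples from the task description.
inputs = torch.tensor([[1.0, 2.0], [1.0, 3.0], [2.0, 3.0],
                       [3.0, 4.0], [9.0, 10.0]])
targets = torch.tensor([[3.0], [4.0], [5.0], [7.0], [19.0]])

model = WeightedSum(2)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

losses = []
for epoch in range(10):
    optimizer.zero_grad()                     # clear old gradients
    loss = criterion(model(inputs), targets)  # full-batch forward pass
    loss.backward()                           # autograd computes d(loss)/d(weight)
    optimizer.step()                          # SGD update
    losses.append(loss.item())

print(losses[0], losses[-1])
print(model.weight.data)
```

No backward method is needed here because the forward pass uses only differentiable tensor operations, so autograd derives the gradient of the weight on its own.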

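The loss curve for the first 10 epochs is shown below. The network is already close to convergence after 2 epochs. The parameters learned at this point are 1.0324 and 0.9731, very close to the ideal values of 1 and 1.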