Part 1 Paper Reading and Video Learning

1 ShuffleNet V1&V2

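1.1 Network Architecture

Highlights: ShuffleNet introduces the idea of channel shuffle. The network is built from group convolutions (GConv) and depthwise convolutions (DWConv) and runs faster on mobile devices.

Channel shuffle: improves the exchange of information between channels that belong to different groups.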
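To make the effect of channel shuffle concrete, the following minimal sketch tracks only the channel indices, without any tensors. The function name `channel_shuffle_indices` is made up for this illustration; it mirrors the reshape, transpose, flatten steps used by the PyTorch `channel_shuffle` function in section 1.2.

```python
from typing import List


def channel_shuffle_indices(num_channels: int, groups: int) -> List[int]:
    """Return the channel order produced by a channel shuffle."""
    if num_channels % groups != 0:
        raise ValueError("num_channels must be divisible by groups")
    channels_per_group = num_channels // groups
    # reshape [num_channels] -> [groups, channels_per_group]
    grid = [list(range(g * channels_per_group, (g + 1) * channels_per_group))
            for g in range(groups)]
    # transpose to [channels_per_group, groups], then flatten
    return [grid[g][c] for c in range(channels_per_group) for g in range(groups)]


print(channel_shuffle_indices(6, 2))  # [0, 3, 1, 4, 2, 5]
print(channel_shuffle_indices(6, 3))  # [0, 2, 4, 1, 3, 5]
```

After the shuffle, neighbouring channels come from different groups, so the next group convolution sees features from every group.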

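ShuffleNet V1:

ShuffleNet architecture table:

Considering only the theoretical amount of computation, ShuffleNet needs fewer operations:

Building on V1, V2 points out that computational complexity cannot be judged by FLOPs alone. It proposes four guidelines for designing efficient networks and a new block design.

FLOPs is only an indirect metric of speed. The time cost of memory access (MAC), the degree of parallelism, and the target platform also affect the actual running speed:

The four guidelines for designing efficient networks:

With FLOPs held fixed, MAC is minimized when the input and output channel counts of a convolution layer are equal (MAC stands for memory access cost)

With FLOPs held fixed, MAC grows as the number of groups in GConv increases

The more fragmented the network design, the slower it runs

The impact of element-wise operations is non-negligible (element-wise operations act on each element and include the ReLU activation, tensor addition (AddTensor), bias addition (AddBias) and so on; they have few FLOPs but a large MAC)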
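The first two guidelines can be checked with a little arithmetic. The sketch below uses the cost model from the ShuffleNet V2 paper for a 1x1 (group) convolution on an h x w feature map: FLOPs = h*w*c_in*c_out/g and MAC = h*w*(c_in+c_out) + c_in*c_out/g. The function names and the 56x56 feature-map size are assumptions chosen for illustration.

```python
def conv1x1_flops(h, w, c_in, c_out, groups=1):
    """Multiply-adds of a 1x1 (group) convolution on an h x w feature map."""
    return h * w * c_in * c_out // groups


def conv1x1_mac(h, w, c_in, c_out, groups=1):
    """Memory access cost: read the input map, write the output map, read the weights."""
    return h * w * (c_in + c_out) + c_in * c_out // groups


H = W = 56  # hypothetical feature-map size

# Guideline 1: identical FLOPs, different input/output channel ratios.
g1 = {(c_in, c_out): conv1x1_mac(H, W, c_in, c_out)
      for c_in, c_out in [(128, 128), (64, 256), (32, 512)]}

# Guideline 2: identical FLOPs and input channels, more groups
# (the output channels grow with g so that the FLOPs stay constant).
g2 = {g: conv1x1_mac(H, W, 128, 128 * g, groups=g) for g in [1, 2, 4, 8]}

for (c_in, c_out), mac in g1.items():
    print("c_in=%d c_out=%d MAC=%d" % (c_in, c_out, mac))
for g, mac in g2.items():
    print("groups=%d MAC=%d" % (g, mac))
```

In both loops the FLOPs are the same for every row, yet MAC is smallest for the balanced 128:128 layer and grows steadily with the number of groups.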

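Summary:

Use balanced convolutions (keep the ratio of input to output channels as close to 1 as possible)
Be aware of the cost of group convolutions
Reduce the degree of fragmentation in the network
Use fewer element-wise operations

The ShuffleNet units, where the first two are V1 and the last two are V2:

Unit (c) reduces fragmentation: the left branch has no layers, ReLU is applied only on the right branch, the input and output channel counts are kept equal, and group convolution is dropped.

In unit (d), the output channels of both the left and the right branch equal the input channels, so after concatenation the final output has twice as many channels as the input.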

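1.2 Building ShuffleNet V2 with PyTorch

Code link: (Colab) ShuffleNetV2

# ShuffleNet V2
from typing import Callable, List

import torch
import torch.nn as nn
from torch import Tensor

# split the channels into groups, then interleave them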
def channel_shuffle(x:Tensor,groups:int)->Tensor:
  batch_size,num_channels,height,width = x.size()
  channels_per_group = num_channels//groups

  # [batch_size,num_channels,height,width]->[batch_size,groups,channels_per_group,height,width]
  x = x.view(batch_size,groups,channels_per_group,height,width)

  # swap the group and channel axes, then make the data contiguous in memory
  x = torch.transpose(x,1,2).contiguous()

  # flatten
  x = x.view(batch_size,-1,height,width)

  return x


class InvertedResidual(nn.Module):
  def __init__(self,input_c:int,output_c:int,stride:int):
    super(InvertedResidual,self).__init__()

    if stride not in [1,2]:
      raise ValueError("illegal stride value.")
    self.stride = stride

    assert output_c%2==0
    branch_features = output_c//2
    # when stride is 1, input_c must be twice branch_features
    # << is the left bit shift (x<<1 equals x*2)
    assert (self.stride!=1) or (input_c==branch_features<<1)

    # left branch (branch1): it only has layers when stride is 2
    if self.stride == 2:
      self.branch1 = nn.Sequential(
        self.depthwise_conv(input_c,input_c,kernel_s=3,stride=self.stride,padding=1),
        nn.BatchNorm2d(input_c),
        nn.Conv2d(input_c,branch_features,kernel_size=1,stride=1,padding=0,bias=False),
        nn.BatchNorm2d(branch_features),
        nn.ReLU(inplace=True)
      )
    # with stride 1 the left branch performs no operation (identity)
    else:
      self.branch1 = nn.Sequential()

    # right branch (branch2): 1x1 conv -> 3x3 DW conv -> 1x1 conv
    self.branch2 = nn.Sequential(
      nn.Conv2d(input_c if self.stride>1 else branch_features,branch_features,kernel_size=1,stride=1,padding=0,bias=False),
      nn.BatchNorm2d(branch_features),
      nn.ReLU(inplace=True),
      self.depthwise_conv(branch_features,branch_features,kernel_s=3,stride=self.stride,padding=1),
      nn.BatchNorm2d(branch_features),
      nn.Conv2d(branch_features,branch_features,kernel_size=1,stride=1,padding=0,bias=False),
      nn.BatchNorm2d(branch_features),
      nn.ReLU(inplace=True)
    )

  @staticmethod
  def depthwise_conv(input_c:int,output_c:int,kernel_s:int,stride:int=1,padding:int=0,bias:bool=False)->nn.Conv2d:
    return nn.Conv2d(in_channels=input_c,out_channels=output_c,kernel_size=kernel_s,
            stride=stride,padding=padding,bias=bias,groups=input_c)

  def forward(self,x:Tensor)->Tensor:
    if self.stride==1:
      x1,x2 = x.chunk(2,dim=1)
      out = torch.cat((x1,self.branch2(x2)),dim=1)
    else:
      out = torch.cat((self.branch1(x),self.branch2(x)),dim=1)

    out = channel_shuffle(out,2)

    return out


class ShuffleNetV2(nn.Module):
  def __init__(self,
        stages_repeats:List[int],
        stages_out_channels:List[int],
        num_classes:int=1000,
        inverted_residual:Callable[...,nn.Module]=InvertedResidual):
    super(ShuffleNetV2,self).__init__()

    if len(stages_repeats)!=3:
        raise ValueError("expected stages_repeats as list of 3 positive ints")
    if len(stages_out_channels)!=5:
        raise ValueError("expected stages_out_channels as list of 5 positive ints")
    self._stage_out_channels = stages_out_channels

    # input RGB image
    input_channels = 3
    output_channels = self._stage_out_channels[0]

    self.conv1 = nn.Sequential(
      nn.Conv2d(input_channels,output_channels,kernel_size=3,stride=2,padding=1,bias=False),
      nn.BatchNorm2d(output_channels),
      nn.ReLU(inplace=True)
    )
    input_channels = output_channels

    self.maxpool = nn.MaxPool2d(kernel_size=3,stride=2,padding=1)

    # Static annotations for mypy
    self.stage2: nn.Sequential
    self.stage3: nn.Sequential
    self.stage4: nn.Sequential

    stage_names = ["stage{}".format(i) for i in [2,3,4]]
    for name, repeats, output_channels in zip(stage_names,stages_repeats,self._stage_out_channels[1:]):
      seq = [inverted_residual(input_channels,output_channels,2)]
      for i in range(repeats-1):
        seq.append(inverted_residual(output_channels,output_channels,1))
      setattr(self,name,nn.Sequential(*seq))
      input_channels = output_channels

    output_channels = self._stage_out_channels[-1]
    self.conv5 = nn.Sequential(
      nn.Conv2d(input_channels,output_channels,kernel_size=1,stride=1,padding=0,bias=False),
      nn.BatchNorm2d(output_channels),
      nn.ReLU(inplace=True)
    )

    self.fc = nn.Linear(output_channels,num_classes)

  def _forward_impl(self,x:Tensor)->Tensor:
    x = self.conv1(x)
    x = self.maxpool(x)
    x = self.stage2(x)
    x = self.stage3(x)
    x = self.stage4(x)
    x = self.conv5(x)
    x = x.mean([2,3])
    x = self.fc(x)
    return x

  def forward(self,x:Tensor)->Tensor:
    return self._forward_impl(x)


def shufflenet_v2_x0_5(num_classes=1000):
  model = ShuffleNetV2(stages_repeats=[4,8,4],stages_out_channels=[24,48,96,192,1024],num_classes=num_classes)
  return model


def shufflenet_v2_x1_0(num_classes=1000):
  model = ShuffleNetV2(stages_repeats=[4,8,4],stages_out_channels=[24,116,232,464,1024],num_classes=num_classes)
  return model


def shufflenet_v2_x1_5(num_classes=1000):
  model = ShuffleNetV2(stages_repeats=[4,8,4],stages_out_channels=[24,176,352,704,1024],num_classes=num_classes)
  return model


def shufflenet_v2_x2_0(num_classes=1000):
  model = ShuffleNetV2(stages_repeats=[4,8,4],stages_out_channels=[24,244,488,976,2048],num_classes=num_classes)
  return model

2 EfficientNet

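2.1 Network Architecture

Highlights: the paper explores the effects of input resolution, network depth and network width at the same time

Increasing depth extracts richer and more complex features, but runs into vanishing gradients and harder training
Increasing width captures more fine-grained features and is easier to train, but makes it hard to learn deeper-level features
Increasing the input image resolution can potentially capture more fine-grained features, but the accuracy gain diminishes and large images increase the amount of computation
Increasing depth, width and input resolution together may give better results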
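Scaling all three dimensions together is what the EfficientNet paper calls compound scaling: depth, width and resolution are multiplied by alpha^phi, beta^phi and gamma^phi for one shared coefficient phi. The sketch below uses the base values reported in the paper (alpha=1.2, beta=1.1, gamma=1.15, chosen so that alpha * beta^2 * gamma^2 is roughly 2); the function names are made up for this illustration. The coefficients hard-coded for B0 to B7 in section 2.2 are the published per-model values and need not match these powers exactly.

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution bases from the paper


def compound_scale(phi):
    """Return (depth, width, resolution) multipliers for compound coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi


def flops_multiplier(phi):
    """Approximate FLOPs growth: FLOPs scale with depth * width^2 * resolution^2."""
    depth, width, resolution = compound_scale(phi)
    return depth * width ** 2 * resolution ** 2


for phi in range(4):
    d, w, r = compound_scale(phi)
    print("phi=%d depth x%.2f width x%.2f resolution x%.2f FLOPs x%.2f"
          % (phi, d, w, r, flops_multiplier(phi)))
```

Each step of phi therefore costs a little less than twice the FLOPs of the previous one.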

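The EfficientNet architecture was obtained with neural architecture search (NAS)

MBConv:

First, a 1*1 convolution expands the channels; the number of kernels is n times the number of input channels
When n=1 the 1*1 expansion convolution is omitted, i.e. the MBConv blocks in stage 2 have no 1*1 expansion convolution
The shortcut connection exists if and only if the input and output feature maps have the same shape

SE module:

The SE module consists of a global average pooling layer and two fully connected layers. The first fully connected layer has 1/4 as many nodes as the channels of the feature map fed into the MBConv block and uses the Swish activation. The second fully connected layer has as many nodes as the channels of the feature map output by the DW convolution and uses the Sigmoid activation.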
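As a small worked example of these sizes, the sketch below computes the shapes of the two fully connected layers for one MBConv block. The helper name `se_channel_sizes` is hypothetical; the arithmetic follows the `SqueezeExcitation` class in section 2.2, where the squeeze width is derived from the block input channels rather than from the expanded channels.

```python
def se_channel_sizes(block_input_c, expand_ratio, squeeze_factor=4):
    """Return (in_features, out_features) for the two FC layers of the SE module."""
    expanded_c = block_input_c * expand_ratio     # channels output by the DW conv
    squeeze_c = block_input_c // squeeze_factor   # 1/4 of the MBConv input channels
    return {"fc1": (expanded_c, squeeze_c), "fc2": (squeeze_c, expanded_c)}


# an MBConv block with 24 input channels and expansion ratio n=6
print(se_channel_sizes(24, 6))  # {'fc1': (144, 6), 'fc2': (6, 144)}
```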

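Parameter settings for B0 through B7:

Results:

In practice: high accuracy and few parameters, but it uses a relatively large amount of GPU memory

2.2 Building EfficientNet with PyTorch

Code link: (Colab) EfficientNet V3

# EfficientNet
import copy
import math
from collections import OrderedDict
from functools import partial
from typing import Callable, Optional

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import Tensor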

def _make_divisible(ch,divisor=8,min_ch=None):
  if min_ch is None:
    min_ch = divisor
  new_ch = max(min_ch,int(ch+divisor/2)//divisor*divisor)
  if new_ch<0.9*ch:
    new_ch += divisor
  return new_ch


def drop_path(x, drop_prob: float = 0., training: bool = False):
  if drop_prob==0. or not training:
    return x
  keep_prob = 1-drop_prob
  shape = (x.shape[0],)+(1,)*(x.ndim-1)  # works for tensors of any rank
  random_tensor = keep_prob+torch.rand(shape,dtype=x.dtype,device=x.device)
  random_tensor.floor_()
  output = x.div(keep_prob)*random_tensor
  return output


class DropPath(nn.Module):
  def __init__(self,drop_prob=None):
    super(DropPath,self).__init__()
    self.drop_prob = drop_prob

  def forward(self,x):
    return drop_path(x,self.drop_prob,self.training)


class ConvBNActivation(nn.Sequential):
  def __init__(self,
        in_planes:int,
        out_planes:int,
        kernel_size:int=3,
        stride:int=1,
        groups:int=1,
        norm_layer:Optional[Callable[..., nn.Module]]=None,
        activation_layer:Optional[Callable[...,nn.Module]]=None):
    padding=(kernel_size-1)//2
    if norm_layer is None:
      norm_layer = nn.BatchNorm2d
    if activation_layer is None:
      activation_layer = nn.SiLU

    super(ConvBNActivation, self).__init__(nn.Conv2d(in_channels=in_planes,
                            out_channels=out_planes,
                            kernel_size=kernel_size,
                            stride=stride,
                            padding=padding,
                            groups=groups,
                            bias=False),
                        norm_layer(out_planes),
                        activation_layer())


class SqueezeExcitation(nn.Module):
  def __init__(self,
        input_c:int,
        expand_c:int,
        squeeze_factor:int=4):
    super(SqueezeExcitation,self).__init__()
    squeeze_c = input_c//squeeze_factor
    self.fc1 = nn.Conv2d(expand_c,squeeze_c, 1)
    self.ac1 = nn.SiLU()
    self.fc2 = nn.Conv2d(squeeze_c,expand_c,1)
    self.ac2 = nn.Sigmoid()

  def forward(self,x:Tensor)->Tensor:
    scale = F.adaptive_avg_pool2d(x,output_size=(1,1))
    scale = self.fc1(scale)
    scale = self.ac1(scale)
    scale = self.fc2(scale)
    scale = self.ac2(scale)
    return scale*x


class InvertedResidualConfig:
  def __init__(self,
        kernel:int, # 3or5
        input_c:int,
        out_c:int,
        expanded_ratio:int, # 1or6
        stride:int, # 1or2
        use_se:bool,  # True
        drop_rate:float,
        index:str,  # 1a,2a,2b...
        width_coefficient:float):
    self.input_c = self.adjust_channels(input_c,width_coefficient)
    self.kernel = kernel
    self.expanded_c = self.input_c*expanded_ratio
    self.out_c = self.adjust_channels(out_c,width_coefficient)
    self.use_se = use_se
    self.stride = stride
    self.drop_rate = drop_rate
    self.index = index

  @staticmethod
  def adjust_channels(channels:int,width_coefficient:float):
    return _make_divisible(channels*width_coefficient,8)


class InvertedResidual(nn.Module):
  def __init__(self,cnf:InvertedResidualConfig,norm_layer:Callable[...,nn.Module]):
    super(InvertedResidual,self).__init__()

    if cnf.stride not in [1,2]:
      raise ValueError("illegal stride value.")

    self.use_res_connect = (cnf.stride==1 and cnf.input_c==cnf.out_c)

    layers = OrderedDict()
    activation_layer = nn.SiLU

    # expand
    if cnf.expanded_c!=cnf.input_c:
      layers.update({"expand_conv":ConvBNActivation(cnf.input_c,
                              cnf.expanded_c,
                              kernel_size=1,
                              norm_layer=norm_layer,
                              activation_layer=activation_layer)})

    # depthwise
    layers.update({"dwconv":ConvBNActivation(cnf.expanded_c,
                        cnf.expanded_c,
                        kernel_size=cnf.kernel,
                        stride=cnf.stride,
                        groups=cnf.expanded_c,
                        norm_layer=norm_layer,
                        activation_layer=activation_layer)})

    if cnf.use_se:
        layers.update({"se":SqueezeExcitation(cnf.input_c,cnf.expanded_c)})

    # project
    layers.update({"project_conv":ConvBNActivation(cnf.expanded_c,
                          cnf.out_c,
                          kernel_size=1,
                          norm_layer=norm_layer,
                          activation_layer=nn.Identity)})

    self.block = nn.Sequential(layers)
    self.out_channels = cnf.out_c
    self.is_strided = cnf.stride>1

    # the dropout (DropPath) layer is used only when the shortcut connection is used
    if self.use_res_connect and cnf.drop_rate>0:
      self.dropout = DropPath(cnf.drop_rate)
    else:
      self.dropout = nn.Identity()

  def forward(self,x:Tensor)->Tensor:
    result = self.block(x)
    result = self.dropout(result)
    if self.use_res_connect:
      result += x

    return result


class EfficientNet(nn.Module):
  def __init__(self,
        width_coefficient:float,
        depth_coefficient:float,
        num_classes:int=1000,
        dropout_rate:float=0.2,
        drop_connect_rate:float=0.2,
        block:Optional[Callable[...,nn.Module]]=None,
        norm_layer:Optional[Callable[...,nn.Module]]=None):
    super(EfficientNet,self).__init__()

    # kernel_size,in_channel,out_channel,exp_ratio,strides,use_SE,drop_connect_rate,repeats
    default_cnf = [[3,32,16,1,1,True,drop_connect_rate,1],
            [3,16,24,6,2,True,drop_connect_rate,2],
            [5,24,40,6,2,True,drop_connect_rate,2],
            [3,40,80,6,2,True,drop_connect_rate,3],
            [5,80,112,6,1,True,drop_connect_rate,3],
            [5,112,192,6,2,True,drop_connect_rate,4],
            [3,192,320,6,1,True,drop_connect_rate,1]]

    def round_repeats(repeats):
      return int(math.ceil(depth_coefficient*repeats))

    if block is None:
      block = InvertedResidual

    if norm_layer is None:
      norm_layer = partial(nn.BatchNorm2d,eps=1e-3,momentum=0.1)

    adjust_channels = partial(InvertedResidualConfig.adjust_channels,width_coefficient=width_coefficient)

    # build inverted_residual_setting
    bneck_conf = partial(InvertedResidualConfig,width_coefficient=width_coefficient)

    b = 0
    num_blocks = float(sum(round_repeats(i[-1]) for i in default_cnf))
    inverted_residual_setting = []
    for stage,args in enumerate(default_cnf):
      cnf = copy.copy(args)
      for i in range(round_repeats(cnf.pop(-1))):
        if i>0:
          cnf[-3] = 1
          cnf[1] = cnf[2]

        cnf[-1] = args[-2]*b/num_blocks  # update the drop rate: it grows linearly with the block index
        index = str(stage+1)+chr(i+97)  # 1a,2a,2b...
        inverted_residual_setting.append(bneck_conf(*cnf,index))
        b += 1

    # create layers
    layers = OrderedDict()

    # first conv
    layers.update({"stem_conv":ConvBNActivation(in_planes=3,
                          out_planes=adjust_channels(32),
                          kernel_size=3,
                          stride=2,
                          norm_layer=norm_layer)})

    # building inverted residual blocks
    for cnf in inverted_residual_setting:
      layers.update({cnf.index:block(cnf,norm_layer)})

    # build top
    last_conv_input_c = inverted_residual_setting[-1].out_c
    last_conv_output_c = adjust_channels(1280)
    layers.update({"top":ConvBNActivation(in_planes=last_conv_input_c,
                      out_planes=last_conv_output_c,
                      kernel_size=1,
                      norm_layer=norm_layer)})

    self.features = nn.Sequential(layers)
    self.avgpool = nn.AdaptiveAvgPool2d(1)

    classifier = []
    if dropout_rate>0:
      classifier.append(nn.Dropout(p=dropout_rate,inplace=True))
    classifier.append(nn.Linear(last_conv_output_c,num_classes))
    self.classifier = nn.Sequential(*classifier)

    # weight initialization
    for m in self.modules():
      if isinstance(m,nn.Conv2d):
        nn.init.kaiming_normal_(m.weight,mode="fan_out")
        if m.bias is not None:
          nn.init.zeros_(m.bias)
      elif isinstance(m,nn.BatchNorm2d):
        nn.init.ones_(m.weight)
        nn.init.zeros_(m.bias)
      elif isinstance(m,nn.Linear):
        nn.init.normal_(m.weight,0,0.01)
        nn.init.zeros_(m.bias)

  def _forward_impl(self,x:Tensor)->Tensor:
    x = self.features(x)
    x = self.avgpool(x)
    x = torch.flatten(x,1)
    x = self.classifier(x)

    return x

  def forward(self,x:Tensor)->Tensor:
    return self._forward_impl(x)


def efficientnet_b0(num_classes=1000):
  # 224x224
  return EfficientNet(width_coefficient=1.0,depth_coefficient=1.0,dropout_rate=0.2,num_classes=num_classes)


def efficientnet_b1(num_classes=1000):
  # 240x240
  return EfficientNet(width_coefficient=1.0,depth_coefficient=1.1,dropout_rate=0.2,num_classes=num_classes)


def efficientnet_b2(num_classes=1000):
  # 260x260
  return EfficientNet(width_coefficient=1.1,depth_coefficient=1.2,dropout_rate=0.3,num_classes=num_classes)


def efficientnet_b3(num_classes=1000):
  # 300x300
  return EfficientNet(width_coefficient=1.2,depth_coefficient=1.4,dropout_rate=0.3,num_classes=num_classes)


def efficientnet_b4(num_classes=1000):
  # 380x380
  return EfficientNet(width_coefficient=1.4,depth_coefficient=1.8,dropout_rate=0.4,num_classes=num_classes)


def efficientnet_b5(num_classes=1000):
  # 456x456
  return EfficientNet(width_coefficient=1.6,depth_coefficient=2.2,dropout_rate=0.4,num_classes=num_classes)


def efficientnet_b6(num_classes=1000):
  # 528x528
  return EfficientNet(width_coefficient=1.8,depth_coefficient=2.6,dropout_rate=0.5,num_classes=num_classes)


def efficientnet_b7(num_classes=1000):
  # 600x600
  return EfficientNet(width_coefficient=2.0,depth_coefficient=3.1,dropout_rate=0.5,num_classes=num_classes)

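3 Multi-Head Self-Attention in the Transformer

Transformer: originally proposed for NLP, it replaced the earlier sequential (recurrent) networks, which cannot be parallelized and have a short memory. Without hardware limits, a Transformer can in theory attend over sequences of unlimited length.

The dot products can become large in magnitude, and after the softmax the gradients then become very small. This is why the scores are divided by the square root of d_k before the softmax.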
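The saturation effect is easy to reproduce with plain Python. The sketch below compares the softmax of raw dot products with the softmax of dot products scaled by 1/sqrt(d_k), and measures the derivative of each softmax output with respect to its own input, p*(1-p). The vectors and d_k=64 are toy values made up for this illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_self_grad(probs):
    """Derivative of each softmax output w.r.t. its own input: p * (1 - p)."""
    return [p * (1.0 - p) for p in probs]


d_k = 64
q = [0.5] * d_k
keys = [[0.5] * d_k, [0.25] * d_k, [0.0] * d_k]

raw = [sum(a * b for a, b in zip(q, k)) for k in keys]   # unscaled dot products
scaled = [s / math.sqrt(d_k) for s in raw]               # divided by sqrt(d_k)

p_raw, p_scaled = softmax(raw), softmax(scaled)
print("raw    scores", raw, "max grad", max(softmax_self_grad(p_raw)))
print("scaled scores", scaled, "max grad", max(softmax_self_grad(p_scaled)))
```

With the raw scores almost all probability mass sits on one key and every gradient is close to zero; after scaling the distribution is softer and the gradients are orders of magnitude larger.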

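Self-attention:

Multi-head self-attention:

Concatenating the heads:

The concatenated matrix is then processed (multiplied by an output projection matrix) so that the output vectors have the same length as the input vectors. Multi-head attention can jointly use the information learned by the different heads to produce more meaningful features. In essence, multiple heads are multiple independent attention computations; this acts like an ensemble and can help prevent overfitting.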
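The split, attend, concatenate flow can be sketched in plain Python. To keep it short, this version assumes identity matrices for the query, key, value and output projections, which a real Transformer layer would learn; all function names are chosen for this illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention for one head."""
    d_k = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


def split_heads(x: Matrix, num_heads: int) -> List[Matrix]:
    """Split the feature dimension evenly into num_heads chunks."""
    if len(x[0]) % num_heads != 0:
        raise ValueError("feature size must be divisible by num_heads")
    d_head = len(x[0]) // num_heads
    return [[row[h * d_head:(h + 1) * d_head] for row in x] for h in range(num_heads)]


def multi_head_self_attention(x: Matrix, num_heads: int) -> Matrix:
    # assumption: W_q, W_k, W_v and W_o are identity matrices
    heads = [attention(h, h, h) for h in split_heads(x, num_heads)]
    # concatenate the heads along the feature dimension, token by token
    return [sum((head[i] for head in heads), []) for i in range(len(x))]


tokens = [[1.0, 0.0, 0.5, 0.5],
          [0.0, 1.0, 0.5, -0.5],
          [1.0, 1.0, 0.0, 0.0]]
out = multi_head_self_attention(tokens, num_heads=2)
print(len(out), len(out[0]))  # 3 4
```

The output keeps the input shape (3 tokens, 4 features), while each head only looked at its own 2-feature slice.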

Part 2 Code Exercises

1 Cats vs. Dogs with a VGG Model

1.1 Importing Libraries

import numpy as np
import matplotlib.pyplot as plt
import os
import torch
import torch.nn as nn
import torchvision
from torchvision import models,transforms,datasets
import time
import json


# check whether a GPU device is available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print('Using gpu: %s '%torch.cuda.is_available())

1.2 Downloading and Unzipping the Data

# download the data
! wget http://fenggao-image.stor.sinaapp.com/dogscats.zip
! unzip dogscats.zip

1.3 Data Processing

# data processing
normalize = transforms.Normalize(mean=[0.485,0.456,0.406],std=[0.229,0.224,0.225])

vgg_format = transforms.Compose([transforms.CenterCrop(224),transforms.ToTensor(),normalize])

data_dir = './dogscats'

dsets = {x:datasets.ImageFolder(os.path.join(data_dir,x),vgg_format) for x in ['train','valid']}
dset_sizes = {x:len(dsets[x]) for x in ['train','valid']}
dset_classes = dsets['train'].classes

# inspect some attributes of dsets
print(dsets['train'].classes)
print(dsets['train'].class_to_idx)
print(dsets['train'].imgs[:5])
print('dset_sizes: ', dset_sizes)

loader_train = torch.utils.data.DataLoader(dsets['train'], batch_size=64, shuffle=True, num_workers=6)
loader_valid = torch.utils.data.DataLoader(dsets['valid'], batch_size=5, shuffle=False, num_workers=6)

count = 1
for data in loader_valid:
  print(count,end='\n')
  if count==1:
    inputs_try,labels_try = data
  count+=1

print(labels_try)
print(inputs_try.shape)

# display an image (undo the normalization first)
def imshow(inp, title=None):
  inp = inp.numpy().transpose((1,2,0))
  mean = np.array([0.485,0.456,0.406])
  std = np.array([0.229,0.224,0.225])
  inp = np.clip(std*inp+mean,0,1)
  plt.imshow(inp)
  if title is not None:
    plt.title(title)
  plt.pause(0.001)

# display the 5 images of labels_try, i.e. the first batch of 5 images in valid
out = torchvision.utils.make_grid(inputs_try)
imshow(out,title=[dset_classes[x] for x in labels_try])

1.4 Creating the Model

# create the model

model_vgg = models.vgg16(pretrained=True)

with open('./imagenet_class_index.json') as f:
  class_dict = json.load(f)
dic_imagenet = [class_dict[str(i)][1] for i in range(len(class_dict))]

inputs_try,labels_try = inputs_try.to(device),labels_try.to(device)
model_vgg = model_vgg.to(device)

outputs_try = model_vgg(inputs_try)

print(outputs_try)
print(outputs_try.shape)

m_softm = nn.Softmax(dim=1)
probs = m_softm(outputs_try)
vals_try,pred_try = torch.max(probs,dim=1)

print('prob sum: ',torch.sum(probs,1))
print('vals_try: ',vals_try)
print('pred_try: ',pred_try)

print([dic_imagenet[i] for i in pred_try.data])
imshow(torchvision.utils.make_grid(inputs_try.data.cpu()), 
  title=[dset_classes[x] for x in labels_try.data.cpu()])

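# replace the last layer and freeze the parameters of the earlier layers

print(model_vgg)

model_vgg_new = model_vgg  # note: this is an alias of model_vgg, not a copy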

for param in model_vgg_new.parameters():
  param.requires_grad = False
model_vgg_new.classifier._modules['6'] = nn.Linear(4096,2)
model_vgg_new.classifier._modules['7'] = torch.nn.LogSoftmax(dim=1)

model_vgg_new = model_vgg_new.to(device)

print(model_vgg_new.classifier)

Model printout:

VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU(inplace=True)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace=True)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace=True)
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU(inplace=True)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU(inplace=True)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace=True)
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace=True)
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace=True)
    (5): Dropout(p=0.5, inplace=False)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)
Sequential(
  (0): Linear(in_features=25088, out_features=4096, bias=True)
  (1): ReLU(inplace=True)
  (2): Dropout(p=0.5, inplace=False)
  (3): Linear(in_features=4096, out_features=4096, bias=True)
  (4): ReLU(inplace=True)
  (5): Dropout(p=0.5, inplace=False)
  (6): Linear(in_features=4096, out_features=2, bias=True)
  (7): LogSoftmax(dim=1)
)

1.5 Training and Testing

# train and test the fully connected layer

criterion = nn.NLLLoss()

# learning rate
lr = 0.001

# stochastic gradient descent
optimizer_vgg = torch.optim.SGD(model_vgg_new.classifier[6].parameters(),lr=lr)

def train_model(model,dataloader,size,epochs=1,optimizer=None):
  model.train()
  
  for epoch in range(epochs):
    running_loss = 0.0
    running_corrects = 0
    count = 0
    for inputs,classes in dataloader:
      inputs = inputs.to(device)
      classes = classes.to(device)
      outputs = model(inputs)
      loss = criterion(outputs,classes)
      optimizer.zero_grad()
      loss.backward()
      optimizer.step()
      _,preds = torch.max(outputs.data,1)
      # statistics
      running_loss += loss.data.item()
      running_corrects += torch.sum(preds==classes.data)
      count += len(inputs)
      print('Training: No. ',count,' process ... total: ',size)
    epoch_loss = running_loss/size
    epoch_acc = running_corrects.data.item()/size
    print('Loss: {:.4f} Acc: {:.4f}'.format(epoch_loss,epoch_acc))
        
        
# train the model
train_model(model_vgg_new,loader_train,size=dset_sizes['train'],epochs=1,optimizer=optimizer_vgg)

# test

@torch.no_grad()  # gradients are not needed during evaluation
def test_model(model,dataloader,size):
  model.eval()
  predictions = np.zeros(size)
  all_classes = np.zeros(size)
  all_proba = np.zeros((size,2))
  i = 0
  running_loss = 0.0
  running_corrects = 0
  for inputs,classes in dataloader:
    inputs = inputs.to(device)
    classes = classes.to(device)
    outputs = model(inputs)
    loss = criterion(outputs,classes)           
    _,preds = torch.max(outputs.data,1)
    # statistics
    running_loss += loss.data.item()
    running_corrects += torch.sum(preds==classes.data)
    predictions[i:i+len(classes)] = preds.to('cpu').numpy()
    all_classes[i:i+len(classes)] = classes.to('cpu').numpy()
    all_proba[i:i+len(classes),:] = outputs.data.to('cpu').numpy()
    i += len(classes)
    print('Testing: No. ',i,' process ... total: ',size)        
  epoch_loss = running_loss/size
  epoch_acc = running_corrects.data.item()/size
  print('Loss: {:.4f} Acc: {:.4f}'.format(epoch_loss,epoch_acc))
  return predictions,all_proba,all_classes
  
predictions,all_proba,all_classes = test_model(model_vgg_new,loader_valid,size=dset_sizes['valid'])

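1.6 Visualization

2 AI Art Appreciation Challenge

Code link:

Here I try ResNet; the main code is as follows:

# imports used by this excerpt; PDataset, transform, train_dir and test_dir are defined elsewhere and not shown
import torch.optim as optim
from torch.utils.data import DataLoader
from PIL import Image

num_workers = 2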
batch_size = 32

net = models.resnet50(pretrained=True)
net.fc = nn.Linear(net.fc.in_features,49,bias=True)
net = net.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(),lr=0.001)

train_data = PDataset(train_dir,transform=transform)
train_dataloader = DataLoader(dataset=train_data,shuffle=True,num_workers=num_workers,batch_size=batch_size,pin_memory=True)

for epoch in range(30):  # train for multiple epochs
    for i,(img,label) in enumerate(train_dataloader):
      img = img.to(device)
      labels = label.to(device)
      # zero the optimizer's gradients
      optimizer.zero_grad()
      # forward pass + backward pass + optimization step
      outputs = net(img)
      loss = criterion(outputs,labels)
      loss.backward()
      optimizer.step()
    # print statistics (the loss of the last batch in this epoch)
    print('Epoch: %d loss: %.3f' %(epoch+1,loss.item()))

print('Finished Training')

import pandas as pd

test_imgs = os.listdir(test_dir)

id_list = []
label_list = []

net.eval()  # switch BatchNorm layers to inference mode
with torch.no_grad():
  for img_name in test_imgs:
    img_id = img_name.split('.')[0]
    img = Image.open(os.path.join(test_dir,img_name)).convert('RGB')
    img = transform(img).to(device)
    img = img.unsqueeze(0)
    logits = net(img)
    # the index of the largest logit is the predicted class
    max_id = torch.argmax(logits[0]).item()
    print(max_id)
    id_list.append(img_id)
    label_list.append(max_id)

dataframe = pd.DataFrame(
        {'id': id_list, 'nameid': label_list})
dataframe.to_csv(r"results.csv", sep=',')

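ResNet50 training results:

Scores after generating and submitting the csv:

Here results1 comes from ResNet50 trained for 15 epochs and result2 from ResNet50 trained for 50 epochs; both perform very poorly. This may be partly due to too few training epochs: the task has 49 classes, and a multi-class problem like this often needs a better network architecture and a lower loss to reach good results. In addition, judging from the provided materials, both the training set and the test set tend toward a long-tailed distribution, which can easily lead the trained model to assign all images to the same class. Inspecting the csv file shows that the vast majority of images were assigned to class 3 or class 16, and these two classes do contain the most samples, which suggests that the model really did take this shortcut.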
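A quick way to spot this kind of collapse is to count how often each class is predicted. The sketch below does that with `collections.Counter`; the helper name `prediction_share` and the label list are made up for illustration and are not the actual submission data.

```python
from collections import Counter


def prediction_share(labels, top_k=2):
    """Return the top_k most frequent predicted classes with their share of all predictions."""
    counts = Counter(labels)
    total = len(labels)
    return [(cls, n / total) for cls, n in counts.most_common(top_k)]


# hypothetical predictions: two classes dominate, ten others appear once each
fake_labels = [3] * 60 + [16] * 30 + [1, 2, 5, 7, 8, 9, 10, 11, 12, 13]
print(prediction_share(fake_labels))  # [(3, 0.6), (16, 0.3)]
```

In the notebook above, the same function could be applied to `label_list` before writing the csv.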

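Next I studied the code of the first-place contestant in the prize competition. It uses EfficientNet B3 as the backbone, records the test results of 5 runs with different losses, and decides the class of each image by voting. I think this voting mechanism is the highlight of the solution: it combines the strengths of the 5 training results so that they compensate for each other's weaknesses. The backbone also has relatively few parameters and focuses on computational efficiency, so evaluating 5 times does not take much time.

Looking at the second-place and third-place solutions, they also use this voting mechanism. Second place uses resnet200 as the backbone, and third place uses two networks, efficientnet b3 and resnext50. In my view, when the backbone is simple and computationally efficient, this voting mechanism can be used to improve recognition accuracy.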
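The voting step itself is simple. The following is a minimal sketch of majority voting over the predictions of several models or checkpoints, not the contestants' actual code; the function name and the example predictions are assumptions. Ties go to the class that was predicted first among the tied ones.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """predictions_per_model: one list of predicted class ids per model, all of equal length."""
    voted = []
    for votes in zip(*predictions_per_model):
        # most_common(1) returns the class with the highest vote count
        voted.append(Counter(votes).most_common(1)[0][0])
    return voted


# hypothetical predictions of 5 checkpoints for 3 images
preds = [
    [3, 16, 5],
    [3, 16, 7],
    [3, 2, 7],
    [16, 2, 7],
    [3, 16, 5],
]
print(majority_vote(preds))  # [3, 16, 7]
```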

[OUC Introduction to Deep Learning] Week 5 Study Notes: ShuffleNet & EfficientNet & Transfer Learning