Classic CNNs: ResNet

https://arxiv.org/pdf/1512.03385.pdf

  1. Abstract
    1. residual: remaining, left over (vocabulary note)
    2. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
  2. Introduction
    1. Recent evidence reveals that network depth is of crucial importance, and the leading results on the challenging ImageNet dataset all exploit “very deep” models, with a depth of sixteen to thirty.
    2. Problems that come with deeper networks:
      1. the notorious problem of vanishing/exploding gradients, which has largely been addressed by normalized initialization and intermediate normalization layers (batch normalization)
      2. with the network depth increasing, accuracy gets saturated (which might be unsurprising) and then degrades rapidly; this degradation is not caused by overfitting
    3. A model is essentially a mapping: instead of fitting the desired underlying mapping H(x) directly, the stacked layers fit the residual F(x) := H(x) − x, so the original mapping becomes F(x) + x
    4. shortcut connection
      1. identity shortcuts add no extra parameters and no extra computational cost
      2. $y = F(x, \{W_i\}) + x$, where $F(x, \{W_i\}) = W_2\,\sigma(W_1 x)$ and $\sigma$ is ReLU (a second ReLU is applied after the addition); when dimensions do not match, a linear projection $W_s$ is applied on the shortcut: $y = F(x, \{W_i\}) + W_s x$ (see the sketch after this list)
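To make the formulas concrete, here is a minimal sketch of a basic residual block in PyTorch (my choice of framework; the post names none). The BN placement follows the convolution+bn+activation note in section 4 below; class and variable names are mine.

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """y = relu(F(x) + x) for the same-dimension case, F = conv-BN-ReLU-conv-BN."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                           # shortcut: no parameters, no extra compute
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + identity                   # element-wise addition: F(x) + x
        return self.relu(out)                  # second nonlinearity after the addition

x = torch.randn(1, 64, 56, 56)
print(BasicResidualBlock(64)(x).shape)         # torch.Size([1, 64, 56, 56])
```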
  3. Related Work
    1. residual representations
    2. shortcut connections
  4. Deep Residual Learning
    1. residual learning
      1. If one hypothesizes that multiple nonlinear layers can asymptotically approximate complicated functions, then it is equivalent to hypothesize that they can asymptotically approximate the residual functions, i.e., H(x) − x (assuming that the input and output are of the same dimensions). Although both formulations are equally expressive, the ease of training may differ.
      2. The degradation problem suggests that the solvers might have difficulties in approximating identity mappings by multiple nonlinear layers; if they could, a deeper model would perform at least as well as its shallower counterpart, yet it does not. A toy demonstration follows this subsection.
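A toy illustration of why the residual form makes identity easy (my construction, not from the paper): if the residual branch is driven to zero, the block computes an exact identity, assuming its input is non-negative, which holds after a preceding ReLU.

```python
import torch
import torch.nn as nn

branch = nn.Conv2d(16, 16, kernel_size=3, padding=1)  # residual branch F
nn.init.zeros_(branch.weight)                          # drive F to zero
nn.init.zeros_(branch.bias)

x = torch.relu(torch.randn(1, 16, 8, 8))  # non-negative, as after a preceding ReLU
y = torch.relu(branch(x) + x)             # y = relu(F(x) + x) = relu(x) = x
print(torch.equal(y, x))                  # True: the block is an exact identity
```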
    2. Identity Mapping by Shortcuts
      1. a shortcut connection and element-wise addition
      2. the residual function F is flexible but should have at least two layers; with only one layer, the block degenerates into a plain linear layer and shows no advantage (see the check below)
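A quick numeric check of the two-layer point (numpy, purely illustrative): a single-layer residual block y = W₁x + x equals (W₁ + I)x, i.e., an ordinary linear layer, so nothing is gained.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 4))
x = rng.standard_normal(4)

y_residual = W1 @ x + x                   # one-layer residual block
y_linear = (W1 + np.eye(4)) @ x           # plain linear layer with weights W1 + I
print(np.allclose(y_residual, y_linear))  # True
```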
    3. Network architecture
      1. When the dimensions increase, we consider two options: (A) The shortcut still performs identity mapping, with extra zero entries padded for increasing dimensions. This option introduces no extra parameter; (B) The projection shortcut in Eqn.(2) is used to match dimensions (done by 1×1 convolutions). For both options, when the shortcuts go across feature maps of two sizes, they are performed with a stride of 2. (Option B is sketched after this list.)

      2. each weight layer follows the convolution → batch normalization → activation (ReLU) ordering: BN is applied right after each convolution and before the activation, as in the sketch below
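A sketch combining both points above (PyTorch again; `DownsampleBlock` is my name): the residual branch uses the conv → BN → ReLU ordering, and an option-B projection shortcut, a 1×1 convolution with stride 2 followed by BN, matches the increased dimensions.

```python
import torch
import torch.nn as nn

class DownsampleBlock(nn.Module):
    """Residual block that halves spatial size and changes channel count (option B)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # residual branch: conv -> BN -> ReLU, conv -> BN (stride 2 on the first conv)
        self.conv1 = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # option B: projection shortcut, 1x1 convolution with stride 2 to match dimensions
        self.proj = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=2, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.proj(x))   # y = F(x) + W_s x, then ReLU

x = torch.randn(1, 64, 56, 56)
print(DownsampleBlock(64, 128)(x).shape)       # torch.Size([1, 128, 28, 28])
```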

  5. Other models:


Reposted from blog.csdn.net/qq_32110859/article/details/86556415