https://arxiv.org/pdf/1512.03385.pdf#page=9&zoom=100,0,157
- abstract
- residual: remaining, left over
- We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
- introduction
- Recent evidence reveals that network depth is of crucial importance, and the leading results on the challenging ImageNet dataset all exploit “very deep” models, with a depth of sixteen to thirty.
- Problems brought by deeper networks:
- the notorious problem of vanishing/exploding gradients, which has been largely addressed by normalized initialization and batch normalization
- with the network depth increasing, accuracy gets saturated (which might be unsurprising) and then degrades rapidly
- A model is essentially a mapping: instead of fitting the desired underlying mapping H(x) directly, the stacked layers fit the residual F(x) := H(x) − x, so H(x) becomes F(x) + x
- shortcut connection
- adds no extra parameters and no extra computation
- y = F(x, {W_i}) + x (Eqn. 1, identity shortcut), or y = F(x, {W_i}) + W_s x (Eqn. 2, projection shortcut when dimensions differ)
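As a concrete illustration of this building block, here is a minimal sketch in PyTorch (my choice of framework; the paper's experiments used Caffe, and the class name is hypothetical). It shows a two-layer residual function and the parameter-free identity shortcut:

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Sketch of the basic building block: y = F(x, {W_i}) + x (Eqn. 1)."""

    def __init__(self, channels: int):
        super().__init__()
        # Two stacked 3x3 convolutions form the residual function F.
        # BN is applied right after each convolution, before the activation.
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv1(x)))  # first layer of F
        out = self.bn2(self.conv2(out))           # second layer of F, no ReLU yet
        # Identity shortcut: element-wise addition, which adds no parameters
        # and essentially no computation.
        return self.relu(out + x)                 # second nonlinearity after the addition
```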
- related work
- residual representation
- shortcut connection
- deep residual learning
- residual learning
- If one hypothesizes that multiple nonlinear layers can asymptotically approximate complicated functions, then it is equivalent to hypothesize that they can asymptotically approximate the residual functions, i.e., H(x) − x (assuming that the input and output are of the same dimensions). The two forms of fitting may differ in how easy they are to train.
- The degradation problem suggests that the solvers might have difficulties in approximating identity mappings by multiple nonlinear layers, since the deeper model does not outperform its shallower counterpart.
- Identity Mapping by Shortcuts
- a shortcut connection and element-wise addition
- the form of the residual function F is flexible; the paper's experiments use an F with two or three layers (written out below)
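Written out for the two-layer case (the paper's own formulation in Sec. 3.2; biases omitted, σ is ReLU, and the second nonlinearity is applied after the addition):

```latex
\mathcal{F}(\mathbf{x}) = W_2\,\sigma(W_1\mathbf{x}),
\qquad
\mathbf{y} = \mathcal{F}(\mathbf{x}) + \mathbf{x},
\qquad
\text{output} = \sigma(\mathbf{y})
```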
- Network architecture
- When the dimensions increase, we consider two options: (A) the shortcut still performs identity mapping, with extra zero entries padded for the increased dimensions; this option introduces no extra parameters. (B) The projection shortcut in Eqn. (2) is used to match dimensions (done by 1×1 convolutions). For both options, when the shortcuts go across feature maps of two sizes, they are performed with a stride of 2.
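A sketch of the two options, again in PyTorch rather than the authors' code (the function names and the exact subsampling are my assumptions). Option A subsamples the identity with stride 2 and zero-pads the new channels, adding no parameters; option B is the projection shortcut W_s of Eqn. (2), a 1×1 convolution with stride 2:

```python
import torch
import torch.nn.functional as F  # torch's functional API, not the residual function F
from torch import nn

def shortcut_option_a(x: torch.Tensor, out_channels: int) -> torch.Tensor:
    """Option A: identity shortcut across feature maps of two sizes.

    Subsample spatially with stride 2, then pad the channel dimension
    with zeros. Introduces no extra parameters."""
    x = x[:, :, ::2, ::2]                    # spatial stride of 2
    extra = out_channels - x.size(1)         # number of zero channels to append
    return F.pad(x, (0, 0, 0, 0, 0, extra))  # pad order: (W, W, H, H, C, C)

def shortcut_option_b(in_channels: int, out_channels: int) -> nn.Module:
    """Option B: projection shortcut of Eqn. (2), realized as a 1x1
    convolution with stride 2. Adds parameters; BN after the convolution
    follows the paper's note that BN is adopted after each convolution."""
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=2, bias=False),
        nn.BatchNorm2d(out_channels),
    )
```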
- each layer follows the order convolution + BN + activation, i.e., BN is adopted right after each convolution and before the ReLU
- residual learning
- Other models: