
Summary. To summarize, the Conv Layer:

  • Accepts a volume of size $W_1 \times H_1 \times D_1$
  • Requires four hyperparameters:
    • the number of filters $K$,
    • their spatial extent $F$,
    • the stride $S$,
    • the amount of zero padding $P$.
  • Produces a volume of size $W_2 \times H_2 \times D_2$ where:
    • $W_2 = (W_1 - F + 2P)/S + 1$
    • $H_2 = (H_1 - F + 2P)/S + 1$ (i.e., width and height are computed equally by symmetry)
    • $D_2 = K$
  • With parameter sharing, it introduces $F \cdot F \cdot D_1$ weights per filter, for a total of $(F \cdot F \cdot D_1) \cdot K$ weights and $K$ biases.
  • In the output volume, the $d$-th depth slice (of size $W_2 \times H_2$) is the result of performing a valid convolution of the $d$-th filter over the input volume with a stride of $S$, and then offset by the $d$-th bias.
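
The bookkeeping in the list above can be sketched in a few lines of Python. This is only an illustration: the function name `conv_layer_summary` and its return layout are made up for this note, and rejecting hyperparameters that do not divide evenly is an assumption, not something the summary itself states.

```python
def conv_layer_summary(w1, h1, d1, k, f, s, p):
    """Return the output shape and parameter counts of a conv layer.

    w1, h1, d1: input width, height, depth
    k: number of filters, f: spatial extent, s: stride, p: zero padding
    """
    # Assumption: settings where the filter does not tile the padded
    # input evenly are treated as invalid rather than silently rounded.
    if (w1 - f + 2 * p) % s != 0 or (h1 - f + 2 * p) % s != 0:
        raise ValueError("filter, stride, and padding do not fit the input evenly")

    w2 = (w1 - f + 2 * p) // s + 1
    h2 = (h1 - f + 2 * p) // s + 1
    d2 = k  # one depth slice per filter

    weights_per_filter = f * f * d1
    total_weights = weights_per_filter * k
    biases = k  # one bias per filter
    return (w2, h2, d2), weights_per_filter, total_weights, biases


# Example: a 227x227x3 input with K=96 filters, F=11, S=4, P=0.
shape, per_filter, weights, biases = conv_layer_summary(227, 227, 3, 96, 11, 4, 0)
print(shape, per_filter, weights, biases)
```

For this example setting the formulas give an output of 55 × 55 × 96, with 11 · 11 · 3 = 363 weights per filter, 363 · 96 = 34848 weights in total, and 96 biases.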

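In $W_2 = (W_1 - F + 2P)/S + 1$, the $+1$ is easy to see: it counts the filter's starting position. For example, when the filter exactly covers the input (such as a 1×1 filter on a 1×1 input with $P = 0$), the first term is 0, yet there is still one output.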
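
One way to check the $+1$ is to count the filter placements directly and compare with the formula. The sketch below does this along a single axis. The helper names `count_positions` and `formula` are hypothetical, and floor division is used so that uneven settings are counted the same way in both functions.

```python
def count_positions(w1, f, s, p):
    """Count filter placements along one zero-padded axis by enumeration."""
    padded = w1 + 2 * p
    # A window starting at `start` covers [start, start + f); the last
    # allowed start is padded - f, so the range stops at padded - f + 1.
    return len(range(0, padded - f + 1, s))


def formula(w1, f, s, p):
    """Output size along one axis: (W1 - F + 2P) / S + 1."""
    return (w1 - f + 2 * p) // s + 1


# The 1x1 case: zero steps are possible, but the start position still counts.
print(count_positions(1, 1, 1, 0), formula(1, 1, 1, 0))

# A few more settings to confirm that both ways of counting agree.
for w1, f, s, p in [(5, 3, 2, 0), (7, 3, 1, 0), (32, 5, 1, 2), (227, 11, 4, 0)]:
    assert count_positions(w1, f, s, p) == formula(w1, f, s, p)
```

The term $(W_1 - F + 2P)/S$ is the number of strides the filter can take after its first placement, and the $+1$ adds that first placement back in.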

