Summary. To summarize, the Conv Layer:
- Accepts a volume of size $W_1 \times H_1 \times D_1$
- Requires four hyperparameters:
  - the number of filters $K$,
  - their spatial extent $F$,
  - the stride $S$,
  - the amount of zero padding $P$.
- Produces a volume of size $W_2 \times H_2 \times D_2$ where:
  - $W_2 = (W_1 - F + 2P)/S + 1$
  - $H_2 = (H_1 - F + 2P)/S + 1$ (i.e. width and height are computed equally by symmetry)
  - $D_2 = K$
- With parameter sharing, it introduces $F \cdot F \cdot D_1$ weights per filter, for a total of $(F \cdot F \cdot D_1) \cdot K$ weights and $K$ biases.
- In the output volume, the $d$-th depth slice (of size $W_2 \times H_2$) is the result of performing a valid convolution of the $d$-th filter over the input volume with a stride of $S$, and then offset by the $d$-th bias.
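The last bullet can be made concrete with a deliberately naive forward pass. This is a minimal sketch in plain Python, not an efficient or authoritative implementation: the function name `conv_forward`, its argument layout (nested lists indexed `[row][col][channel]`), and the choice to raise an error when the filter does not fit evenly are all assumptions made for illustration. Like most deep learning code, it slides the filter without flipping it (strictly a cross-correlation).

```python
def conv_forward(x, filters, biases, stride=1, pad=0):
    """Naive Conv Layer forward pass (illustrative sketch).

    x:       input volume, nested lists indexed [row][col][channel] (H1 x W1 x D1)
    filters: K filters, each indexed [row][col][channel] (F x F x D1)
    biases:  list of K bias values
    Returns the output volume indexed [row][col][k] (H2 x W2 x K).
    """
    h1, w1, d1 = len(x), len(x[0]), len(x[0][0])
    k, f = len(filters), len(filters[0])

    # Output size from the summary formula; reject settings that do not fit.
    span_w = w1 - f + 2 * pad
    span_h = h1 - f + 2 * pad
    if span_w < 0 or span_h < 0 or span_w % stride or span_h % stride:
        raise ValueError("hyperparameters do not fit the input evenly")
    w2 = span_w // stride + 1
    h2 = span_h // stride + 1

    # Zero-pad the two spatial dimensions by `pad` on every side.
    padded = [[[0.0] * d1 for _ in range(w1 + 2 * pad)]
              for _ in range(h1 + 2 * pad)]
    for r in range(h1):
        for c in range(w1):
            padded[r + pad][c + pad] = list(x[r][c])

    out = [[[0.0] * k for _ in range(w2)] for _ in range(h2)]
    for d in range(k):                    # one output depth slice per filter
        for r in range(h2):
            for c in range(w2):
                total = biases[d]         # offset by the d-th bias
                for i in range(f):
                    for j in range(f):
                        for ch in range(d1):
                            total += (padded[r * stride + i][c * stride + j][ch]
                                      * filters[d][i][j][ch])
                out[r][c][d] = total
    return out


# Usage: 5x5x1 input of ones, two 3x3x1 filters of ones, stride 2, padding 1.
x = [[[1.0] for _ in range(5)] for _ in range(5)]
filters = [[[[1.0] for _ in range(3)] for _ in range(3)] for _ in range(2)]
out = conv_forward(x, filters, biases=[0.0, 1.0], stride=2, pad=1)
print(len(out), len(out[0]), len(out[0][0]))  # 3 3 2
```

Here $W_2 = H_2 = (5 - 3 + 2)/2 + 1 = 3$ and $D_2 = K = 2$, which matches the printed shape. The six nested loops are chosen for readability only; practical implementations vectorize this computation.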
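A note on $W_2 = (W_1 - F + 2P)/S + 1$: the $+1$ is easy to see. For example, when a $1 \times 1$ filter is applied to a $1 \times 1$ input with no padding, $(W_1 - F + 2P)/S = 0$, yet the convolution still produces exactly one result.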
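The size and parameter formulas can be checked with a few lines of arithmetic. The sketch below is a hedged illustration: the helper names `conv_output_shape` and `conv_param_count` are hypothetical, and treating a non-integer result as an error is an assumption made here (some frameworks round down instead).

```python
def conv_output_shape(w1, h1, k, f, s, p):
    """Return (W2, H2, D2) for a Conv Layer using the summary formulas."""
    span_w = w1 - f + 2 * p
    span_h = h1 - f + 2 * p
    if span_w < 0 or span_h < 0:
        raise ValueError("filter is larger than the padded input")
    if span_w % s or span_h % s:
        # Assumption: require the filter to tile the input evenly.
        raise ValueError("(W1 - F + 2P) / S is not an integer")
    return span_w // s + 1, span_h // s + 1, k


def conv_param_count(d1, k, f):
    """Return (weights, biases) under parameter sharing."""
    weights = f * f * d1 * k   # F * F * D1 weights per filter, K filters
    biases = k                 # one bias per filter
    return weights, biases


# The 1x1 case from the note above: zero "steps", but still one output.
print(conv_output_shape(w1=1, h1=1, k=1, f=1, s=1, p=0))       # (1, 1, 1)

# An illustrative larger setting: 227x227x3 input, 96 filters of size 11, stride 4.
print(conv_output_shape(w1=227, h1=227, k=96, f=11, s=4, p=0))  # (55, 55, 96)
print(conv_param_count(d1=3, k=96, f=11))                       # (34848, 96)
```

Intuitively, $(W_1 - F + 2P)/S$ counts how many strides the filter can take after its first position, and the $+1$ accounts for that first position.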