Learning to Navigate for Fine-grained Classification

Key ideas

Training method of Navigator network. For an input image, the feature extractor
extracts its deep feature map, then the feature map is fed into Navigator network
to compute the informativeness of all regions. We choose top-M (here M = 3 for
explanation) informative regions after NMS and denote their informativeness as
$\{I_1, I_2, I_3\}$. Then we crop the regions from the full image, resize them to the
pre-defined size and feed them into Teacher network, then we get the confidences
$\{C_1, C_2, C_3\}$. We optimize Navigator network to make $\{I_1, I_2, I_3\}$ and
$\{C_1, C_2, C_3\}$ have the same order.
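
The "top-M informative regions after NMS" step can be sketched as follows. This is a minimal pure-Python illustration, not the paper's code: the box format `(x1, y1, x2, y2)`, the helper names `iou` and `top_m_after_nms`, and the default `iou_threshold=0.25` are all assumptions made for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def top_m_after_nms(boxes, scores, m=3, iou_threshold=0.25):
    """Greedy NMS on informativeness scores, keeping at most m regions.

    Returns the indices of the kept boxes, most informative first.
    The threshold value is an assumption for this sketch.
    """
    # Visit candidate regions from most to least informative.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a region only if it does not overlap too much with a kept one.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
        if len(keep) == m:
            break
    return keep


# Hypothetical candidate regions and informativeness scores.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30),
         (40, 40, 50, 50), (41, 41, 51, 51)]
scores = [0.9, 0.8, 0.7, 0.6, 0.95]
selected = top_m_after_nms(boxes, scores, m=3)
```

The selected indices are then used to crop the corresponding regions from the full image before they are passed to the Teacher network.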
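How the N-net (Navigator network) is trained: for an input image, a ResNet is first used as the feature extractor to produce the image's feature map.
The extracted feature map is then fed into the N-net to compute the informativeness of every region. (The N-net uses the output of an RPN-style network as the informativeness score; is that reasonable?)
NMS is applied based on informativeness, and the top-n regions are selected.
The top-n informative regions are cropped from the original image and resized to the predefined size (224 x 224).
The cropped regions are fed into the T-net (Teacher network) to obtain the confidences C; this is implemented with a fully connected layer of shape 2048 x num_class.
A rank loss optimizes the N-net so that $\{I_1, I_2, I_3\}$ and $\{C_1, C_2, C_3\}$ have the same order, which ensures that the network navigates to information-rich regions. (Do information-rich regions serve the same purpose as discriminative local parts?)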
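
To make the "same order" objective concrete, here is a minimal sketch of a pairwise hinge ranking loss. It assumes the common form in which every pair with $C_i < C_s$ contributes $\max(0, \text{margin} - (I_s - I_i))$; the function name `rank_loss` and the default `margin=1.0` are assumptions for illustration, and the notes above do not spell out the exact loss.

```python
def rank_loss(informativeness, confidences, margin=1.0):
    """Pairwise hinge ranking loss (illustrative sketch).

    For every pair (i, s) where the teacher confidence C_i < C_s, the
    navigator score I_s should exceed I_i by at least `margin`;
    otherwise the pair is penalised.
    """
    total = 0.0
    n = len(informativeness)
    for i in range(n):
        for s in range(n):
            if confidences[i] < confidences[s]:
                gap = informativeness[s] - informativeness[i]
                total += max(0.0, margin - gap)
    return total


# Hypothetical confidences from the T-net for M = 3 regions.
C = [0.9, 0.5, 0.1]

# Informativeness ordered the same way as C: no penalty.
loss_consistent = rank_loss([3.0, 2.0, 1.0], C)

# Informativeness ordered the opposite way: every pair is penalised.
loss_reversed = rank_loss([1.0, 2.0, 3.0], C)
```

The loss is zero only when the informativeness scores are ordered like the confidences with at least the margin between them, so minimising it pushes the N-net to rank regions the way the T-net does.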


The feature maps of the top-n informative regions are concatenated with the feature map of the input image, and classification is implemented with a fully connected layer of shape (2048 x (n+1), num_class).
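
The shape bookkeeping of this final classifier can be sketched as below. This is an illustration in plain Python with small stand-in sizes (`feat_dim = 8` instead of 2048); the helper names `concat_features` and `linear`, the random weights, and the ordering of the concatenation (image feature first, then region features) are assumptions for the example.

```python
import random


def concat_features(image_feature, region_features):
    """Concatenate the full-image feature with each region feature."""
    fused = list(image_feature)
    for feat in region_features:
        fused.extend(feat)
    return fused


def linear(x, weight, bias):
    """Fully connected layer; weight has one row of len(x) values per class."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]


# Small stand-in sizes; the notes use feat_dim = 2048.
feat_dim, n, num_class = 8, 3, 5
rng = random.Random(0)

image_feature = [rng.random() for _ in range(feat_dim)]
region_features = [[rng.random() for _ in range(feat_dim)] for _ in range(n)]

# The fused vector has feat_dim * (n + 1) entries.
fused = concat_features(image_feature, region_features)

# Randomly initialised weights, only to demonstrate the shapes.
weight = [[rng.uniform(-0.1, 0.1) for _ in range(feat_dim * (n + 1))]
          for _ in range(num_class)]
bias = [0.0] * num_class

logits = linear(fused, weight, bias)  # one score per class
```

With `feat_dim = 2048`, the input width of the layer becomes 2048 x (n+1), matching the shape given in the notes.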