SSD Vehicle Detection with TensorFlow, Part 3

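The Baidu Cloud link keeps going down; if you really need the files, email me at [email protected]

This blog series documents my learning of TensorFlow and Python. I am new to both, so please point out any mistakes you find.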

The whole project can be downloaded from Baidu Cloud:
Link: https://pan.baidu.com/s/1f2JPJpE7m5M2kSifMP0-Lw Password: 9p8v

Google Drive:

https://drive.google.com/open?id=1_IpPGwND0D0HPCJ9zNAKAInv5J5GaB2g

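III. Label preparation and batch data supply

This part covers three topics:

Constants used for anchor generation;
How to turn the original annotated boxes into the labels and masks needed to compute the loss;
How to supply training data in batches during training, including operations such as shuffling.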

1. Constants for anchor generation

The file constants.py defines several anchor-related constants:

# coding=utf-8

# to pre-define some constant variables

# Feature-map sizes of the 6 prediction branches in the SSD network
feature_size = [38, 19, 10, 5, 3, 1]

# 300 / feature_size (rounded): the extent in the 300x300 input image that one feature-map cell corresponds to
anchor_steps = [8, 16, 30, 60, 100, 300]

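# Number of anchor types in each of the 6 prediction branches.
# Note: the original SSD paper uses [4, 6, 6, 6, 4, 4]. Because resized KITTI images contain
# more small objects, the first branch is raised from 4 to 6 anchor types to improve
# small-object detection.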
anchors_num = [6, 6, 6, 6, 4, 4]

# The total number of anchors therefore rises from 8732 in the paper to 11620
all_anchors_num = 11620

# Aspect ratios used by each of the 6 branches. Note that there are two 1:1 anchors, which differ in size
anchors_ratio = [[1, 1, 2, 0.5, 3, 1./3],
                 [1, 1, 2, 0.5, 3, 1./3],
                 [1, 1, 2, 0.5, 3, 1./3],
                 [1, 1, 2, 0.5, 3, 1./3],
                 [1, 1, 2, 0.5],
                 [1, 1, 2, 0.5]]

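# Anchor scales follow the rule in the paper: smallest 0.07, largest 0.87, evenly spaced in between,
# so the 6 anchor scales, as fractions of the input size, are [0.07 0.23 ... 0.87]
# For the 1:1 aspect ratio, one extra, slightly larger scale is added
# the first: ratio=1, sqrt(S_k*S_(k+1))
# the second: 0.07+(k-1)*(0.87-0.07)/(6-1), k=1...6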
"""anchors_scales = [[0.13, 0.07],
                  [0.30, 0.23],
                  [0.46, 0.39],
                  [0.62, 0.55],
                  [0.79, 0.71],
                  [0.95, 0.87]]"""

# 300 * anchors_scales (note: 300 * 0.39 = 117, but the third row uses 108)
anchors_size = [[39, 21],
                [90, 69],
                [138, 108],
                [186, 165],
                [237, 213],
                [285, 261]]
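
To make the constants concrete, here is a minimal sketch of how anchors could be laid out from them. The function name gen_anchors_sketch, the cell-centred placement, and the rule deciding which size goes with which ratio are assumptions made for illustration; the real gen_anchors in genBatch.py may differ. What the sketch does check is the anchor count.

```python
import math

# Constants copied from constants.py above.
feature_size = [38, 19, 10, 5, 3, 1]
anchor_steps = [8, 16, 30, 60, 100, 300]
anchors_ratio = [[1, 1, 2, 0.5, 3, 1. / 3]] * 4 + [[1, 1, 2, 0.5]] * 2
anchors_size = [[39, 21], [90, 69], [138, 108],
                [186, 165], [237, 213], [285, 261]]


def gen_anchors_sketch():
    """Hypothetical stand-in for gen_anchors: returns a list of [x, y, w, h]."""
    anchors = []
    for fsize, step, ratios, sizes in zip(feature_size, anchor_steps,
                                          anchors_ratio, anchors_size):
        for row in range(fsize):
            for col in range(fsize):
                # Assumption: each anchor is centred on its feature-map cell.
                cx, cy = (col + 0.5) * step, (row + 0.5) * step
                for i, ratio in enumerate(ratios):
                    # Assumption: the first 1:1 anchor takes the larger size,
                    # every other anchor takes the smaller one.
                    size = sizes[0] if i == 0 else sizes[1]
                    w = size * math.sqrt(ratio)
                    h = size / math.sqrt(ratio)
                    anchors.append([cx, cy, w, h])
    return anchors


anchors = gen_anchors_sketch()
print(len(anchors))
```

The printed count is 38*38*6 + 19*19*6 + 10*10*6 + 5*5*6 + 3*3*4 + 1*1*4 = 11620, which agrees with all_anchors_num.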

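2. How to generate labels and masks

My approach to generating labels is fairly brute-force:
- (1) First, the gen_anchors function in genBatch.py generates all possible anchors, giving an array of shape 11620*4 (coordinates in [x y w h] format);
- (2) Then gen_labels in genBatch.py loops over every annotated vehicle bounding box: each bounding box has its IOU computed against all anchors, and for every anchor whose IOU exceeds a threshold, that anchor's class label is set to 1 and the corresponding bounding box offset is computed with the following formula: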

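$$
\Delta x = \frac{x_G - x_D}{w_D}, \quad \Delta y = \frac{y_G - y_D}{h_D}, \quad \Delta w = \ln\frac{w_G}{w_D}, \quad \Delta h = \ln\frac{h_G}{h_D}
$$

where the subscript G denotes the ground-truth box and D the default anchor, both in [x y w h] format.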

The corresponding function is:

import numpy as np


# compute the normalized offset between boxG (ground truth) and boxD (default anchor), both [x, y, w, h]
def compute_offset(boxG, boxD):
    offset = np.zeros([1, 4])
    # offset_x, offset_y: position shift normalized by the anchor's width and height
    offset[0, :2] = [(boxG[0] - boxD[0]) / boxD[2], (boxG[1] - boxD[1]) / boxD[3]]
    # offset_w, offset_h: log of the size ratio between ground truth and anchor
    offset[0, 2:] = np.log([boxG[2] / boxD[2], boxG[3] / boxD[3]])
    return offset
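
As a sanity check on the encoding, the following sketch mirrors compute_offset in plain Python and adds an inverse that recovers the box from the offset and its anchor. The names encode_offset and decode_offset and the sample boxes are hypothetical and not part of the project; the inverse is simply what the formula implies algebraically.

```python
import math


def encode_offset(box_g, box_d):
    """Pure-Python mirror of compute_offset; boxes are [x, y, w, h]."""
    return [(box_g[0] - box_d[0]) / box_d[2],
            (box_g[1] - box_d[1]) / box_d[3],
            math.log(box_g[2] / box_d[2]),
            math.log(box_g[3] / box_d[3])]


def decode_offset(offset, box_d):
    """Hypothetical inverse: rebuild a box from an offset and its anchor."""
    return [offset[0] * box_d[2] + box_d[0],
            offset[1] * box_d[3] + box_d[1],
            math.exp(offset[2]) * box_d[2],
            math.exp(offset[3]) * box_d[3]]


anchor = [150.0, 150.0, 69.0, 69.0]   # made-up default anchor
truth = [160.0, 140.0, 80.0, 50.0]    # made-up ground-truth box
offset = encode_offset(truth, anchor)
restored = decode_offset(offset, anchor)
print(offset)
print(restored)
```

Decoding the encoded offset returns the ground-truth box up to floating-point error, which is the same transformation a detector would apply to predicted offsets at inference time.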

Building the masks is simpler. Their definition was given in the previous part; the code is:

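# generate two masks that weight the different parts of the final SSD loss
def gen_masks(cls_label, neg_weight=3.0, reg_weight=1.0):
    # column 1 of cls_label is 1 for positive (matched) anchors and 0 otherwise
    pos_mask = cls_label[:, 1]
    neg_mask = 1. - pos_mask
    pos_num = np.sum(pos_mask)
    neg_num = np.sum(neg_mask)

    # normalize each group by its size, so positives sum to 1
    # and negatives sum to neg_weight
    if pos_num > 0:
        pos_mask = pos_mask / pos_num
    if neg_num > 0:
        neg_mask = neg_mask / neg_num * neg_weight

    # classification mask covers all anchors; regression mask covers positives only
    return pos_mask + neg_mask, pos_mask * reg_weight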
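
To see what these weights look like on a small input, here is a dependency-free sketch that follows the same arithmetic as gen_masks on a plain list of 0/1 flags. The name gen_masks_sketch and the eight-anchor example are made up for illustration.

```python
def gen_masks_sketch(pos_flags, neg_weight=3.0, reg_weight=1.0):
    """Same weighting as gen_masks, on a list of 0/1 positive flags."""
    pos_num = sum(pos_flags)
    neg_num = len(pos_flags) - pos_num
    cls_mask, reg_mask = [], []
    for flag in pos_flags:
        # each positive anchor gets 1 / pos_num
        pos = flag / pos_num if pos_num > 0 else 0.0
        # each negative anchor gets neg_weight / neg_num
        neg = (1 - flag) / neg_num * neg_weight if neg_num > 0 else 0.0
        cls_mask.append(pos + neg)
        reg_mask.append(pos * reg_weight)
    return cls_mask, reg_mask


# 2 positive anchors and 6 negative anchors
cls_mask, reg_mask = gen_masks_sketch([0, 1, 0, 0, 1, 0, 0, 0])
print(cls_mask)
print(reg_mask)
```

With 2 positives and 6 negatives, every entry of the classification mask is 0.5 (1/2 for positives, 3/6 for negatives), so the negatives together carry three times the weight of the positives, while the regression mask is non-zero only at the two positive anchors.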

(3) Note that when several annotated bounding boxes have an IOU above the threshold with the same anchor, only the annotation with the largest IOU is kept.
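
The matching rule in steps (2) and (3) can be sketched as follows. This is not the project's gen_labels: the function names, the 0.5 threshold, and the reading of [x, y, w, h] as a centre point plus width and height are all assumptions made for the example.

```python
def iou_xywh(box_a, box_b):
    """IoU of two [x, y, w, h] boxes, assuming (x, y) is the box centre."""
    ax1, ay1 = box_a[0] - box_a[2] / 2.0, box_a[1] - box_a[3] / 2.0
    ax2, ay2 = box_a[0] + box_a[2] / 2.0, box_a[1] + box_a[3] / 2.0
    bx1, by1 = box_b[0] - box_b[2] / 2.0, box_b[1] - box_b[3] / 2.0
    bx2, by2 = box_b[0] + box_b[2] / 2.0, box_b[1] + box_b[3] / 2.0
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


def match_anchors(gt_boxes, anchors, threshold=0.5):
    """For each anchor, return the index of the ground-truth box with the
    largest IoU above the threshold, or -1 if no box qualifies."""
    matches = []
    for anchor in anchors:
        best_idx, best_iou = -1, threshold
        for idx, gt in enumerate(gt_boxes):
            overlap = iou_xywh(gt, anchor)
            if overlap > best_iou:
                best_idx, best_iou = idx, overlap
        matches.append(best_idx)
    return matches


anchors = [[50.0, 50.0, 40.0, 40.0], [200.0, 200.0, 40.0, 40.0]]
gt_boxes = [[52.0, 50.0, 40.0, 40.0], [48.0, 52.0, 44.0, 40.0]]
matches = match_anchors(gt_boxes, anchors)
print(matches)
```

Both ground-truth boxes overlap the first anchor by more than the threshold, but only the one with the larger IoU (index 0) is kept; the second anchor overlaps nothing and stays negative.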

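3. How to supply batch data

Batch supply is about automatically feeding the training loop with the right data and the matching labels. The main factors to consider are batch_size, whether to shuffle, whether to apply data augmentation, and the probability of each kind of augmentation.

To this end, we define the following class: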

class GenBatch:
    def __init__(self, image_path, label_path,
                 batch_size, new_w, new_h, is_color=True, is_shuffle=True):
        self.image_path, self.label_path = image_path, label_path
        self.batch_size, self.new_w, self.new_h = batch_size, new_w, new_h
        self.is_color, self.is_shuffle = is_color, is_shuffle

        self.readPos = 0

        # read KITTI
        self.image_list = readKITTI.get_filelist(image_path, '.png')
        self.bbox_list = readKITTI.get_bboxlist(label_path, self.image_list)
        if len(self.image_list) > 0 and len(self.image_list) == len(self.bbox_list):
            print("The amount of images is %d" % (len(self.image_list)))

            self.initOK = True
            self.all_anchors = gen_anchors()

            # init the outputs
            self.batch_image = np.zeros([batch_size, new_h, new_w, 3 if self.is_color else 1], dtype=np.float32)
            self.batch_cls_label = np.zeros([batch_size * all_anchors_num, 2], dtype=np.float32)
            self.batch_reg_label = np.zeros([batch_size * all_anchors_num, 4], dtype=np.float32)
            self.batch_cls_mask = np.zeros([batch_size * all_anchors_num], dtype=np.float32)
            self.batch_reg_mask = np.zeros([batch_size * all_anchors_num], dtype=np.float32)
        else:
            print("The amount of images is %d, while the amount of "
                  "corresponding label is %d" % (len(self.image_list), len(self.bbox_list)))
            self.initOK = False

    # generate a new batch
    # mirror_ratio and crop_ratio are the probabilities of mirroring and cropping each image;
    # the default of zero means no image augmentation
    # the masks that weight the final SSD loss are produced by gen_masks with its default weights
    def nextbatch(self, mirror_ratio=0.0, crop_ratio=0.0):
        if not self.initOK:
            print("GenBatch was not initialized successfully.")
            return []
        for i in range(self.batch_size):
            # if an epoch is completed, i.e. every image has been read
            if self.readPos >= len(self.image_list):
                self.readPos = 0
                if self.is_shuffle is True:
                    # shuffle images and labels together so the pairs stay aligned
                    paired = list(zip(self.image_list, self.bbox_list))
                    random.shuffle(paired)
                    self.image_list, self.bbox_list = map(list, zip(*paired))
                    print('Shuffle the data successfully.\n')

            img = cv2.imread(self.image_path + self.image_list[self.readPos])

            bbox = self.bbox_list[self.readPos]

            self.readPos += 1

            # randomly crop under a specified probability
            if crop_ratio > 0 and random.random() < crop_ratio:
                img, bbox = imAugment.imcrop(img, bbox, min(self.new_w, self.new_h))

            # check the input image's size and color
            img, bbox = imAugment.imresize(img, bbox, self.new_w, self.new_h, self.is_color)

            # horizontally flip the input image under a specified probability
            if mirror_ratio > 0 and random.random() < mirror_ratio:
                img, bbox = imAugment.immirror(img, bbox)

            # generate processed labels
            cls_label, reg_label = gen_labels(bbox, self.all_anchors)

            # generate masks
            cls_mask, reg_mask = gen_masks(cls_label)

            self.batch_image[i, :, :, :] = img.astype(np.float32)
            self.batch_cls_label[i*all_anchors_num:(i+1)*all_anchors_num, :] = cls_label
            self.batch_reg_label[i*all_anchors_num:(i+1)*all_anchors_num, :] = reg_label
            self.batch_cls_mask[i*all_anchors_num:(i+1)*all_anchors_num] = cls_mask
            self.batch_reg_mask[i*all_anchors_num:(i+1)*all_anchors_num] = reg_mask

        return self.batch_image, self.batch_cls_label, self.batch_reg_label, self.batch_cls_mask, self.batch_reg_mask
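
The read-position and shuffle logic of the class can be tried out without KITTI data or OpenCV. The sketch below is a stripped-down, hypothetical stand-in (the class PairedBatcher and the file names are invented for the example): it keeps samples and labels paired, wraps around once every sample has been read, and reshuffles at that point.

```python
import random


class PairedBatcher:
    """Hypothetical, dependency-free sketch of the batching logic."""

    def __init__(self, samples, labels, batch_size, is_shuffle=True):
        assert len(samples) == len(labels) and len(samples) > 0
        # keep each sample with its label so shuffling cannot misalign them
        self.pairs = list(zip(samples, labels))
        self.batch_size = batch_size
        self.is_shuffle = is_shuffle
        self.read_pos = 0

    def next_batch(self):
        batch = []
        for _ in range(self.batch_size):
            # an epoch is complete once every sample has been read
            if self.read_pos >= len(self.pairs):
                self.read_pos = 0
                if self.is_shuffle:
                    random.shuffle(self.pairs)
            batch.append(self.pairs[self.read_pos])
            self.read_pos += 1
        return batch


batcher = PairedBatcher(["a.png", "b.png", "c.png"], [[1], [2], [3]], batch_size=2)
first = batcher.next_batch()
second = batcher.next_batch()
print(first)
print(second)
```

The first batch holds the first two samples in their original order, since shuffling only happens when an epoch ends. The second batch starts with the remaining sample and then continues from the reshuffled list, with every file name still attached to its own label.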