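Copyright notice: this is an original article by the blogger; please cite the source when reposting: https://blog.csdn.net/shuzfan/article/details/79966568
The Baidu Cloud link keeps going down; if you really need the files, email me at [email protected]
This blog series is my way of learning TensorFlow and Python. I am a beginner, so please point out any mistakes you find.
The whole project can be downloaded from Baidu Cloud:
Link: https://pan.baidu.com/s/1f2JPJpE7m5M2kSifMP0-Lw Password: 9p8v
Google Drive:
https://drive.google.com/open?id=1_IpPGwND0D0HPCJ9zNAKAInv5J5GaB2g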
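III. Label preparation and batch data feeding
This part covers three topics:
- some constants used for anchor generation;
- how to generate the labels and masks needed to compute the loss from the original annotated boxes;
- how to supply training data in batches during training, including operations such as shuffling.
1. Constants for anchor generation
The file constants.py defines some anchor-related constants: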
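# coding=utf-8
# to pre-define some constant variables
# sizes of the feature maps in the 6 prediction branches of the SSD network
feature_size = [38, 19, 10, 5, 3, 1]
# roughly 300 / feature_size: the spacing, in input-image pixels, between neighbouring feature-map cells
anchor_steps = [8, 16, 30, 60, 100, 300]
# number of anchor types per cell in each of the 6 branches. Note: the SSD paper uses [4, 6, 6, 6, 4, 4], but resizing the KITTI images produces more small objects, so to improve small-object detection the first branch uses 6 anchor types instead of 4.
anchors_num = [6, 6, 6, 6, 4, 4]
# the total number of anchors therefore rises from the paper's 8732 to 11620
all_anchors_num = 11620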
# aspect ratios of the anchors used by the 6 branches; note that there are two 1:1 anchors, of different sizes
anchors_ratio = [[1, 1, 2, 0.5, 3, 1./3],
                 [1, 1, 2, 0.5, 3, 1./3],
                 [1, 1, 2, 0.5, 3, 1./3],
                 [1, 1, 2, 0.5, 3, 1./3],
                 [1, 1, 2, 0.5],
                 [1, 1, 2, 0.5]]
# anchor scales follow the rule in the paper: smallest 0.07, largest 0.87, evenly spaced in between, so the 6 scales as fractions of the input size are [0.07 0.23 ... 0.87]
# in addition, the 1:1 aspect ratio gets one extra, slightly larger anchor
# the first: ratio=1, sqrt(S_k*S_(k+1))
# the second: S_k = 0.07+(k-1)*(0.87-0.07)/(6-1), k=1...6
"""anchors_scales = [[0.13, 0.07],
                     [0.30, 0.23],
                     [0.46, 0.39],
                     [0.62, 0.55],
                     [0.79, 0.71],
                     [0.95, 0.87]]"""
# 300*anchors_scales
# (note: 300 * 0.39 = 117, while the third row below uses 108)
anchors_size = [[39, 21],
                [90, 69],
                [138, 108],
                [186, 165],
                [237, 213],
                [285, 261]]
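gen_anchors itself is not shown in this post, so the following is only a minimal sketch of how the constants above could expand into the 11620 anchors. The name sketch_anchors, the center-based [cx, cy, w, h] pixel layout, and the rule for combining size and ratio (w = s*sqrt(r), h = s/sqrt(r), with the larger size used only for the first 1:1 anchor) are assumptions for illustration, not necessarily what genBatch.py does.

```python
import numpy as np

# Constants copied from constants.py above.
feature_size = [38, 19, 10, 5, 3, 1]
anchors_ratio = [[1, 1, 2, 0.5, 3, 1. / 3]] * 4 + [[1, 1, 2, 0.5]] * 2
anchors_size = [[39, 21], [90, 69], [138, 108], [186, 165], [237, 213], [285, 261]]


def sketch_anchors(img_size=300.0):
    """Hypothetical stand-in for gen_anchors: (N, 4) array of [cx, cy, w, h] in pixels."""
    anchors = []
    for fsize, ratios, (big, small) in zip(feature_size, anchors_ratio, anchors_size):
        step = img_size / fsize  # spacing between cell centers in the input image
        for row in range(fsize):
            for col in range(fsize):
                cx, cy = (col + 0.5) * step, (row + 0.5) * step
                for k, r in enumerate(ratios):
                    # Assumption: the first 1:1 anchor takes the larger size,
                    # every other ratio takes the smaller one.
                    s = big if k == 0 else small
                    anchors.append([cx, cy, s * np.sqrt(r), s / np.sqrt(r)])
    return np.array(anchors, dtype=np.float32)


anchors = sketch_anchors()
print(anchors.shape)  # (11620, 4)
```

The count works out as 38*38*6 + 19*19*6 + 10*10*6 + 5*5*6 + 3*3*4 + 1*1*4 = 11620, which matches all_anchors_num.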
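2. How to generate the labels and masks
My way of generating labels is fairly brute-force:
- (1) First, the gen_anchors function in genBatch.py generates all possible anchors, an array of shape 11620*4 (coordinate format [x y w h]);
- (2) Then gen_labels in genBatch.py loops over every annotated vehicle bounding box: each bounding box is compared against all anchors by IOU. Any anchor whose IOU exceeds a threshold has its class label set to 1, and the corresponding bounding box offset is computed, with the x and y differences normalized by the anchor's width and height and the width and height ratios taken in log space.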
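Written out from the compute_offset function shown below, with G the ground-truth box and D the matched anchor, both in [x y w h] format, the offsets are as follows (the symbols t_x, t_y, t_w, t_h are notation chosen here, not taken from the original post):

```latex
\begin{aligned}
t_x &= \frac{G_x - D_x}{D_w}, & t_y &= \frac{G_y - D_y}{D_h},\\
t_w &= \ln\frac{G_w}{D_w}, & t_h &= \ln\frac{G_h}{D_h}.
\end{aligned}
```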
The corresponding function is as follows:
import numpy as np

# compute normalized offset between boxG (ground truth) and boxD (default anchor), both [x, y, w, h]
def compute_offset(boxG, boxD):
    offset = np.zeros([1, 4])
    # offset_x, offset_y: position difference divided by the anchor's width and height
    offset[0, :2] = [(boxG[0] - boxD[0]) / boxD[2], (boxG[1] - boxD[1]) / boxD[3]]
    # offset_w, offset_h: log of the size ratios
    offset[0, 2:] = np.log([boxG[2] / boxD[2], boxG[3] / boxD[3]])
    return offset
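As a quick sanity check of this encoding, the sketch below computes the offset for one made-up box and anchor and then inverts it. decode_offset is a hypothetical helper written for this example (the post does not show a decoder), and the box values are invented.

```python
import numpy as np


def compute_offset(boxG, boxD):
    """Same arithmetic as the function above."""
    offset = np.zeros([1, 4])
    offset[0, :2] = [(boxG[0] - boxD[0]) / boxD[2], (boxG[1] - boxD[1]) / boxD[3]]
    offset[0, 2:] = np.log([boxG[2] / boxD[2], boxG[3] / boxD[3]])
    return offset


def decode_offset(offset, boxD):
    """Hypothetical inverse: recover [x, y, w, h] from an offset and its anchor."""
    t = np.asarray(offset, dtype=np.float64).reshape(4)
    x = t[0] * boxD[2] + boxD[0]
    y = t[1] * boxD[3] + boxD[1]
    w = np.exp(t[2]) * boxD[2]
    h = np.exp(t[3]) * boxD[3]
    return np.array([x, y, w, h])


anchor = [150.0, 150.0, 60.0, 30.0]  # made-up anchor [x, y, w, h]
gt_box = [156.0, 147.0, 66.0, 33.0]  # made-up ground-truth box [x, y, w, h]

offset = compute_offset(gt_box, anchor)
recovered = decode_offset(offset, anchor)
print(offset)     # x/y offsets are 0.1 and -0.1; both size offsets are ln(1.1)
print(recovered)  # should reproduce gt_box up to floating-point error
```

Dividing by the anchor size keeps the regression targets on a similar scale for small and large anchors, which is presumably why the offsets are normalized this way.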
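Building the masks is simpler. Their definition was given in the previous section; the corresponding code is:
# generate two masks that weight different parts of the final SSD loss
def gen_masks(cls_label, neg_weight=3.0, reg_weight=1.0):
    # column 1 of cls_label is 1 for positive anchors
    pos_mask = cls_label[:, 1]
    neg_mask = 1. - pos_mask
    pos_num = np.sum(pos_mask)
    neg_num = np.sum(neg_mask)
    # normalize so that the positive weights sum to 1
    if pos_num > 0:
        pos_mask = pos_mask / pos_num
    # normalize so that the negative weights sum to neg_weight
    if neg_num > 0:
        neg_mask = neg_mask / neg_num * neg_weight
    # classification mask, regression mask (positives only)
    return pos_mask + neg_mask, pos_mask * reg_weight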
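To make the weighting concrete, here is a small run of gen_masks on a toy label array. The six-anchor cls_label is invented for illustration, with column 1 marking positives as the indexing in the function implies.

```python
import numpy as np


def gen_masks(cls_label, neg_weight=3.0, reg_weight=1.0):
    pos_mask = cls_label[:, 1]
    neg_mask = 1. - pos_mask
    pos_num = np.sum(pos_mask)
    neg_num = np.sum(neg_mask)
    if pos_num > 0:
        pos_mask = pos_mask / pos_num
    if neg_num > 0:
        neg_mask = neg_mask / neg_num * neg_weight
    return pos_mask + neg_mask, pos_mask * reg_weight


# Toy labels: six anchors, of which the first and fourth are positive.
cls_label = np.array([[0, 1], [1, 0], [1, 0], [0, 1], [1, 0], [1, 0]], dtype=np.float32)

cls_mask, reg_mask = gen_masks(cls_label)
print(cls_mask)  # positives get 0.5 each, negatives get 0.75 each
print(reg_mask)  # positives get 0.5 each, negatives get 0
```

Whatever the counts, the positive weights sum to 1 and the negative weights sum to neg_weight, so the very large number of background anchors cannot swamp the classification loss.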
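- (3) Note that when several annotated bounding boxes have an IOU above the threshold with the same anchor, only the annotation with the largest IOU is kept for that anchor.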
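Steps (2) and (3) can be sketched as below. This is not the author's gen_labels, which is not shown in the post: the names iou_xywh and sketch_labels, the 0.5 threshold, the center-based [x, y, w, h] convention, and the [background, vehicle] column order of cls_label are all assumptions made for illustration.

```python
import numpy as np


def iou_xywh(box, anchors):
    """IoU between one box and an (N, 4) anchor array, all as center-based [x, y, w, h]."""
    b_min, b_max = box[:2] - box[2:] / 2, box[:2] + box[2:] / 2
    a_min, a_max = anchors[:, :2] - anchors[:, 2:] / 2, anchors[:, :2] + anchors[:, 2:] / 2
    inter_wh = np.clip(np.minimum(a_max, b_max) - np.maximum(a_min, b_min), 0, None)
    inter = inter_wh[:, 0] * inter_wh[:, 1]
    union = box[2] * box[3] + anchors[:, 2] * anchors[:, 3] - inter
    return inter / union


def sketch_labels(bboxes, anchors, iou_thresh=0.5):
    """Hypothetical label assignment: each anchor keeps the matching box with the largest IoU."""
    n = len(anchors)
    cls_label = np.zeros([n, 2], dtype=np.float32)
    cls_label[:, 0] = 1.0  # every anchor starts as background
    reg_label = np.zeros([n, 4], dtype=np.float32)
    best_iou = np.zeros(n)
    for box in np.asarray(bboxes, dtype=np.float64):
        iou = iou_xywh(box, anchors)
        # Above the threshold AND better than any box matched to this anchor so far.
        hit = (iou > iou_thresh) & (iou > best_iou)
        cls_label[hit] = [0.0, 1.0]
        reg_label[hit, :2] = (box[:2] - anchors[hit, :2]) / anchors[hit, 2:]
        reg_label[hit, 2:] = np.log(box[2:] / anchors[hit, 2:])
        best_iou[hit] = iou[hit]
    return cls_label, reg_label


# Made-up data: two overlapping boxes compete for the first two anchors.
anchors = np.array([[50, 50, 40, 40], [60, 50, 40, 40], [200, 200, 40, 40]], dtype=np.float64)
bboxes = [[52, 50, 40, 40], [58, 50, 40, 40]]
cls_label, reg_label = sketch_labels(bboxes, anchors)
print(cls_label[:, 1])  # the first two anchors are positive, the third is background
print(reg_label[:, 0])  # each positive anchor is regressed toward its closest box
```

Both boxes exceed the threshold for both of the first two anchors, but each anchor ends up with the box it overlaps most, which is the behaviour described in step (3).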
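3. How to supply batch data
The batch feeder has to deliver the right images and their matching labels automatically during training. The main factors to consider are batch_size, whether to shuffle, whether to apply data augmentation, and the proportion of each kind of augmentation.
For this purpose we define the following class: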
# imports inferred from the names used below (their exact form in genBatch.py is an assumption)
import random
import cv2
import numpy as np
import readKITTI
import imAugment
from constants import all_anchors_num


class GenBatch:
    def __init__(self, image_path, label_path,
                 batch_size, new_w, new_h, is_color=True, is_shuffle=True):
        self.image_path, self.label_path = image_path, label_path
        self.batch_size, self.new_w, self.new_h, self.is_color, self.is_shuffle = \
            batch_size, new_w, new_h, is_color, is_shuffle
        self.readPos = 0
        # read KITTI
        self.image_list = readKITTI.get_filelist(image_path, '.png')
        self.bbox_list = readKITTI.get_bboxlist(label_path, self.image_list)
        if len(self.image_list) > 0 and len(self.image_list) == len(self.bbox_list):
            print("The amount of images is %d" % (len(self.image_list)))
            self.initOK = True
            self.all_anchors = gen_anchors()
            # init the outputs
            self.batch_image = np.zeros([batch_size, new_h, new_w, 3 if self.is_color else 1], dtype=np.float32)
            self.batch_cls_label = np.zeros([batch_size * all_anchors_num, 2], dtype=np.float32)
            self.batch_reg_label = np.zeros([batch_size * all_anchors_num, 4], dtype=np.float32)
            self.batch_cls_mask = np.zeros([batch_size * all_anchors_num], dtype=np.float32)
            self.batch_reg_mask = np.zeros([batch_size * all_anchors_num], dtype=np.float32)
        else:
            print("The amount of images is %d, while the amount of "
                  "corresponding label is %d" % (len(self.image_list), len(self.bbox_list)))
            self.initOK = False
    # generate a new batch
    # mirror_ratio and crop_ratio control the image augmentation;
    # the default of zero means no augmentation
    # gen_masks builds the masks used to weight the final SSD loss (see its neg_weight and reg_weight arguments)
    def nextbatch(self, mirror_ratio=0.0, crop_ratio=0.0):
        if self.initOK is False:
            print("Initialization failed.")
            return []
        for i in range(self.batch_size):
            # an epoch is complete once every image has been read
            if self.readPos >= len(self.image_list):
                self.readPos = 0
                if self.is_shuffle is True:
                    # re-seeding with the same value gives both lists the same permutation,
                    # so each image stays paired with its boxes
                    r_seed = random.random()
                    random.seed(r_seed)
                    random.shuffle(self.image_list)
                    random.seed(r_seed)
                    random.shuffle(self.bbox_list)
                    print('Shuffle the data successfully.\n')
            # image_path is expected to end with a path separator
            img = cv2.imread(self.image_path + self.image_list[self.readPos])
            bbox = self.bbox_list[self.readPos]
            self.readPos += 1
            # randomly crop under a specified probability
            if crop_ratio > 0 and random.random() < crop_ratio:
                img, bbox = imAugment.imcrop(img, bbox, min(self.new_w, self.new_h))
            # check the input image's size and color
            img, bbox = imAugment.imresize(img, bbox, self.new_w, self.new_h, self.is_color)
            # horizontally flip the input image under a specified probability
            if mirror_ratio > 0 and random.random() < mirror_ratio:
                img, bbox = imAugment.immirror(img, bbox)
            # generate processed labels
            cls_label, reg_label = gen_labels(bbox, self.all_anchors)
            # generate masks
            cls_mask, reg_mask = gen_masks(cls_label)
            # write sample i into its slot of the flattened batch arrays
            self.batch_image[i, :, :, :] = img.astype(np.float32)
            self.batch_cls_label[i*all_anchors_num:(i+1)*all_anchors_num, :] = cls_label
            self.batch_reg_label[i*all_anchors_num:(i+1)*all_anchors_num, :] = reg_label
            self.batch_cls_mask[i*all_anchors_num:(i+1)*all_anchors_num] = cls_mask
            self.batch_reg_mask[i*all_anchors_num:(i+1)*all_anchors_num] = reg_mask
        return self.batch_image, self.batch_cls_label, self.batch_reg_label, self.batch_cls_mask, self.batch_reg_mask
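The shuffle inside nextbatch depends on both lists receiving the same permutation. The self-contained check below reproduces that trick on made-up file names and label strings, then shows a zip-based alternative that some readers may find easier to follow. Neither list reflects real KITTI contents.

```python
import random

# Made-up stand-ins for self.image_list and self.bbox_list.
image_list = ['000000.png', '000001.png', '000002.png', '000003.png', '000004.png']
bbox_list = ['boxes for ' + name for name in image_list]

# Same trick as in nextbatch: re-seed before each shuffle so that both
# equal-length lists are permuted identically.
r_seed = random.random()
random.seed(r_seed)
random.shuffle(image_list)
random.seed(r_seed)
random.shuffle(bbox_list)
seed_trick_ok = all(b == 'boxes for ' + a for a, b in zip(image_list, bbox_list))
print(seed_trick_ok)

# Alternative: shuffle the pairs themselves, which needs no re-seeding.
paired = list(zip(image_list, bbox_list))
random.shuffle(paired)
image_list, bbox_list = (list(t) for t in zip(*paired))
zip_ok = all(b == 'boxes for ' + a for a, b in zip(image_list, bbox_list))
print(zip_ok)
```

The seed trick only works because the two lists have the same length, which __init__ already checks before setting initOK.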