YOLOv1 source code: download and walkthrough

https://zhuanlan.zhihu.com/p/25053311

YOLO is an end-to-end, deep-learning-based real-time object detection system. Unlike most detection and recognition methods (such as Fast R-CNN), which split the task into several stages such as region proposal and class prediction, YOLO folds region prediction and class prediction into a single neural network, achieving fast detection and recognition at fairly high accuracy, which makes it better suited to real-world deployment. For background, see the articles "YOLO:实时快速目标检测" and "YOLO升级版:YOLOv2和YOLO9000解析". This post walks through a TensorFlow implementation of YOLO; the source code used here comes from hizhangp/yolo_tensorflow.

This post is organized as follows: 1. an overview of the YOLO code; 2. a walkthrough of train; 3. an overview of test; 4. a summary.

1 Overview of the YOLO Code

The source files are organized as shown in Figure 1-1. train.py is the training script and test.py is the test script; the code in the other folders handles configuration, network construction, data loading and other supporting tasks.

Figure 1-1 The YOLO source code folders

2 Walkthrough of train

Starting from the main() method, the script first reads the configuration, then builds the YOLONet, then loads the training data, and finally runs training. A minimal sketch of that flow is shown below.
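(The Solver wrapper and the constructor arguments below are illustrative assumptions, not necessarily the repository's exact API; YOLONet and pascal_voc are the real classes discussed in the following sections, and the configuration values come from yolo/config.py.)

from yolo.yolo_net import YOLONet
from utils.pascal_voc import pascal_voc

def main():
    net = YOLONet('train')       # build the network graph; reads yolo/config.py internally
    data = pascal_voc('train')   # reader for the PASCAL VOC training data
    solver = Solver(net, data)   # hypothetical wrapper holding optimizer / saver / summaries
    solver.train()               # run the training loop

if __name__ == '__main__':
    main()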

2.1 Building YOLONet

YOLONet is implemented in yolo_net.py inside the yolo folder. The file defines the YOLONet class, whose main methods are the network initialization (__init__()), the network construction (build_networks()) and the loss function (loss_layer()).

All of the network's initialization parameters live in the __init__() method.


def __init__(self, phase):
        self.weights_file = cfg.WEIGHTS_FILE  # weights file
        self.classes = cfg.CLASSES  # class names
        self.num_class = len(self.classes)  # number of classes, 20
        self.image_size = cfg.IMAGE_SIZE  # image size, 448
        self.cell_size = cfg.CELL_SIZE  # grid cells per side, 7
        self.boxes_per_cell = cfg.BOXES_PER_CELL  # boxes predicted by each grid cell, default 2
        self.output_size = (self.cell_size * self.cell_size) * \
            (self.num_class + self.boxes_per_cell * 5)  # output size, 7*7*(20+2*5) = 1470
        self.scale = 1.0 * self.image_size / self.cell_size
        self.boundary1 = self.cell_size * self.cell_size * self.num_class  # 7*7*20
        self.boundary2 = self.boundary1 + self.cell_size * \
            self.cell_size * self.boxes_per_cell  # 7*7*20 + 7*7*2
        self.object_scale = cfg.OBJECT_SCALE  # 1.0
        self.noobject_scale = cfg.NOOBJECT_SCALE  # 1.0
        self.class_scale = cfg.CLASS_SCALE  # 2.0
        self.coord_scale = cfg.COORD_SCALE  # 5.0
        self.learning_rate = cfg.LEARNING_RATE  # LEARNING_RATE = 0.0001
        self.batch_size = cfg.BATCH_SIZE  # BATCH_SIZE = 45
        self.alpha = cfg.ALPHA  # ALPHA = 0.1
        self.disp_console = cfg.DISP_CONSOLE  # DISP_CONSOLE = False
        self.phase = phase  # 'train' or 'test'
        self.collection = []  # stores the network parameters
        self.offset = np.transpose(np.reshape(np.array(
            [np.arange(self.cell_size)] * self.cell_size * self.boxes_per_cell),
            (self.boxes_per_cell, self.cell_size, self.cell_size)), (1, 2, 0))  # per-cell offsets
        self.build_networks()

The network itself is built by the build_networks() method; it consists of convolution/pooling layers followed by fully connected layers (see the source code and "YOLO:实时快速目标检测" for the exact architecture). It takes an input of shape [None, 448, 448, 3] and produces an output of shape [None, 1470].
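As a rough illustration of the building blocks, here is a sketch of one convolution + leaky ReLU block of the kind build_networks() stacks. The helper name and initializers are my own, not the repository's exact code; the slope alpha corresponds to the ALPHA = 0.1 value listed above, which appears to be the leaky-ReLU slope. The final fully connected layer produces 7 * 7 * (20 + 2 * 5) = 1470 values, matching self.output_size.

import tensorflow as tf

def conv_leaky(x, filters, size, stride, alpha=0.1):
    # one conv layer followed by leaky ReLU with slope alpha;
    # illustrative sketch only, not the repository's conv_layer() helper
    in_channels = int(x.get_shape()[3])
    w = tf.Variable(tf.truncated_normal([size, size, in_channels, filters], stddev=0.1))
    b = tf.Variable(tf.constant(0.1, shape=[filters]))
    conv = tf.nn.conv2d(x, w, strides=[1, stride, stride, 1], padding='SAME') + b
    return tf.maximum(alpha * conv, conv)   # leaky ReLU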

The loss function is the key part of the code. It is defined as follows (see "YOLO:实时快速目标检测"):
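For reference, the loss from the original YOLO paper has the following form, with S = 7 cells per side and B = 2 boxes per cell; λ_coord corresponds to coord_scale (5.0) and λ_noobj to noobject_scale (1.0 in this config, 0.5 in the paper):

$$
\begin{aligned}
\mathcal{L} ={}& \lambda_{\mathrm{coord}} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\mathrm{obj}}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right] \\
&+ \lambda_{\mathrm{coord}} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\mathrm{obj}}\left[(\sqrt{w_i}-\sqrt{\hat{w}_i})^2+(\sqrt{h_i}-\sqrt{\hat{h}_i})^2\right] \\
&+ \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\mathrm{obj}}(C_i-\hat{C}_i)^2
 + \lambda_{\mathrm{noobj}} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\mathrm{noobj}}(C_i-\hat{C}_i)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\mathrm{obj}} \sum_{c\,\in\,\mathrm{classes}} (p_i(c)-\hat{p}_i(c))^2
\end{aligned}
$$

In loss_layer() below, coord_loss covers the first two lines, object_loss and noobject_loss the confidence terms, and class_loss the class term (this implementation additionally weights the class term by class_scale = 2.0 and averages over the batch).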

The loss is implemented by loss_layer(); the comments in the code below record the shape of each intermediate variable.

Computing the IoU:

    def calc_iou(self, boxes1, boxes2):
        """calculate ious
        Args:
          boxes1: 4-D tensor [CELL_SIZE, CELL_SIZE, BOXES_PER_CELL, 4]  ====> (x_center, y_center, w, h)
          boxes2: 1-D tensor [CELL_SIZE, CELL_SIZE, BOXES_PER_CELL, 4] ===> (x_center, y_center, w, h)
        Return:
          iou: 3-D tensor [CELL_SIZE, CELL_SIZE, BOXES_PER_CELL]
        """
        boxes1 = tf.pack([boxes1[:, :, :, :, 0] - boxes1[:, :, :, :, 2] / 2.0,
                          boxes1[:, :, :, :, 1] - boxes1[:, :, :, :, 3] / 2.0,
                          boxes1[:, :, :, :, 0] + boxes1[:, :, :, :, 2] / 2.0,
                          boxes1[:, :, :, :, 1] + boxes1[:, :, :, :, 3] / 2.0])
        boxes1 = tf.transpose(boxes1, [1, 2, 3, 4, 0])
        boxes2 = tf.pack([boxes2[:, :, :, :, 0] - boxes2[:, :, :, :, 2] / 2.0,
                          boxes2[:, :, :, :, 1] - boxes2[:, :, :, :, 3] / 2.0,
                          boxes2[:, :, :, :, 0] + boxes2[:, :, :, :, 2] / 2.0,
                          boxes2[:, :, :, :, 1] + boxes2[:, :, :, :, 3] / 2])
        boxes2 = tf.transpose(boxes2, [1, 2, 3, 4, 0])
        # calculate the left up point & right down point
        lu = tf.maximum(boxes1[:, :, :, :, :2], boxes2[:, :, :, :, :2])
        rd = tf.minimum(boxes1[:, :, :, :, 2:], boxes2[:, :, :, :, 2:])
        # intersection
        intersection = tf.maximum(0.0, rd - lu)
        inter_square = intersection[:, :, :, :, 0] * intersection[:, :, :, :, 1]
        # calculate the boxs1 square and boxs2 square
        square1 = (boxes1[:, :, :, :, 2] - boxes1[:, :, :, :, 0]) * \
            (boxes1[:, :, :, :, 3] - boxes1[:, :, :, :, 1])
        square2 = (boxes2[:, :, :, :, 2] - boxes2[:, :, :, :, 0]) * \
            (boxes2[:, :, :, :, 3] - boxes2[:, :, :, :, 1])
        union_square = tf.maximum(square1 + square2 - inter_square, 1e-10)
        return tf.clip_by_value(inter_square / union_square, 0.0, 1.0)

    # loss function
    # idx=33, predicts is fc_32, labels has shape (45, 7, 7, 25)
    # self.loss = self.loss_layer(33, self.fc_32, self.labels)
    def loss_layer(self, idx, predicts, labels):
        # split the network output into class scores, box confidences and box
        # coordinates; the output dimension is 7*7*20 + 7*7*2 + 7*7*2*4 = 1470
        # class predictions, shape (45, 7, 7, 20)
        predict_classes = tf.reshape(predicts[:, :self.boundary1],
            [self.batch_size, self.cell_size, self.cell_size, self.num_class])
        # box confidences, shape (45, 7, 7, 2)
        predict_scales = tf.reshape(predicts[:, self.boundary1:self.boundary2],
            [self.batch_size, self.cell_size, self.cell_size, self.boxes_per_cell])
        # box coordinates (x, y, w, h), shape (45, 7, 7, 2, 4)
        predict_boxes = tf.reshape(predicts[:, self.boundary2:],
            [self.batch_size, self.cell_size, self.cell_size, self.boxes_per_cell, 4])
        # label response (whether a cell contains an object), shape (45, 7, 7, 1)
        response = tf.reshape(labels[:, :, :, 0],
            [self.batch_size, self.cell_size, self.cell_size, 1])
        # label boxes, shape (45, 7, 7, 1, 4)
        boxes = tf.reshape(labels[:, :, :, 1:5],
            [self.batch_size, self.cell_size, self.cell_size, 1, 4])
        # label boxes tiled to both predictors and normalized by image_size, shape (45, 7, 7, 2, 4)
        boxes = tf.tile(boxes, [1, 1, 1, self.boxes_per_cell, 1]) / self.image_size
        # label classes, shape (45, 7, 7, 20)
        classes = labels[:, :, :, 5:]
        # offset, shape (7, 7, 2)
        offset = tf.constant(self.offset, dtype=tf.float32)
        # shape (1, 7, 7, 2)
        offset = tf.reshape(offset,
            [1, self.cell_size, self.cell_size, self.boxes_per_cell])
        # shape (45, 7, 7, 2)
        offset = tf.tile(offset, [self.batch_size, 1, 1, 1])
        # shape (4, 45, 7, 7, 2)
        predict_boxes_tran = tf.pack([(predict_boxes[:, :, :, :, 0] + offset) / self.cell_size,
                                      (predict_boxes[:, :, :, :, 1] + tf.transpose(offset, (0, 2, 1, 3))) / self.cell_size,
                                      tf.square(predict_boxes[:, :, :, :, 2]),
                                      tf.square(predict_boxes[:, :, :, :, 3])])
        # shape (45, 7, 7, 2, 4)
        predict_boxes_tran = tf.transpose(predict_boxes_tran, [1, 2, 3, 4, 0])
        # shape (45, 7, 7, 2)
        iou_predict_truth = self.calc_iou(predict_boxes_tran, boxes)
        # calculate I tensor [BATCH_SIZE, CELL_SIZE, CELL_SIZE, BOXES_PER_CELL]
        # shape (45, 7, 7, 1)
        object_mask = tf.reduce_max(iou_predict_truth, 3, keep_dims=True)
        # shape (45, 7, 7, 2)
        object_mask = tf.cast((iou_predict_truth >= object_mask), tf.float32) * response
        # mask = tf.tile(response, [1, 1, 1, self.boxes_per_cell])
        # calculate no_I tensor [CELL_SIZE, CELL_SIZE, BOXES_PER_CELL]
        # shape (45, 7, 7, 2)
        noobject_mask = tf.ones_like(object_mask, dtype=tf.float32) - object_mask
        # shape (4, 45, 7, 7, 2)
        boxes_tran = tf.pack([boxes[:, :, :, :, 0] * self.cell_size - offset,
                              boxes[:, :, :, :, 1] * self.cell_size - tf.transpose(offset, (0, 2, 1, 3)),
                              tf.sqrt(boxes[:, :, :, :, 2]),
                              tf.sqrt(boxes[:, :, :, :, 3])])
        # shape (45, 7, 7, 2, 4)
        boxes_tran = tf.transpose(boxes_tran, [1, 2, 3, 4, 0])
        # class_loss
        class_loss = tf.reduce_mean(tf.reduce_sum(tf.square(response * (predict_classes - classes)),
            reduction_indices=[1, 2, 3]), name='class_loss') * self.class_scale
        # object_loss
        object_loss = tf.reduce_mean(tf.reduce_sum(tf.square(object_mask * (predict_scales - iou_predict_truth)),
            reduction_indices=[1, 2, 3]), name='object_loss') * self.object_scale
        # noobject_loss
        noobject_loss = tf.reduce_mean(tf.reduce_sum(tf.square(noobject_mask * predict_scales),
            reduction_indices=[1, 2, 3]), name='noobject_loss') * self.noobject_scale
        # coord_loss
        # shape (45, 7, 7, 2, 1)
        coord_mask = tf.expand_dims(object_mask, 4)
        # shape (45, 7, 7, 2, 4)
        boxes_delta = coord_mask * (predict_boxes - boxes_tran)
        coord_loss = tf.reduce_mean(tf.reduce_sum(tf.square(boxes_delta),
            reduction_indices=[1, 2, 3, 4]), name='coord_loss') * self.coord_scale
        tf.summary.scalar(self.phase + '/class_loss', class_loss)
        tf.summary.scalar(self.phase + '/object_loss', object_loss)
        tf.summary.scalar(self.phase + '/noobject_loss', noobject_loss)
        tf.summary.scalar(self.phase + '/coord_loss', coord_loss)
        tf.summary.histogram(self.phase + '/boxes_delta_x', boxes_delta[:, :, :, :, 0])
        tf.summary.histogram(self.phase + '/boxes_delta_y', boxes_delta[:, :, :, :, 1])
        tf.summary.histogram(self.phase + '/boxes_delta_w', boxes_delta[:, :, :, :, 2])
        tf.summary.histogram(self.phase + '/boxes_delta_h', boxes_delta[:, :, :, :, 3])
        tf.summary.histogram(self.phase + '/iou', iou_predict_truth)
        return class_loss + object_loss + noobject_loss + coord_loss
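One point worth dwelling on is the offset trick used in predict_boxes_tran and boxes_tran: the network predicts x and y relative to the grid cell, and adding the cell index and dividing by cell_size converts them to image-relative coordinates in [0, 1]. A small NumPy sketch of the offset tensor built in __init__ and of that conversion (the numbers below are only an illustration):

import numpy as np

cell_size, boxes_per_cell = 7, 2
offset = np.transpose(np.reshape(np.array(
    [np.arange(cell_size)] * cell_size * boxes_per_cell),
    (boxes_per_cell, cell_size, cell_size)), (1, 2, 0))
print(offset.shape)        # (7, 7, 2)
print(offset[0, :, 0])     # [0 1 2 3 4 5 6] -> the column index of each cell

# a predicted x in (0, 1) is relative to its cell; adding the cell's column
# index and dividing by cell_size gives x relative to the whole image
pred_x_in_cell = 0.5
col = 3
x_image = (pred_x_in_cell + offset[0, col, 0]) / cell_size
print(x_image)             # 0.5, i.e. the center of the 4th column of cells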

2.2 Reading the Data

The data is read by pascal_voc.py in the utils folder.
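The label tensor that loss_layer() expects has shape (batch, 7, 7, 25): channel 0 is the objectness response, channels 1-4 are the box (x_center, y_center, w, h) in pixels (loss_layer() divides them by image_size), and channels 5-24 are the one-hot class. A rough sketch of how one ground-truth box would be written into such a label (my own illustration, not pascal_voc.py's exact code):

import numpy as np

image_size, cell_size, num_class = 448, 7, 20
label = np.zeros((cell_size, cell_size, 5 + num_class))

# one ground-truth box (x_center, y_center, w, h) in pixels, class index 11
# (dog, assuming the usual alphabetical VOC ordering)
x, y, w, h, cls = 224.0, 150.0, 100.0, 80.0, 11
col = int(x / image_size * cell_size)   # which cell column the center falls into
row = int(y / image_size * cell_size)   # which cell row the center falls into

label[row, col, 0] = 1                  # response: this cell contains an object
label[row, col, 1:5] = [x, y, w, h]     # box stored in pixels
label[row, col, 5 + cls] = 1            # one-hot class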

2.3 Training

Model training is contained in the train() method. Once you understand the initialization parameters, the overall structure of the training code is clear. One thing worth noting is that during training an exponential moving average (EMA) of the variables is maintained to improve training performance; see the code comments for details. Also, when running train.py it is advisable to reduce batch_size (the default batch size is 45; I did not notice this the first time and my machine froze). A rough sketch of how the training op fits together is shown below.
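This sketch combines a decayed learning rate, a gradient-descent step on the loss from loss_layer(), and the EMA update quoted in section 4 below. The decay_steps/decay_rate values and the variable names are illustrative assumptions, not the repository's exact code.

# illustrative assembly of the training op
global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(
    cfg.LEARNING_RATE, global_step, decay_steps=30000, decay_rate=0.1, staircase=True)
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(
    net.loss, global_step=global_step)
ema = tf.train.ExponentialMovingAverage(decay=0.9999)
with tf.control_dependencies([optimizer]):
    train_op = tf.group(ema.apply(tf.trainable_variables()))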


3 Overview of test

test.py loads the trained network weights, detects objects, and draws the detected object locations. The code is similar to the training part, so I will skip the details. To run test, you first need to download YOLO_small, the model trained by the original author (access to it seems to require getting around the firewall). Second, there is a small bug in the source code; running it as-is raises an error:

net_output=self.sess.run(self.net.fc_32,feed_dict={self.net.images:inputs})

needs to be changed to

net_output=self.sess.run(self.net.fc_32,feed_dict={self.net.x:inputs})

The result is shown in Figure 3-1: YOLO successfully recognizes the person and the dog but fails to recognize the horse. The author improved YOLO in follow-up work so that it can recognize many more categories; see "YOLO升级版:YOLOv2和YOLO9000解析".

Figure 3-1 YOLO detection result 1

Or on other images, as shown in Figure 3-2:

Figure 3-2 YOLO detection result 2

4 Summary

YOLO is an end-to-end, deep-learning-based real-time object detection system; its main strength is its very high speed, and it still has room to improve in accuracy. This post walked through a TensorFlow implementation of YOLO; the code is quite simple once you have understood the paper. The TensorFlow topics it touches on are the following:

One: the difference between tf.get_variable and tf.Variable. The key difference is that the former has a variable-checking mechanism: it checks whether an already existing variable with that name has been marked as shared, and if it has not, TensorFlow raises an error as soon as it reaches a second variable with the same name.
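A short example of that behavior (standard TF 1.x; the names are illustrative):

import tensorflow as tf

with tf.variable_scope('conv1'):
    w1 = tf.get_variable('weights', shape=[3, 3, 3, 16])
with tf.variable_scope('conv1', reuse=True):
    w2 = tf.get_variable('weights')    # returns the same variable as w1
print(w1 is w2)                        # True

# tf.Variable always creates a new variable; a name clash is silently
# resolved by appending a suffix instead of sharing
v1 = tf.Variable(tf.zeros([3]), name='w')
v2 = tf.Variable(tf.zeros([3]), name='w')
print(v1.name, v2.name)                # w:0  w_1:0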

Two: implementing learning-rate decay.

tf.train.exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=False, name=None)
# decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)
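A quick numeric illustration of the formula, using the learning rate from this config and placeholder decay settings:

# staircase=True floors the exponent, so the rate drops in discrete steps
learning_rate, decay_rate, decay_steps = 0.0001, 0.1, 30000
for global_step in (0, 15000, 30000, 60000):
    print(global_step, learning_rate * decay_rate ** (global_step // decay_steps))
# 0 1e-04, 15000 1e-04, 30000 1e-05, 60000 1e-06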

Three: using an exponential moving average (EMA) of the variables to improve the results of gradient-descent training.

self.ema = tf.train.ExponentialMovingAverage(decay=0.9999)
self.averages_op = self.ema.apply(tf.trainable_variables())
with tf.control_dependencies([self.optimizer]):
    self.train_op = tf.group(self.averages_op)

Four: the tf.pack() function (renamed to tf.stack() in TensorFlow 1.0).

tf.pack(values, name='pack')

# this function is equivalent in effect to np.asarray

tf.pack([x, y, z]) = np.asarray([x, y, z])

Five: the tf.tile() function, which replicates a tensor along given dimensions.

tf.tile(input,multiples,name=None)
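A small example; loss_layer() above uses tf.tile() in exactly this way to expand offset from (1, 7, 7, 2) to (45, 7, 7, 2) and to copy the label boxes to both predicted boxes of each cell:

import tensorflow as tf

t = tf.constant([[1, 2], [3, 4]])   # shape (2, 2)
tiled = tf.tile(t, [1, 3])          # repeat once along dim 0, three times along dim 1
with tf.Session() as sess:
    print(sess.run(tiled))
# [[1 2 1 2 1 2]
#  [3 4 3 4 3 4]]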




Reposted from blog.csdn.net/m0_37192554/article/details/81094686