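I recently needed YOLO for a project, so I went through some code and articles online. These notes record my understanding of it.
Core idea of YOLO:
A single neural network is applied to the whole image. The network divides the image into regions and predicts bounding boxes and probabilities for each region; the bounding boxes are then weighted by the predicted probabilities.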
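Advantages:
- It looks at the whole image at test time, so its predictions are informed by global context in the image.
- It makes predictions with a single network evaluation, unlike systems such as R-CNN, which require thousands of evaluations for a single image. This makes it very fast.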
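To make "bounding boxes weighted by the predicted probabilities" concrete, here is a minimal NumPy sketch. It assumes each predicted box carries an objectness confidence and a vector of class probabilities, and that the final score is their product. The array values, the 0.6 threshold (borrowed from the demo's YOLO(0.6, 0.5) call) and the helper name `filter_boxes` are illustrative assumptions, not code from the repository.

```python
import numpy as np


def filter_boxes(box_confidence, class_probs, threshold=0.6):
    """Weight class probabilities by box confidence and keep confident boxes.

    box_confidence: (N,) objectness per box (hypothetical values).
    class_probs:    (N, C) class probabilities per box.
    Returns (indices of kept boxes, their class ids, their scores).
    """
    scores = box_confidence[:, None] * class_probs   # (N, C) class-specific scores
    classes = np.argmax(scores, axis=1)              # best class per box
    best = np.max(scores, axis=1)                    # score of that best class
    keep = np.where(best >= threshold)[0]            # boxes that pass the threshold
    return keep, classes[keep], best[keep]


box_confidence = np.array([0.9, 0.3, 0.8])
class_probs = np.array([[0.1, 0.9],
                        [0.6, 0.4],
                        [0.7, 0.3]])
keep, classes, scores = filter_boxes(box_confidence, class_probs)
print(keep, classes, scores)
```

Only the first box survives here: the second has low objectness, and the third has a decent objectness but its best class score (0.8 x 0.7) still falls below the threshold.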
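Understanding the YOLO code:
With a basic understanding of YOLO in place, I found a YOLOv3 implementation on GitHub and traced the code to understand how YOLO runs. (Code from GitHub: https://github.com/xiaochus/YOLOv3)
- First, it helps to understand the argparse module; see the comments for details.
import argparse  # argparse declares the command-line arguments a .py file accepts, which makes the script more flexible
parser = argparse.ArgumentParser(description='Yet Another Darknet To Keras Converter.')  # create a parser object
parser.add_argument('config_path', help='Path to Darknet cfg file.')  # then register the arguments and options you care about; each add_argument call corresponds to one argument or option
_main(parser.parse_args())  # finally call parse_args() to parse the command line; once parsing succeeds the values are ready to use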
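The argparse flow above can be tried without touching the command line by passing an explicit argument list to parse_args(). The sketch below mirrors two positional arguments and the -p flag from yad2k.py; the file names are placeholders.

```python
import argparse

parser = argparse.ArgumentParser(description='argparse demo')
parser.add_argument('config_path', help='Path to Darknet cfg file.')
parser.add_argument('weights_path', help='Path to Darknet weights file.')
parser.add_argument('-p', '--plot_model', action='store_true',
                    help='Plot generated Keras model and save as image.')

# Pass a list instead of reading sys.argv; the file names are placeholders.
args = parser.parse_args(['yolov3.cfg', 'yolov3.weights', '-p'])
print(args.config_path, args.weights_path, args.plot_model)

# Without -p, a store_true flag defaults to False.
args_no_plot = parser.parse_args(['yolov3.cfg', 'yolov3.weights'])
print(args_no_plot.plot_model)
```

Positional arguments are matched by order, while the optional flag can appear anywhere in the list.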
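- The yad2k.py code does not need a line-by-line reading. Its main purpose is to take the Darknet network configuration file, together with the weights file, and convert the model into a Keras .h5 file. The listing follows, with comments at the key places.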
#! /usr/bin/env python
"""
Reads Darknet53 config and weights and creates Keras model with TF backend.
Currently only supports layers in Darknet53 config.
"""
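import argparse  # argparse declares the command-line arguments a .py file accepts, which makes the script more flexible
import configparser  # configparser reads configuration files; a file can contain one or more sections, and each section can hold several key=value parameters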
import io
import os
from collections import defaultdict
import numpy as np
from keras import backend as K
from keras.layers import (Conv2D, GlobalAveragePooling2D, Input, Reshape,
                          ZeroPadding2D, UpSampling2D, Activation, Lambda, MaxPooling2D)
from keras.layers.advanced_activations import LeakyReLU
from keras.layers.merge import concatenate, add
from keras.layers.normalization import BatchNormalization
from keras.models import Model
from keras.regularizers import l2
from keras.utils.vis_utils import plot_model as plot
parser = argparse.ArgumentParser(description='Yet Another Darknet To Keras Converter.')  # create a parser object
parser.add_argument('config_path', help='Path to Darknet cfg file.')  # then register the arguments and options you care about; each add_argument call corresponds to one argument or option
parser.add_argument('weights_path', help='Path to Darknet weights file.')
parser.add_argument('output_path', help='Path to output Keras model file.')
parser.add_argument(
    '-p',
    '--plot_model',
    help='Plot generated Keras model and save as image.',
    action='store_true')
parser.add_argument(
    '-flcl',
    '--fully_convolutional',
    help='Model is fully convolutional so set input shape to (None, None, 3). '
    'WARNING: This experimental option does not work properly for YOLO_v2.',
    action='store_true')
def unique_config_sections(config_file):  # give every config section a unique name by appending a suffix, so that configparser accepts the file
    """Convert all config sections to have unique names.
    Adds unique suffixes to config sections for compatibility with configparser.
    """
    section_counters = defaultdict(int)  # with defaultdict, looking up a missing key returns a default value instead of raising KeyError; the default comes from the factory passed in, so defaultdict(int) defaults to 0
    output_stream = io.StringIO()  # io.StringIO is an in-memory text stream for reading and writing str
    with open(config_file) as fin:
        for line in fin:
            if line.startswith('['):
                section = line.strip().strip('[]')
                _section = section + '_' + str(section_counters[section])
                section_counters[section] += 1
                line = line.replace(section, _section)
            output_stream.write(line)
    output_stream.seek(0)  # move the stream position back to the start
    return output_stream
def _main(args):
    config_path = os.path.expanduser(args.config_path)  # replaces a leading ~ with the user's home directory; if expansion fails or the path does not start with ~, the path is returned unchanged
    weights_path = os.path.expanduser(args.weights_path)
    output_path = os.path.expanduser(args.output_path)
    assert config_path.endswith('.cfg'), '{} is not a .cfg file'.format(
        config_path)  # assert states that an expression must be true; if it is false, an AssertionError is raised
    assert weights_path.endswith(
        '.weights'), '{} is not a .weights file'.format(weights_path)
    assert output_path.endswith(
        '.h5'), 'output path {} is not a .h5 file'.format(output_path)
    output_root = os.path.splitext(output_path)[0]
    # Load weights and config.
    print('Loading weights.')
    weights_file = open(weights_path, 'rb')
    weights_header = np.ndarray(
        shape=(5, ), dtype='int32', buffer=weights_file.read(20))  # ndarray is an N-dimensional array object; it is homogeneous, i.e. all elements must have the same type
    print('Weights Header: ', weights_header)
    # TODO: Check transpose flag when implementing fully connected layers.
    # transpose = (weight_header[0] > 1000) or (weight_header[1] > 1000)
    print('Parsing Darknet config.')
    unique_config_file = unique_config_sections(config_path)
    cfg_parser = configparser.ConfigParser()
    cfg_parser.read_file(unique_config_file)
    print('Creating Keras model.')
    if args.fully_convolutional:  # if the model is fully convolutional, leave height and width unspecified
        image_height, image_width = None, None
    else:
        image_height = int(cfg_parser['net_0']['height'])
        image_width = int(cfg_parser['net_0']['width'])
    prev_layer = Input(shape=(image_height, image_width, 3))
    all_layers = [prev_layer]
    outputs = []
    weight_decay = float(cfg_parser['net_0']['decay']
                         ) if 'net_0' in cfg_parser.sections() else 5e-4
    count = 0
    for section in cfg_parser.sections():
        print('Parsing section {}'.format(section))
        if section.startswith('convolutional'):
            filters = int(cfg_parser[section]['filters'])
            size = int(cfg_parser[section]['size'])
            stride = int(cfg_parser[section]['stride'])
            pad = int(cfg_parser[section]['pad'])
            activation = cfg_parser[section]['activation']
            batch_normalize = 'batch_normalize' in cfg_parser[section]
            # Setting weights.
            # Darknet serializes convolutional weights as:
            # [bias/beta, [gamma, mean, variance], conv_weights]
            prev_layer_shape = K.int_shape(prev_layer)
            # TODO: This assumes channel last dim_ordering.
            weights_shape = (size, size, prev_layer_shape[-1], filters)
            darknet_w_shape = (filters, weights_shape[2], size, size)
            weights_size = np.prod(weights_shape)  # np.prod replaces the deprecated alias np.product
            print('conv2d', 'bn'
                  if batch_normalize else ' ', activation, weights_shape)
            conv_bias = np.ndarray(
                shape=(filters, ),
                dtype='float32',
                buffer=weights_file.read(filters * 4))
            count += filters
            if batch_normalize:
                bn_weights = np.ndarray(
                    shape=(3, filters),
                    dtype='float32',
                    buffer=weights_file.read(filters * 12))
                count += 3 * filters
                # TODO: Keras BatchNormalization mistakenly refers to var
                # as std.
                bn_weight_list = [
                    bn_weights[0],  # scale gamma
                    conv_bias,  # shift beta
                    bn_weights[1],  # running mean
                    bn_weights[2]  # running var
                ]
            conv_weights = np.ndarray(
                shape=darknet_w_shape,
                dtype='float32',
                buffer=weights_file.read(weights_size * 4))
            count += weights_size
            # DarkNet conv_weights are serialized Caffe-style:
            # (out_dim, in_dim, height, width)
            # We would like to set these to Tensorflow order:
            # (height, width, in_dim, out_dim)
            # TODO: Add check for Theano dim ordering.
            conv_weights = np.transpose(conv_weights, [2, 3, 1, 0])
            conv_weights = [conv_weights] if batch_normalize else [
                conv_weights, conv_bias
            ]
            # Handle activation.
            act_fn = None
            if activation == 'leaky':
                pass  # Add advanced activation later.
            elif activation != 'linear':
                raise ValueError(
                    'Unknown activation function `{}` in section {}'.format(
                        activation, section))
            padding = 'same' if pad == 1 and stride == 1 else 'valid'
            # Adjust padding model for darknet.
            if stride == 2:
                prev_layer = ZeroPadding2D(((1, 0), (1, 0)))(prev_layer)
            # Create Conv2D layer
            conv_layer = (Conv2D(
                filters, (size, size),
                strides=(stride, stride),
                kernel_regularizer=l2(weight_decay),
                use_bias=not batch_normalize,
                weights=conv_weights,
                activation=act_fn,
                padding=padding))(prev_layer)
            if batch_normalize:
                conv_layer = (BatchNormalization(
                    weights=bn_weight_list))(conv_layer)
            prev_layer = conv_layer
            if activation == 'linear':
                all_layers.append(prev_layer)
            elif activation == 'leaky':
                act_layer = LeakyReLU(alpha=0.1)(prev_layer)
                prev_layer = act_layer
                all_layers.append(prev_layer)
        elif section.startswith('maxpool'):
            size = int(cfg_parser[section]['size'])
            stride = int(cfg_parser[section]['stride'])
            all_layers.append(
                MaxPooling2D(
                    padding='same',
                    pool_size=(size, size),
                    strides=(stride, stride))(prev_layer))
            prev_layer = all_layers[-1]
        elif section.startswith('avgpool'):
            if cfg_parser.items(section) != []:
                raise ValueError('{} with params unsupported.'.format(section))
            all_layers.append(GlobalAveragePooling2D()(prev_layer))
            prev_layer = all_layers[-1]
        elif section.startswith('route'):
            ids = [int(i) for i in cfg_parser[section]['layers'].split(',')]
            if len(ids) == 2:
                for i, item in enumerate(ids):
                    if item != -1:
                        ids[i] = item + 1
            layers = [all_layers[i] for i in ids]
            if len(layers) > 1:
                print('Concatenating route layers:', layers)
                concatenate_layer = concatenate(layers)
                all_layers.append(concatenate_layer)
                prev_layer = concatenate_layer
            else:
                skip_layer = layers[0]  # only one layer to route
                all_layers.append(skip_layer)
                prev_layer = skip_layer
        elif section.startswith('shortcut'):
            ids = [int(i) for i in cfg_parser[section]['from'].split(',')][0]
            activation = cfg_parser[section]['activation']
            shortcut = add([all_layers[ids], prev_layer])
            if activation == 'linear':
                shortcut = Activation('linear')(shortcut)
            all_layers.append(shortcut)
            prev_layer = all_layers[-1]
        elif section.startswith('upsample'):
            stride = int(cfg_parser[section]['stride'])
            all_layers.append(
                UpSampling2D(
                    size=(stride, stride))(prev_layer))
            prev_layer = all_layers[-1]
        elif section.startswith('yolo'):
            classes = int(cfg_parser[section]['classes'])
            # num = int(cfg_parser[section]['num'])
            # mask = int(cfg_parser[section]['mask'])
            n1, n2 = int(prev_layer.shape[1]), int(prev_layer.shape[2])
            n3 = 3
            n4 = (4 + 1 + classes)
            yolo = Reshape((n1, n2, n3, n4))(prev_layer)
            all_layers.append(yolo)
            prev_layer = all_layers[-1]
            outputs.append(len(all_layers) - 1)
        elif (section.startswith('net')):
            pass  # Configs not currently handled during model definition.
        else:
            raise ValueError(
                'Unsupported section header type: {}'.format(section))
    # Create and save model.
    model = Model(inputs=all_layers[0],
                  outputs=[all_layers[i] for i in outputs])
    print(model.summary())
    model.save('{}'.format(output_path))
    print('Saved Keras model to {}'.format(output_path))
    # Check to see if all weights have been read.
    remaining_weights = len(weights_file.read()) / 4
    weights_file.close()
    print('Read {} of {} from Darknet weights.'.format(count, count +
                                                       remaining_weights))
    if remaining_weights > 0:
        print('Warning: {} unused weights'.format(remaining_weights))
    # Note: in this listing plot() runs unconditionally (the -p/--plot_model flag is never checked), so pydot and graphviz must be installed.
    plot(model, to_file='{}.png'.format(output_root), show_shapes=True)
    print('Saved model plot to {}.png'.format(output_root))
if __name__ == '__main__':
    _main(parser.parse_args())  # finally call parse_args() to parse the command line; once parsing succeeds the values are ready to use
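Why unique_config_sections is needed: a Darknet .cfg file repeats section names such as [convolutional] many times, and Python's configparser rejects duplicate sections by default. The following sketch reproduces the renaming on an in-memory string rather than a file. The tiny config text and the helper name `unique_sections_from_text` are made up for illustration.

```python
import configparser
import io
from collections import defaultdict

CFG_TEXT = """[net]
width=416
height=416
[convolutional]
filters=32
[convolutional]
filters=64
"""


def unique_sections_from_text(text):
    """Same renaming idea as unique_config_sections, but for a string."""
    counters = defaultdict(int)
    out = io.StringIO()
    for line in text.splitlines(keepends=True):
        if line.startswith('['):
            section = line.strip().strip('[]')
            line = line.replace(section, '{}_{}'.format(section, counters[section]))
            counters[section] += 1
        out.write(line)
    out.seek(0)  # rewind so the parser reads from the beginning
    return out


# The raw text fails because [convolutional] appears twice.
try:
    configparser.ConfigParser().read_string(CFG_TEXT)
    duplicate_rejected = False
except configparser.DuplicateSectionError:
    duplicate_rejected = True

parser = configparser.ConfigParser()
parser.read_file(unique_sections_from_text(CFG_TEXT))
print(duplicate_rejected, parser.sections())
```

After renaming, sections are addressed as net_0, convolutional_0, convolutional_1 and so on, which is why the converter reads cfg_parser['net_0'] rather than cfg_parser['net'].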
- The yolo_model.py code was harder for me to read. The part to pay attention to is the predict function below.
def predict(self, image, shape):
    """Detect the objects with yolo.
    # Arguments
        image: ndarray, processed input image.
        shape: shape of original image.
    # Returns
        boxes: ndarray, boxes of objects.
        classes: ndarray, classes of objects.
        scores: ndarray, scores of objects.
    """
    outs = self._yolo.predict(image)  # this calls the Keras model's own predict method, which generates output predictions for the input samples, processing them in batches
    boxes, classes, scores = self._yolo_out(outs, shape)
    return boxes, classes, scores
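The _yolo_out method is not shown here, so as a rough illustration of what typically happens after the raw network outputs are decoded, below is a generic non-maximum suppression (NMS) sketch in NumPy. It assumes boxes in (x, y, w, h) form with (x, y) as the top-left corner, matching the format printed by demo.py. The function name `nms` and the sample values are my own, and the repository's implementation may differ.

```python
import numpy as np


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS. boxes: (N, 4) as x, y, w, h; returns kept indices."""
    x1, y1 = boxes[:, 0], boxes[:, 1]
    x2, y2 = x1 + boxes[:, 2], y1 + boxes[:, 3]
    areas = boxes[:, 2] * boxes[:, 3]
    order = scores.argsort()[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of box i with every remaining box.
        iw = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]))
        ih = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]))
        inter = iw * ih
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= iou_threshold]  # drop boxes that overlap box i too much
    return keep


boxes = np.array([[10., 10., 100., 100.],
                  [12., 12., 100., 100.],
                  [200., 200., 50., 50.]])
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)
print(kept)
```

The second box almost coincides with the first and has a lower score, so it is suppressed; the third box does not overlap anything and is kept.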
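- The most important part is demo.py. For a different project, this is basically the only code that needs to change. See the comments, which are fairly detailed.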
"""Demo for use yolo v3
"""
import os
import time
import cv2
import numpy as np
from model.yolo_model import YOLO
def process_image(img):
    """Resize, reduce and expand image.
    # Argument:
        img: original image.
    # Returns
        image: ndarray(1, 416, 416, 3), processed image.
    """
    image = cv2.resize(img, (416, 416),
                       interpolation=cv2.INTER_CUBIC)  # resize to the given size; the result is a 416x416x3 array, where (416, 416) is the target (width, height) and 3 is the number of channels, i.e. a color image
    image = np.array(image, dtype='float32')  # convert the image to an array of the given dtype
    image /= 255.  # scale pixel values into the range 0 to 1
    image = np.expand_dims(image, axis=0)  # np.expand_dims adds an axis; axis=0 adds a leading batch axis, so the array becomes 1x416x416x3
    return image
def get_classes(file):
    """Get classes name.
    # Argument:
        file: classes name for database.
    # Returns
        class_names: List, classes name.
    """
    with open(file) as f:
        class_names = f.readlines()
    class_names = [c.strip() for c in class_names]
    return class_names
def draw(image, boxes, scores, classes, all_classes):
    """Draw the boxes on the image.
    # Argument:
        image: original image.
        boxes: ndarray, boxes of objects.
        classes: ndarray, classes of objects.
        scores: ndarray, scores of objects.
        all_classes: all classes name.
    """
    for box, score, cl in zip(boxes, scores, classes):
        x, y, w, h = box  # (x, y) is the top-left corner of the box; w and h are its width and height
        # floor(v + 0.5) rounds to the nearest pixel, and max/min clamp the result to the image.
        # Despite their names, `top` holds the x coordinate and `left` holds the y coordinate,
        # so (top, left) is the top-left point and (right, bottom) the bottom-right point.
        top = max(0, np.floor(x + 0.5).astype(int))
        left = max(0, np.floor(y + 0.5).astype(int))
        right = min(image.shape[1], np.floor(x + w + 0.5).astype(int))
        bottom = min(image.shape[0], np.floor(y + h + 0.5).astype(int))
        cv2.rectangle(image, (top, left), (right, bottom), (255, 0, 0), 2)  # draw a rectangle from two diagonally opposite corners
        cv2.putText(image, '{0} {1:.2f}'.format(all_classes[cl], score),
                    (top, left - 6),
                    cv2.FONT_HERSHEY_SIMPLEX,
                    0.6, (0, 0, 255), 1,
                    cv2.LINE_AA)  # write the label on the image; (top, left - 6) places the text just above the box's top-left corner
        print('class: {0}, score: {1:.2f}'.format(all_classes[cl], score))
        print('box coordinate x,y,w,h: {0}'.format(box))
    print()
def detect_image(image, yolo, all_classes):
    """Use yolo v3 to detect images.
    # Argument:
        image: original image.
        yolo: YOLO, yolo model.
        all_classes: all classes name.
    # Returns:
        image: processed image.
    """
    pimage = process_image(image)
    start = time.time()  # wall-clock time before prediction
    boxes, classes, scores = yolo.predict(pimage, image.shape)
    '''
    Core outputs of the detection:
    boxes: box coordinates as (x, y, w, h), which is what draw() unpacks;
    scores: class confidence of each box, combining box confidence and class confidence;
    classes: class index of each box.
    '''
    end = time.time()
    print('time: {0:.2f}s'.format(end - start))  # print the elapsed time
    if boxes is not None:
        draw(image, boxes, scores, classes, all_classes)
    return image
def detect_video(video, yolo, all_classes):
    """Use yolo v3 to detect video.
    # Argument:
        video: video file.
        yolo: YOLO, yolo model.
        all_classes: all classes name.
    """
    video_path = os.path.join("videos", "test", video)
    camera = cv2.VideoCapture(video_path)
    cv2.namedWindow("detection", cv2.WINDOW_AUTOSIZE)
    # Prepare for saving the detected video
    sz = (int(camera.get(cv2.CAP_PROP_FRAME_WIDTH)),
          int(camera.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    fourcc = cv2.VideoWriter_fourcc(*'mpeg')  # cv2.VideoWriter_fourcc() takes a four-character code and returns the matching video codec identifier
    vout = cv2.VideoWriter()
    vout.open(os.path.join("videos", "res", video), fourcc, 20, sz, True)
    while True:  # read the video frame by frame and run detection on each frame
        res, frame = camera.read()
        if not res:
            break
        image = detect_image(frame, yolo, all_classes)
        cv2.imshow("detection", image)
        # Save the video frame by frame
        vout.write(image)
        if cv2.waitKey(110) & 0xff == 27:  # stop when Esc (key code 27) is pressed
            break
    vout.release()
    camera.release()
if __name__ == '__main__':
    yolo = YOLO(0.6, 0.5)  # 0.6 and 0.5 are the threshold for object and the threshold for box, the two constructor parameters
    file = 'data/coco_classes.txt'
    all_classes = get_classes(file)
    # detect images in test folder.
    for (root, dirs, files) in os.walk('images/test'):  # os.walk yields (root, dirs, files): root is the path of the folder currently being visited; dirs is a list of the names of its immediate subdirectories; files is a list of the names of the files directly inside it (not those in subdirectories)
        if files:
            for f in files:
                print(f)
                path = os.path.join(root, f)
                image = cv2.imread(path)
                image = detect_image(image, yolo, all_classes)
                cv2.imwrite('images/res/' + f, image)
    # detect videos one at a time in videos/test folder
    video = 'library1.mp4'
    detect_video(video, yolo, all_classes)
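On the corner computation in draw(): judging from the listing, each box is (x, y, w, h) with (x, y) the top-left corner in pixels, so the opposite corner is (x + w, y + h). np.floor(v + 0.5) rounds to the nearest integer, and max/min clamp the result to the image. The listing's variable names are swapped (`top` holds the x coordinate and `left` the y coordinate), which still works because cv2.rectangle takes (x, y) points. Here is a small sketch with clearer names, where `box_to_corners` and the sample numbers are hypothetical:

```python
import numpy as np


def box_to_corners(box, image_shape):
    """Convert (x, y, w, h) to clamped integer corners (x1, y1, x2, y2).

    image_shape is (height, width, channels), as in a cv2 image.
    """
    x, y, w, h = box
    height, width = image_shape[:2]
    x1 = max(0, int(np.floor(x + 0.5)))           # left edge, clamped to 0
    y1 = max(0, int(np.floor(y + 0.5)))           # top edge, clamped to 0
    x2 = min(width, int(np.floor(x + w + 0.5)))   # right edge, clamped to image width
    y2 = min(height, int(np.floor(y + h + 0.5)))  # bottom edge, clamped to image height
    return x1, y1, x2, y2


# A box that sticks out past the left and bottom edges of a 480x640 image.
corners = box_to_corners((-3.2, 400.6, 50.0, 120.0), (480, 640, 3))
print(corners)
```

The points passed to cv2.rectangle would then be (x1, y1) and (x2, y2).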
In addition, I still need to study further how the video part is handled!
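Environment problem encountered while running the YOLO code:
When I ran the code, I kept hitting the error "ImportError: Failed to import pydot. You must install pydot and graphviz for pydotprint to work." On Windows with Python 3.6 and Anaconda3, the fix was roughly as follows:
- pip install graphviz: install the graphviz Python package first.
- pip install pydot_ng==2.0.0: install pydot_ng. I used version 2.0.0, which I have verified to work. (On Python 2, install pydot instead.)
- Download the Graphviz installer from http://www.graphviz.org/download/ . Even after the pip install, you still have to download it from this site, because the pip package only provides Python bindings and does not include the Graphviz executables such as dot. In my experiments the msi version worked and the zip version did not help. Then run the msi; I created a new folder "graphviz-2.38" and installed into it. The resulting directory layout is shown in the figure below.
- Add the bin folder to the system PATH environment variable. Mine is "F:\ana\Lib\site-packages\graphviz-2.38\bin".
- Run a final test.
On Python 3, run the following code: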
import pydot_ng as pydot1
print(pydot1.find_graphviz())
On Python 2, run the following code:
import pydot
print pydot.find_graphviz()
- If the output is "{'dot': 'F:\ana\Lib\site-packages\graphviz-2.38\bin\dot.exe', 'twopi': 'F:\ana\Lib\site-packages\graphviz-2.38\bin\twopi.exe', 'neato': 'F:\ana\Lib\site-packages\graphviz-2.38\bin\neato.exe', 'circo': 'F:\ana\Lib\site-packages\graphviz-2.38\bin\circo.exe', 'fdp': 'F:\ana\Lib\site-packages\graphviz-2.38\bin\fdp.exe', 'sfdp': 'F:\ana\Lib\site-packages\graphviz-2.38\bin\sfdp.exe'}", the setup succeeded and the YOLO code should now run.
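As an alternative check that does not depend on pydot_ng, the standard library can confirm whether the Graphviz dot executable is reachable through PATH. This is a sketch of my own and not part of the original fix; `graphviz_on_path` is a hypothetical helper name.

```python
import shutil


def graphviz_on_path(executable='dot'):
    """Return the full path of the Graphviz executable, or None if PATH lacks it."""
    return shutil.which(executable)


dot_path = graphviz_on_path()
if dot_path is None:
    print('dot not found: add the Graphviz bin folder to PATH and reopen the terminal')
else:
    print('dot found at', dot_path)
```

If this prints a path but Keras still raises the ImportError, the problem is more likely on the Python package side (pydot or pydot_ng) than in the Graphviz installation.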