1 Data requirements
Training data for object detection is usually organized in the VOC2007 directory layout, which looks like this:
VOC2007
├── Annotations
├── ImageSets
└── JPEGImages
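The skeleton can be created with a few lines of Python. This is a minimal sketch: `voc_dirs` and `make_voc_dirs` are hypothetical helper names. It also creates ImageSets/Main, the subfolder where the conversion script in this post writes its trainval.txt and test.txt split lists.

```python
import os

# Sub-directories of a VOC2007-style dataset. ImageSets/Main holds the
# split lists (trainval.txt, test.txt).
VOC_SUBDIRS = ("Annotations", os.path.join("ImageSets", "Main"), "JPEGImages")


def voc_dirs(root):
    """Return the directories a VOC2007-style dataset under `root` needs."""
    return [os.path.join(root, sub) for sub in VOC_SUBDIRS]


def make_voc_dirs(root):
    """Create the directories (no error if they already exist)."""
    for d in voc_dirs(root):
        os.makedirs(d, exist_ok=True)
    return voc_dirs(root)
```

For example, `make_voc_dirs("/data/VOC2007")` (a hypothetical path) prepares the empty tree before the images are copied into JPEGImages.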
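Put all of your images into JPEGImages. The XML files in Annotations, however, usually have to be generated by ourselves, which takes a script like the following. It reads one TXT label file per image (each line holds one box as x,y,w,h, and every box is labeled `head`), writes the matching VOC XML file, and assigns every 10th image to the test split and the rest to trainval: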
```python
import os

import cv2  # OpenCV, used only to read each image's height and width

# VOC dataset root used in this post; adjust to your own layout.
voc_root = "/disk3/face_detect/beijing/sfd_head_train/head_data1/img_data/head_voc_data"
# Hypothetical placeholder: directory with one <image name>.txt label file per image.
path = "/path/to/txt_labels"

# XML header: file name and image size.
out0 = '''<?xml version="1.0" encoding="utf-8"?>
<annotation>
    <folder>None</folder>
    <filename>%(name)s</filename>
    <source>
        <database>None</database>
        <annotation>None</annotation>
        <image>None</image>
        <flickrid>None</flickrid>
    </source>
    <owner>
        <flickrid>None</flickrid>
        <name>None</name>
    </owner>
    <segmented>0</segmented>
    <size>
        <width>%(width)d</width>
        <height>%(height)d</height>
        <depth>3</depth>
    </size>
'''
# One <object> block per bounding box.
out1 = '''    <object>
        <name>%(class)s</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>%(xmin)d</xmin>
            <ymin>%(ymin)d</ymin>
            <xmax>%(xmax)d</xmax>
            <ymax>%(ymax)d</ymax>
        </bndbox>
    </object>
'''
out2 = '''</annotation>
'''


def txt2xml(txt_path):
    """Convert one TXT label file (x,y,w,h per line) into a VOC XML file.
    Returns 1 on success, 0 if the matching image cannot be read."""
    stem = os.path.splitext(os.path.basename(txt_path))[0]
    pic_name = stem + ".jpg"
    pic_path = os.path.join(voc_root, "JPEGImages", pic_name)
    img = cv2.imread(pic_path)
    if img is None:
        return 0
    h, w = img.shape[:2]
    fxml = os.path.join(voc_root, "Annotations", stem + ".xml")
    with open(fxml, "w") as fxml1:
        fxml1.write(out0 % {"name": pic_name, "width": w, "height": h})
        with open(txt_path, "r") as f:
            lines = [line.strip() for line in f if line.strip()]  # skip empty lines
        for line in lines:
            box = line.split(",")
            # x, y, width, height -> xmin, ymin, xmax, ymax
            xmin = int(float(box[0]))
            ymin = int(float(box[1]))
            xmax = int(float(box[0]) + float(box[2]))
            ymax = int(float(box[1]) + float(box[3]))
            # Clip the box to the image
            label = {
                "class": "head",
                "xmin": max(xmin, 0),
                "ymin": max(ymin, 0),
                "xmax": min(xmax, w - 1),
                "ymax": min(ymax, h - 1),
            }
            # Drop boxes that lie completely outside the image
            if label["xmin"] >= w or label["ymin"] >= h:
                continue
            if label["xmax"] < 0 or label["ymax"] < 0:
                continue
            fxml1.write(out1 % label)
        fxml1.write(out2)
    return 1


main_dir = os.path.join(voc_root, "ImageSets", "Main")
os.makedirs(main_dir, exist_ok=True)
with open(os.path.join(main_dir, "test.txt"), "a") as ftest, \
        open(os.path.join(main_dir, "trainval.txt"), "a") as ftrain:
    # sorted() keeps the train/test split reproducible between runs
    for i, txt_name in enumerate(sorted(os.listdir(path)), start=1):
        if i % 10000 == 0:
            print(i)  # progress output; the file is still processed
        txt_path = os.path.join(path, txt_name)
        if txt2xml(txt_path) == 1:
            # every 10th file goes to the test split, the rest to trainval
            split_file = ftest if i % 10 == 0 else ftrain
            split_file.write(os.path.splitext(txt_name)[0] + "\n")
```
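The script drops boxes that lie completely outside the image, but it keeps boxes that become degenerate, for example a zero-width label in the TXT file gives xmin == xmax. A quick check over the generated XML files can catch these before they reach training. This is a minimal sketch: `check_voc_xml` is a hypothetical helper, and treating zero-width or zero-height boxes as errors is an assumption of this sketch, not a VOC rule.

```python
import xml.etree.ElementTree as ET


def check_voc_xml(xml_text):
    """Return a list of problems found in one VOC annotation (empty if none).

    Assumed rule: every box must satisfy 0 <= xmin < xmax < width and
    0 <= ymin < ymax < height, using the <size> element of the same file.
    """
    root = ET.fromstring(xml_text)
    size = root.find("size")
    w = int(size.find("width").text)
    h = int(size.find("height").text)
    problems = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        bb = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            int(bb.find(k).text) for k in ("xmin", "ymin", "xmax", "ymax")
        )
        if not (0 <= xmin < xmax < w):
            problems.append("%s: bad x range %d..%d (width %d)" % (name, xmin, xmax, w))
        if not (0 <= ymin < ymax < h):
            problems.append("%s: bad y range %d..%d (height %d)" % (name, ymin, ymax, h))
    if root.find("object") is None:
        problems.append("no objects")
    return problems
```

To scan a whole Annotations folder, read each file and print the ones for which `check_voc_xml(text)` returns a non-empty list.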
2 Generating the LMDB
Once the folders above are ready, two bash scripts are still needed: create_list.sh and create_data.sh.
create_list.sh generates two TXT files, trainval.txt and test_name_size.txt. Each line of trainval.txt pairs an image with its annotation file:
JPEGImages/1562654399100016456146.jpg Annotations/1562654399100016456146.xml
Each line of test_name_size.txt holds an image name followed by its height and width:
1562654399100016456146 h w
I am not very familiar with bash syntax, so I did this step in Python instead.
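The Python replacement itself is not shown in the post, so the following is a hypothetical sketch of what it could look like; `create_lists`, `pair_line`, `size_from_root` and `read_ids` are illustrative names. It reads the ImageSets/Main split files written by the conversion script, writes trainval.txt and test.txt in the "image annotation" format shown above, and fills test_name_size.txt with "<name> <height> <width>" taken from each XML file's <size> element instead of opening the images. Shuffling the trainval list is a choice of this sketch.

```python
import os
import random
import xml.etree.ElementTree as ET


def pair_line(img_id):
    """One list line: image path and annotation path, relative to the VOC root."""
    return "JPEGImages/%s.jpg Annotations/%s.xml" % (img_id, img_id)


def size_from_root(root):
    """Return (height, width) from the <size> element of a parsed VOC XML file."""
    size = root.find("size")
    return int(size.find("height").text), int(size.find("width").text)


def read_ids(voc_root, split):
    """Image ids listed in ImageSets/Main/<split>.txt."""
    list_path = os.path.join(voc_root, "ImageSets", "Main", split + ".txt")
    with open(list_path) as f:
        return [line.strip() for line in f if line.strip()]


def create_lists(voc_root, out_dir, seed=0):
    """Write trainval.txt, test.txt and test_name_size.txt into out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    for split in ("trainval", "test"):
        ids = read_ids(voc_root, split)
        if split == "trainval":
            random.Random(seed).shuffle(ids)  # reproducible shuffled order
        with open(os.path.join(out_dir, split + ".txt"), "w") as f:
            f.writelines(pair_line(i) + "\n" for i in ids)
    # test_name_size.txt: "<id> <height> <width>", sizes read from the XML files
    with open(os.path.join(out_dir, "test_name_size.txt"), "w") as f:
        for img_id in read_ids(voc_root, "test"):
            xml_path = os.path.join(voc_root, "Annotations", img_id + ".xml")
            h, w = size_from_root(ET.parse(xml_path).getroot())
            f.write("%s %d %d\n" % (img_id, h, w))
```

The list-file argument of create_annoset.py in the script below should then point at the generated trainval.txt; its paths are relative to the VOC root passed as `$data_root_dir`.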
The create_data.sh below simply calls scripts/create_annoset.py and passes it a set of arguments:
```bash
caffe_root=/disk3/face_detect/caffe_s3fd-ssd
root_dir=/disk3/face_detect/beijing/sfd_head_train
# create_annoset.py places a link to the database in this directory
LINK_DIR=$root_dir/head_data1/lmdb_data1
cd $root_dir

redo=1
db_dir="$root_dir/head_data1/lmdb_data"                       # where the LMDB is written
data_root_dir="$root_dir/head_data1/img_data/head_voc_data"  # VOC root; list paths are relative to it
dataset_name="trian"
mapfile="/disk3/face_detect/beijing/labelmap_head.prototxt"  # class name -> label id
anno_type="detection"
db="lmdb"
# 0 = keep the original image size (no resizing while building the database)
min_dim=0
max_dim=0
width=0
height=0

# store JPEG-encoded images in the database
extra_cmd="--encode-type=jpg --encoded"
if [ $redo ]
then
  # recreate the database if it already exists
  extra_cmd="$extra_cmd --redo"
fi
for subset in trainval
do
  python $caffe_root/scripts/create_annoset.py --anno-type=$anno_type \
    --label-map-file=$mapfile --min-dim=$min_dim --max-dim=$max_dim \
    --resize-width=$width --resize-height=$height --check-label $extra_cmd \
    $data_root_dir /disk3/trianval.txt \
    $db_dir/$db/$dataset_name"_"$subset"_"$db $LINK_DIR/$dataset_name
done
```
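The script passes `--label-map-file=$mapfile` (labelmap_head.prototxt). In Caffe SSD this file is a LabelMap in protobuf text format; label 0 is reserved for the background, and every `name` must match a `<name>` written into the XML files (here `head`). Below is a minimal Python sketch that generates such a file; `labelmap_text` and `write_labelmap` are hypothetical helper names, and the `none_of_the_above`/`background` entry follows the layout of SSD's VOC label map.

```python
def labelmap_text(class_names):
    """Build LabelMap prototxt text. Label 0 is the background entry;
    the given classes get labels 1, 2, ... in order."""
    entries = [("none_of_the_above", 0, "background")]
    entries += [(name, idx, name) for idx, name in enumerate(class_names, start=1)]
    return "".join(
        'item {\n  name: "%s"\n  label: %d\n  display_name: "%s"\n}\n' % entry
        for entry in entries
    )


def write_labelmap(path, class_names):
    """Write the label map to `path`, e.g. the file passed as --label-map-file."""
    with open(path, "w") as f:
        f.write(labelmap_text(class_names))
```

For the head detector in this post the call would be `write_labelmap("/disk3/face_detect/beijing/labelmap_head.prototxt", ["head"])`, which yields two items: background (0) and head (1).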
3 Training
There is not much to say about training itself; the two files that matter are solver.prototxt and trainval.prototxt.
For an explanation of solver.prototxt, see the notes on solver parameters and optimizers.
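solver.prototxt is plain protobuf text format, one `field: value` per line. The sketch below writes one from Python. The field names (`train_net`, `base_lr`, `lr_policy`, `stepvalue`, `solver_mode`, ...) are fields of Caffe's SolverParameter, but the values are illustrative assumptions (an SSD-style multistep schedule), not the settings used in this post; `to_prototxt` and the `Enum` marker class are hypothetical helpers.

```python
class Enum(str):
    """Marks a value written without quotes, e.g. solver_mode: GPU."""


def to_prototxt(fields):
    """Format (key, value) pairs as protobuf text format, one field per line.
    Lists become repeated fields; strings are quoted unless wrapped in Enum."""
    lines = []
    for key, value in fields:
        for v in (value if isinstance(value, list) else [value]):
            if isinstance(v, Enum):
                lines.append("%s: %s" % (key, v))
            elif isinstance(v, str):
                lines.append('%s: "%s"' % (key, v))
            elif isinstance(v, bool):
                lines.append("%s: %s" % (key, "true" if v else "false"))
            else:
                lines.append("%s: %r" % (key, v))
    return "\n".join(lines) + "\n"


# Illustrative values only; tune them for your own data and GPU.
SOLVER_FIELDS = [
    ("train_net", "trainval.prototxt"),
    ("base_lr", 0.001),
    ("lr_policy", "multistep"),
    ("gamma", 0.1),
    ("stepvalue", [80000, 100000]),  # repeated field: lr *= gamma at each step
    ("momentum", 0.9),
    ("weight_decay", 0.0005),
    ("iter_size", 1),
    ("max_iter", 120000),
    ("display", 10),
    ("average_loss", 10),
    ("snapshot", 10000),
    ("snapshot_prefix", "snapshots/head"),
    ("type", "SGD"),
    ("solver_mode", Enum("GPU")),
]

print(to_prototxt(SOLVER_FIELDS))
```

After saving the text as solver.prototxt, training is started with the Caffe command-line tool, e.g. `caffe train --solver=solver.prototxt`.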
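trainval.prototxt defines the network structure, i.e. how the layers are connected. It contains many layers, so I won't list them one by one; when you meet one you don't know, just Google it. If you want to modify the network structure, make sure you understand the paper first. A handy tool for visualizing models is Netron.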
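Before opening the network in Netron, it can help to see which layer types occur in trainval.prototxt. The following is a rough, stdlib-only sketch with hypothetical helper names (`list_layers`, `type_counts`); it is not a full protobuf text-format parser (it strips `#` comments naively and assumes no braces inside quoted strings). For exact parsing, Caffe's `caffe_pb2.NetParameter` together with `google.protobuf.text_format` is the proper route.

```python
import re
from collections import Counter


def _top_level_fields(body):
    """Delete nested { ... } groups so only the layer's own fields remain."""
    prev = None
    while prev != body:
        prev = body
        body = re.sub(r"\{[^{}]*\}", "", body)
    return body


def list_layers(prototxt_text):
    """Return (name, type) for each top-level `layer { ... }` block."""
    text = re.sub(r"#[^\n]*", "", prototxt_text)  # drop comments
    layers = []
    depth = 0
    body_start = None
    for m in re.finditer(r"\blayer\s*\{|[{}]", text):
        tok = m.group(0)
        if tok == "}":
            depth -= 1
            if depth == 0 and body_start is not None:
                fields = _top_level_fields(text[body_start:m.start()])
                name = re.search(r'\bname\s*:\s*"([^"]*)"', fields)
                ltype = re.search(r'\btype\s*:\s*"([^"]*)"', fields)
                layers.append((name.group(1) if name else None,
                               ltype.group(1) if ltype else None))
                body_start = None
        else:
            if depth == 0 and tok != "{":  # a top-level "layer {" opens here
                body_start = m.end()
            depth += 1
    return layers


def type_counts(prototxt_text):
    """Count how often each layer type appears."""
    return Counter(t for _, t in list_layers(prototxt_text))
```

Usage: `with open("trainval.prototxt") as f: print(type_counts(f.read()).most_common())` lists the layer types to look up, most frequent first.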