1 Data requirements

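Object detection code generally expects the dataset to be laid out in the VOC2007 directory format. The directory structure is as follows: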
VOC2007
        - -Annotations
        - -ImageSets
        - -JPEGImages
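The layout above could be created with a short sketch like the one below. The function name `make_voc_dirs` is only illustrative, and the `ImageSets/Main` sub-directory is included because the split files written later in this post live there.

```python
import os

def make_voc_dirs(root):
    """Create the VOC2007-style sub-directories under `root` and return their paths."""
    subdirs = ["Annotations", os.path.join("ImageSets", "Main"), "JPEGImages"]
    created = []
    for sub in subdirs:
        full = os.path.join(root, sub)
        # exist_ok=True makes the call safe to repeat on an existing dataset
        os.makedirs(full, exist_ok=True)
        created.append(full)
    return created

# Example (hypothetical location): make_voc_dirs("VOC2007")
```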
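Put all of your images into JPEGImages. The xml annotation files usually have to be generated by ourselves, which takes a script like the following: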

import os
from PIL import Image
import cv2
out0 ='''<?xml version="1.0" encoding="utf-8"?>
<annotation>
    <folder>None</folder>
    <filename>%(name)s</filename>
    <source>
        <database>None</database>
        <annotation>None</annotation>
        <image>None</image>
        <flickrid>None</flickrid>
    </source>
    <owner>
        <flickrid>None</flickrid>
        <name>None</name>
    </owner>
    <segmented>0</segmented>
    <size>
        <width>%(width)d</width>
        <height>%(height)d</height>
        <depth>3</depth>
    </size>
'''
out1 = '''  <object>
        <name>%(class)s</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>%(xmin)d</xmin>
            <ymin>%(ymin)d</ymin>
            <xmax>%(xmax)d</xmax>
            <ymax>%(ymax)d</ymax>
        </bndbox>
    </object>
'''

out2 = '''</annotation>
'''
def txt2xml(txt_path):
    source={}
    label={}
    pic_name=txt_path.split("/")[-1][:-4]+".jpg"
    pic_path="/disk3/face_detect/beijing/sfd_head_train/head_data1/img_data/head_voc_data/JPEGImages/"+pic_name
    img=cv2.imread(pic_path)
    if img is None:
        return 0
    h,w,_=img.shape[:]
    fxml=pic_path.replace('JPEGImages','Annotations')
    fxml=fxml.replace(".jpg",".xml")
    with open(fxml,"w") as fxml1:
        image_name=pic_name
        source["name"]=image_name
        source["width"]=w
        source["height"]=h
        fxml1.write(out0%source)
        lines=[]
        with open(txt_path,"r") as f:
            lines=[i.replace("\n","") for i in f.readlines()]
        for box in lines:
            # Each line is "x,y,w,h": top-left corner plus box width and height.
            box=box.split(",")
            label["class"]="head"
            xmin=int(float(box[0]))
            ymin=int(float(box[1]))
            xmax=int(float(box[0])+float(box[2]))
            ymax=int(float(box[1])+float(box[3]))
            # Clamp the box to the image bounds.
            label["xmin"]=max(xmin,0)
            label["ymin"]=max(ymin,0)
            label["xmax"]=min(xmax,w-1)
            label["ymax"]=min(ymax,h-1)
            # Skip boxes that lie entirely outside the image.
            if label["xmin"]>=w or label["ymin"]>=h:
                continue
            if label["xmax"]<0 or label["ymax"]<0:
                continue
            fxml1.write(out1%label)
        fxml1.write(out2)
    return 1

# Directory holding the .txt label files. The original post does not show this
# value, so the path below is a placeholder to replace with your own.
path="/path/to/txt_labels"
main_dir="/disk3/face_detect/beijing/sfd_head_train/head_data1/img_data/head_voc_data/ImageSets/Main/"
i=0
for txt_name in os.listdir(path):
    i=i+1
    if i%10000==0:
        print(i)  # progress indicator
    txt_path=os.path.join(path,txt_name)
    # Every 10th file goes to the test split, the rest to trainval.
    list_name="test.txt" if i%10==0 else "trainval.txt"
    if txt2xml(txt_path)==1:
        with open(main_dir+list_name,"a+") as flist:
            flist.write(txt_name[:-4]+"\n")
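
Before moving on, it can help to read a generated xml file back and check that the boxes look sane. The sketch below is not part of the original script; `read_voc_boxes` is an illustrative name, and it assumes only the tags written by the templates above.

```python
import xml.etree.ElementTree as ET

def read_voc_boxes(xml_data):
    """Parse a VOC-style annotation and return (width, height, boxes).

    `boxes` is a list of (name, xmin, ymin, xmax, ymax) tuples.
    `xml_data` may be str or bytes.
    """
    root = ET.fromstring(xml_data)
    size = root.find("size")
    width = int(size.find("width").text)
    height = int(size.find("height").text)
    boxes = []
    for obj in root.findall("object"):
        bb = obj.find("bndbox")
        coords = tuple(int(bb.find(tag).text)
                       for tag in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((obj.find("name").text,) + coords)
    return width, height, boxes

# Example: with open(xml_path, "rb") as f: print(read_voc_boxes(f.read()))
```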

2 Generating the lmdb

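Once the directory above is ready, two bash scripts still need to be run: create_list.sh and create_data.sh.
create_list.sh generates two txt files, trianval.txt (spelled as in the script further down) and test_name_size.txt. The contents of trianval.txt look like this: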

JPEGImages/1562654399100016456146.jpg  Annotations/1562654399100016456146.xml

The contents of test_name_size.txt look like this (image id, then the image height and width):

1562654399100016456146 h  w

I am not very familiar with bash syntax, so I used Python in place of create_list.sh.
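
The post does not show that Python replacement, so the following is only a minimal sketch of what it might look like. All three function names are hypothetical, and a single space is assumed as the separator. The height and width would come from something like `cv2.imread(pic_path).shape[:2]`, as in the conversion script above.

```python
def read_ids(split_file):
    """Read image ids (one per line) from a file such as ImageSets/Main/trainval.txt."""
    with open(split_file) as f:
        return [line.strip() for line in f if line.strip()]

def build_list_lines(image_ids):
    """One 'image annotation' pair per id, relative to the dataset root."""
    return ["JPEGImages/%s.jpg Annotations/%s.xml" % (i, i) for i in image_ids]

def build_name_size_lines(sizes):
    """`sizes` is an iterable of (image_id, height, width) tuples."""
    return ["%s %d %d" % (i, h, w) for i, h, w in sizes]

def write_lines(out_path, lines):
    """Write the lines to `out_path`, one per row."""
    with open(out_path, "w") as f:
        f.write("\n".join(lines) + "\n")
```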
The create_data.sh below really just calls scripts/create_annoset.py and passes it some arguments.

caffe_root=/disk3/face_detect/caffe_s3fd-ssd
root_dir=/disk3/face_detect/beijing/sfd_head_train
LINK_DIR=$root_dir/head_data1/lmdb_data1
cd $root_dir
redo=1
db_dir="$root_dir/head_data1/lmdb_data"
data_root_dir="$root_dir/head_data1/img_data/head_voc_data"
dataset_name="trian"
mapfile="/disk3/face_detect/beijing/labelmap_head.prototxt"
anno_type="detection"
db="lmdb"
min_dim=0
max_dim=0
width=0
height=0

extra_cmd="--encode-type=jpg --encoded"
if [ $redo ]
then
  extra_cmd="$extra_cmd --redo"
fi
for subset in trainval
do
  python $caffe_root/scripts/create_annoset.py --anno-type=$anno_type --label-map-file=$mapfile --min-dim=$min_dim --max-dim=$max_dim --resize-width=$width --resize-height=$height --check-label $extra_cmd $data_root_dir /disk3/trianval.txt $db_dir/$db/$dataset_name"_"$subset"_"$db $LINK_DIR/$dataset_name
done
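
For readers who would also rather avoid bash here, the same call can be assembled in Python. This is a sketch that mirrors the flags in the script above and nothing more; `build_annoset_cmd` is a hypothetical helper, not part of Caffe.

```python
def build_annoset_cmd(caffe_root, data_root_dir, list_file, out_db, link_dir,
                      mapfile, anno_type="detection", min_dim=0, max_dim=0,
                      width=0, height=0, redo=True):
    """Return the argument list for scripts/create_annoset.py."""
    cmd = [
        "python", caffe_root + "/scripts/create_annoset.py",
        "--anno-type=%s" % anno_type,
        "--label-map-file=%s" % mapfile,
        "--min-dim=%d" % min_dim,
        "--max-dim=%d" % max_dim,
        "--resize-width=%d" % width,
        "--resize-height=%d" % height,
        "--check-label",
        "--encode-type=jpg",
        "--encoded",
    ]
    if redo:
        cmd.append("--redo")
    # Positional arguments, in the same order as in the bash script
    cmd += [data_root_dir, list_file, out_db, link_dir]
    return cmd

# To actually run it: import subprocess; subprocess.check_call(build_annoset_cmd(...))
```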

3 Training

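There is not much to say about training itself; what matters is solver.prototxt and trainval.prototxt.
For solver.prototxt, see the references on solver parameters and optimizers.
trainval.prototxt defines how the layers of the network are connected. It contains many layers, so I will not list them one by one; when you run into one you do not know, a Google search is enough. If you want to modify the network structure, be sure to read and understand the paper first. For visualizing the model I recommend Netron.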
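
To give a feel for what solver.prototxt contains, here is a small sketch that renders one from a Python dict. The field names are standard Caffe solver fields, but every value in the example is a placeholder rather than the setting used in this post, and `render_solver` is a hypothetical helper.

```python
# Enum-valued fields are written without quotes in prototxt text format.
ENUM_FIELDS = {"solver_mode"}

def render_solver(params):
    """Render a dict of solver settings as prototxt text."""
    lines = []
    for key, value in params.items():
        if isinstance(value, str) and key not in ENUM_FIELDS:
            lines.append('%s: "%s"' % (key, value))  # string fields are quoted
        else:
            lines.append("%s: %s" % (key, value))    # numbers and enums are bare
    return "\n".join(lines) + "\n"

# Placeholder values, for illustration only
example = {
    "net": "trainval.prototxt",
    "base_lr": 0.001,
    "lr_policy": "step",
    "gamma": 0.1,
    "stepsize": 40000,
    "momentum": 0.9,
    "weight_decay": 0.0005,
    "max_iter": 120000,
    "snapshot": 10000,
    "snapshot_prefix": "snapshots/head",
    "type": "SGD",
    "solver_mode": "GPU",
}
print(render_solver(example))
```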

小涵涵

