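When training on multi-label image data with Caffe, there are two main approaches:
1. Modify the Caffe source so that convert_imageset.cpp supports multiple labels; for the detailed steps, see https://www.jianshu.com/p/fdf7c599ab9d
2. Use HDF5 data together with a Slice layer. This article covers the second approach.
- Creating the HDF5 data
First, save the image paths and labels to a TXT file. This article uses one line per image in the form image_path label_1 label_2, as follows: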
114841417@N06/coarse_tilt_aligned_face.492.12059747423_6b3535aa6a_o.jpg 3 0
8187011@N06/coarse_tilt_aligned_face.992.10353206665_5173637857_o.jpg 4 1
48647239@N03/coarse_tilt_aligned_face.1515.11838493313_5b7240b1c9_o.jpg 3 1
100003415@N08/coarse_tilt_aligned_face.2176.9523981569_ea255870f1_o.jpg 4 0
7464014@N04/coarse_tilt_aligned_face.965.10107710156_9cb48097c5_o.jpg 4 0
31183835@N08/coarse_tilt_aligned_face.2096.8754898174_5d34522d9a_o.jpg 4 1
63164355@N03/coarse_tilt_aligned_face.1082.8826664078_de8f6c6a9e_o.jpg 3 1
111700049@N08/coarse_tilt_aligned_face.1548.11833465006_b9235b0c89_o.jpg 5 0
63164355@N03/coarse_tilt_aligned_face.1111.11014305184_25bc533930_o.jpg 5 0
7398884@N04/coarse_tilt_aligned_face.1641.8727032370_97ab4ee179_o.jpg 3 0
10280355@N07/coarse_tilt_aligned_face.1880.9496762548_754e1337d6_o.jpg 6 1
33627988@N04/coarse_tilt_aligned_face.1949.8809482906_7021c9794c_o.jpg 7 1
64504106@N06/coarse_tilt_aligned_face.911.11846581226_fc9f42d681_o.jpg 0 0
112599447@N03/coarse_tilt_aligned_face.1201.11576030294_cf8d7137a6_o.jpg 5 1
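The list format above can be parsed with a few lines of Python. This is only a minimal sketch: the helper name parse_list_line is hypothetical, and it assumes that image paths contain no spaces and that every label is an integer.

```python
def parse_list_line(line, num_labels=2):
    """Split one 'image_path label_1 label_2' line into (path, [labels])."""
    fields = line.split()
    if len(fields) != num_labels + 1:
        raise ValueError("expected a path and %d labels: %r" % (num_labels, line))
    return fields[0], [int(v) for v in fields[1:]]


sample = "114841417@N06/coarse_tilt_aligned_face.492.12059747423_6b3535aa6a_o.jpg 3 0"
path, labels = parse_list_line(sample)
print(path, labels)
```

Checking the field count up front makes a malformed line fail early, before any image is loaded.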
Process the images and labels to generate the HDF5 data:

    import os

    import cv2 as cv
    import h5py
    import numpy as np

    # Directory containing the image folders
    img_root = './image'
    # Path to the training list (TXT)
    train_path = './train.txt'
    # Output path
    train_out = './hdf5_train'

    # Read the entries from the TXT file
    with open(train_path) as f:
        lines = f.readlines()

    file_list = []  # image paths

    # To generate HDF5 data, the images and labels must first be turned into arrays.
    # Here there are 2 labels per image, and the images are channel = 3,
    # width = 256, height = 256, so the arrays take the shapes below.
    # Labels are stored as float32: np.int no longer exists in recent NumPy,
    # and older Caffe versions expect float data in HDF5 files.
    labels = np.zeros((len(lines), 2), dtype=np.float32)
    datas = np.zeros((len(lines), 3, 256, 256), dtype=np.float32)

    # Read the paths and labels
    for count, line in enumerate(lines):
        fields = line.split()
        file_list.append(fields[0])
        labels[count][0] = float(fields[1])
        labels[count][1] = float(fields[2])

    # When Caffe reads HDF5 data, the input layer has no transform_param,
    # so the images must be preprocessed beforehand
    for i, file in enumerate(file_list):
        path = os.path.join(img_root, file)
        image = cv.imread(path)  # load the image (height x width x channel)
        image = cv.resize(image, (256, 256))  # resample to 256*256
        # Convert from height, width, channel to channel, height, width,
        # which is the layout Caffe reads
        img = np.array(image).transpose(2, 0, 1)
        datas[i, :, :, :] = img.astype(np.float32)  # HDF5 data must be float or double

    # Compute the per-channel mean over all images
    mean = datas.mean(axis=0)
    mean = mean.mean(1).mean(1)

    # Subtract the per-channel mean from every image
    datas -= mean.reshape(1, 3, 1, 1)

    # Save the HDF5 file
    with h5py.File(train_out, 'w') as fout:
        # 'data' must match the name after top: in the data layer of
        # train_val.prototxt; this is explained further in the prototxt section
        fout.create_dataset('data', data=datas)
        fout.create_dataset('label', data=labels)
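Notes: 1. When Caffe reads an HDF5 file, it loads all of the data into memory, so a large dataset should be split across several files, each preferably no larger than 2 GB.
2. In train_val.prototxt, the source of hdf5_data_param must be a TXT file that lists the paths of the HDF5 files; it cannot point at an HDF5 file directly.
3. The HDF5Data layer has no transform_param, so any image preprocessing must be done before the HDF5 data is generated.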
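Notes 1 and 2 can be sketched together as follows. This is an illustration under stated assumptions, not part of the original script: the helper names plan_chunks and write_list_file, the train_N.h5 file naming, and the float32 sample shape of (3, 256, 256) are all hypothetical choices. The code only plans the index ranges and writes the list file; it does not write the HDF5 files themselves.

```python
import os

MAX_BYTES = 2 * 1024 ** 3  # 2 GB per file, following note 1


def plan_chunks(num_samples, sample_shape=(3, 256, 256), bytes_per_value=4,
                max_bytes=MAX_BYTES):
    """Return (start, end) index ranges whose image data stays under max_bytes."""
    bytes_per_sample = bytes_per_value
    for dim in sample_shape:
        bytes_per_sample *= dim
    per_file = max(1, max_bytes // bytes_per_sample)
    return [(start, min(start + per_file, num_samples))
            for start in range(0, num_samples, per_file)]


def write_list_file(list_path, out_dir, num_chunks, prefix="train"):
    """Write one HDF5 path per line; this TXT file is what source: points to."""
    paths = [os.path.join(out_dir, "%s_%d.h5" % (prefix, i))
             for i in range(num_chunks)]
    with open(list_path, "w") as f:
        f.write("\n".join(paths) + "\n")
    return paths
```

Each (start, end) range would then be saved to its own file with h5py, using the slices datas[start:end] and labels[start:end], and the resulting list file would be given as source in hdf5_data_param.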
- Modifying train_val.prototxt

    name: "CaffeNet"
    layers {
      name: "data"
      type: HDF5_DATA
      top: "data"  # the dataset names used when generating the HDF5 file must match the names after top
      top: "label"
      hdf5_data_param {
        source: "/train_list.txt"
        batch_size: 64
      }
      include: { phase: TRAIN }
    }
    layers {
      name: "data"
      type: HDF5_DATA
      top: "data"
      top: "label"
      hdf5_data_param {
        source: "/test_list.txt"
        batch_size: 64
      }
      include: { phase: TEST }
    }

    # Add a Slice layer to split the labels
    layers {
      name: "slices"
      type: SLICE
      bottom: "label"
      top: "label_1"  # create one top per label
      top: "label_2"
      slice_param {
        axis: 1  # the axis to slice along (num or channel); here the labels are split along channel
        slice_point: 1  # the number of slice_point entries is the number of labels minus 1; with 2 labels, one split is enough
        # slice_point: 2  with three labels, add another slice_point entry like this one
      }
    }

    # The convolution, pooling and other layers are unchanged and are omitted here

    # Modify the fc8, accuracy and loss layers

    layers {
      name: "fc8_age"
      type: INNER_PRODUCT
      bottom: "fc7"
      top: "fc8_age"
      inner_product_param {
        num_output: 8  # number of outputs for the first label
      }
    }
    layers {
      name: "accuracy1"
      type: ACCURACY
      bottom: "fc8_age"
      bottom: "label_1"  # label 1, produced by the Slice layer
      top: "accuracy1"
      include: { phase: TEST }
    }
    layers {
      name: "loss_age"
      type: SOFTMAX_LOSS
      bottom: "fc8_age"
      bottom: "label_1"  # label_1
      top: "loss_age"
    }

    layers {
      name: "fc8_gender"
      type: INNER_PRODUCT
      bottom: "fc7"
      top: "fc8_gender"
      inner_product_param {
        num_output: 2  # number of outputs for label_2
      }
    }
    layers {
      name: "accuracy2"
      type: ACCURACY
      bottom: "fc8_gender"
      bottom: "label_2"  # corresponds to label_2
      top: "accuracy2"
      include: { phase: TEST }
    }
    layers {
      name: "loss_gender"
      type: SOFTMAX_LOSS
      bottom: "fc8_gender"
      bottom: "label_2"  # corresponds to label_2
      top: "loss_gender"
    }

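To make the effect of the Slice layer concrete, the following pure-Python sketch mimics what slicing along axis 1 does to a batch of label rows. It is an illustration only, not Caffe code: the function name slice_axis1 is hypothetical, and the three label rows are taken from the sample list file.

```python
def slice_axis1(rows, slice_points):
    """Split every row at the given column indices, like Slice with axis: 1."""
    bounds = [0] + list(slice_points) + [len(rows[0])]
    return [[row[a:b] for row in rows]
            for a, b in zip(bounds[:-1], bounds[1:])]


# A batch of three samples, each with two labels
labels = [[3, 0], [4, 1], [3, 1]]

# One slice_point gives two tops, as in the Slice layer above
label_1, label_2 = slice_axis1(labels, [1])
print(label_1)  # [[3], [4], [3]]
print(label_2)  # [[0], [1], [1]]
```

With three labels per sample, passing [1, 2] as slice_points would return three outputs, which matches the rule that the number of slice_point entries is the number of labels minus 1.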
- Modifying deploy.prototxt

    # Only the fc8 and prob layers need to be modified
    layers {
      name: "fc8_age"
      type: INNER_PRODUCT
      bottom: "fc7"
      top: "fc8_age"
      inner_product_param {
        num_output: 8
      }
    }
    layers {
      name: "prob1"
      type: SOFTMAX
      bottom: "fc8_age"
      top: "prob1"
    }
    layers {
      name: "fc8_gender"
      type: INNER_PRODUCT
      bottom: "fc7"
      top: "fc8_gender"
      inner_product_param {
        num_output: 2
      }
    }
    layers {
      name: "prob2"
      type: SOFTMAX
      bottom: "fc8_gender"
      top: "prob2"
    }
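With two softmax outputs, each prediction is read separately from prob1 and prob2. The sketch below shows only that final step in plain Python. The helper name predict_labels is hypothetical and the probability values are made up for illustration; in practice they would come from a forward pass of the deployed network.

```python
def predict_labels(outputs):
    """Return the index of the most probable class for each softmax output."""
    return {name: max(range(len(probs)), key=probs.__getitem__)
            for name, probs in outputs.items()}


# Made-up probabilities: 8 classes for prob1, 2 classes for prob2
outputs = {
    "prob1": [0.02, 0.03, 0.05, 0.60, 0.20, 0.05, 0.03, 0.02],
    "prob2": [0.30, 0.70],
}
predicted = predict_labels(outputs)
print(predicted)
```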