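The previous post covered MNIST handwritten digit recognition with tiny-dnn (https://blog.csdn.net/flyconley/article/details/82864507) and ran the official tiny-dnn demo. In a real deployment you need to train on your own data to solve a classification task, so this post reads through that demo code and modifies it to train on custom data.
Here we use printed Chinese character recognition as the example. We first collected images of printed Chinese characters: a little over 20 character classes (22 in the code below), with 8000 images per class.
Now we start modifying the code. The following is the network definition from the official demo: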
// construct LeNet-5 architecture
network<sequential> nn;
adagrad optimizer;
// connection table [Y.Lecun, 1998 Table.1], defines the LeNet-5 network structure
#define O true
#define X false
static const bool connection[] = {
O, X, X, X, O, O, O, X, X, O, O, O, O, X, O, O,
O, O, X, X, X, O, O, O, X, X, O, O, O, O, X, O,
O, O, O, X, X, X, O, O, O, X, X, O, X, O, O, O,
X, O, O, O, X, X, O, O, O, O, X, X, O, X, O, O,
X, X, O, O, O, X, X, O, O, O, O, X, O, O, X, O,
X, X, X, O, O, O, X, X, O, O, O, O, X, O, O, O
};
#undef O
#undef X
nn << convolutional_layer<tan_h>(
32, 32, 5, 1, 6) /* 32x32 in, 5x5 kernel, 1-6 fmaps conv */
<< average_pooling_layer<tan_h>(
28, 28, 6, 2) /* 28x28 in, 6 fmaps, 2x2 subsampling */
<< convolutional_layer<tan_h>(
14, 14, 5, 6, 16, connection_table(connection, 6, 16))
<< average_pooling_layer<tan_h>(10, 10, 16, 2)
<< convolutional_layer<tan_h>(5, 5, 5, 16, 120)
<< fully_connected_layer<tan_h>(120, 10);
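This implements a LeNet-5 network; for background, see the post https://blog.csdn.net/happyorg/article/details/78274066
The connection table in the code above is LeNet-5's special sparse connection scheme. Here we remove it, change the number of kernels in the first convolutional layer to 8, and change the number of outputs of the final fully connected layer to 22. The last convolutional layer is also widened from 120 to 192 feature maps, so the fully connected layer now takes 192 inputs. The modified code is: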
nn << convolutional_layer<tan_h>(
32, 32, 5, 1, 8) // 32x32 in, 5x5 kernel, 1-8 fmaps conv
<< average_pooling_layer<tan_h>(
28, 28, 8, 2) // 28x28 in, 8 fmaps, 2x2 subsampling
<< convolutional_layer<tan_h>(
14, 14, 5, 8, 16)
<< average_pooling_layer<tan_h>(10, 10, 16, 2)
<< convolutional_layer<tan_h>(5, 5, 5, 16, 192)
<< fully_connected_layer<tan_h>(192, 22);
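When changing layer parameters like this, each layer's input size has to equal the previous layer's output size. The following is a minimal standalone sketch (it does not use tiny-dnn, and the helper names conv_out, pool_out and layer_widths are made up for illustration) that checks the chain of sizes, assuming unpadded stride-1 convolutions and non-overlapping pooling, which is what the numbers in the network definition imply:

```cpp
#include <cassert>
#include <vector>

// Output width of an unpadded, stride-1 convolution (an assumption of this sketch).
int conv_out(int in, int kernel) {
    assert(in >= kernel);
    return in - kernel + 1;
}

// Output width of non-overlapping pooling with the given factor.
int pool_out(int in, int factor) {
    assert(in % factor == 0);
    return in / factor;
}

// Width of the feature maps after each layer of the modified network.
std::vector<int> layer_widths() {
    std::vector<int> w;
    int s = 32;                          // 32x32 input image
    s = conv_out(s, 5); w.push_back(s);  // conv 5x5, 1 -> 8 maps
    s = pool_out(s, 2); w.push_back(s);  // 2x2 average pooling
    s = conv_out(s, 5); w.push_back(s);  // conv 5x5, 8 -> 16 maps
    s = pool_out(s, 2); w.push_back(s);  // 2x2 average pooling
    s = conv_out(s, 5); w.push_back(s);  // conv 5x5, 16 -> 192 maps
    return w;
}
```

The widths come out as 28, 14, 10, 5 and 1, matching the input sizes written in the layer constructors. The final width of 1 means the last convolution produces 192 maps of size 1x1, which is why the fully connected layer takes 192 inputs.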
Next we write the code that loads the training and test samples:
// load dataset
std::vector<label_t> train_labels, test_labels;
std::vector<vec_t> train_images, test_images;
In tiny-dnn, image data is stored as vec_t and labels are stored as label_t. The following code reads an image and converts it to the required format:
// convert image to vec_t and append it to data (the scale parameter is unused)
void convert_image(const string& imagefilename, double scale, int w, int h, std::vector<vec_t>& data)
{
auto img = cv::imread(imagefilename, cv::IMREAD_GRAYSCALE);
if (img.data == nullptr) return; // cannot open, or it's not an image
//imshow("img", img);
//cvWaitKey(0);
cv::Mat_<uint8_t> resized;
cv::resize(img, resized, cv::Size(w, h));
vec_t d;
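// map pixel values from [0, 255] to [minv, maxv] = [-1, 1], inverted,
// so that a pixel value of 0 maps to maxv and 255 maps to minv
double minv = -1.0;
double maxv = 1.0;
std::transform(resized.begin(), resized.end(), std::back_inserter(d),
[=](uint8_t c) { return (255 - c) * (maxv - minv) / 255.0 + minv; });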
data.push_back(d);
}
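The pixel mapping inside convert_image can be tried on its own, without OpenCV. This is a small sketch of that one step; the names scale_pixel and scale_pixels are hypothetical and only the formula is taken from the code above:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Map one 8-bit pixel to [minv, maxv], inverted: 0 -> maxv, 255 -> minv.
double scale_pixel(std::uint8_t c, double minv = -1.0, double maxv = 1.0) {
    return (255 - c) * (maxv - minv) / 255.0 + minv;
}

// Apply the mapping to a whole row-major grayscale buffer.
std::vector<double> scale_pixels(const std::vector<std::uint8_t>& pixels) {
    std::vector<double> out;
    out.reserve(pixels.size());
    for (std::uint8_t c : pixels) {
        out.push_back(scale_pixel(c));
    }
    return out;
}
```

With the default range, scale_pixel(0) gives 1.0 and scale_pixel(255) gives -1.0, so the output stays inside the range of the tan_h activations used by the network.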
Load the training set; the test set can be loaded in the same way:
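char ImageRootDir[] = "E:\\mystudy\\tinycnn\\traindata\\";
for (int i = 0; i <= 21; i++) { // 22 classes, one folder per class
    string trainrootpath = ImageRootDir + cv::format("%d", i); // training set directory for class i
    for (int j = 0; j < 5000; j++)
    {
        string trainpath = trainrootpath + "\\" + cv::format("%d", j) + ".jpg";
        size_t count_before = train_images.size();
        convert_image(trainpath, 1.0, 32, 32, train_images); // convert the image format
        // convert_image returns without appending when the file cannot be read,
        // so only add a label when an image was actually added
        if (train_images.size() > count_before)
            train_labels.push_back((label_t)i);
    }
}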
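Since the training and test sets are read the same way, one option is to separate listing the files from decoding them, so that a single loop serves both sets. The sketch below only builds (path, label) pairs for the folder layout used in the loop above. The function name list_samples and its parameters are hypothetical, and the root directory is passed in because the post does not give the test set path:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One sample: the image path and its class label.
using sample = std::pair<std::string, int>;

// List <root><class>\<index>.jpg for every class and index.
// Hypothetical helper; it does not check that the files exist.
std::vector<sample> list_samples(const std::string& root, int num_classes, int per_class) {
    std::vector<sample> samples;
    samples.reserve(static_cast<std::size_t>(num_classes) * per_class);
    for (int label = 0; label < num_classes; ++label) {
        const std::string dir = root + std::to_string(label) + "\\";
        for (int j = 0; j < per_class; ++j) {
            samples.emplace_back(dir + std::to_string(j) + ".jpg", label);
        }
    }
    return samples;
}
```

Each returned pair can then be passed to convert_image, adding the label only when the image was actually loaded.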
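Data loaded this way is fully ordered by class, so it must be shuffled before training; otherwise training has difficulty converging. We write the following code to shuffle the dataset: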
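// requires <chrono> and <random>
void shuffle(std::vector<vec_t>& images, std::vector<label_t>& labels)
{
    // seed a single engine from the clock and reuse it for every draw
    unsigned seed = static_cast<unsigned>(chrono::system_clock::now().time_since_epoch().count());
    default_random_engine engine(seed);
    int n = static_cast<int>(images.size());
    // Fisher-Yates shuffle; images and labels are swapped together so pairs stay aligned
    for (int i = n - 1; i > 0; --i)
    {
        std::uniform_int_distribution<int> d(0, i);
        int r = d(engine);
        std::swap(labels[i], labels[r]);
        std::swap(images[i], images[r]);
    }
}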
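The important property of this step is that images and labels are reordered by the same permutation. As an alternative sketch (not the code used in this post), the same effect can be obtained by shuffling a vector of indices with std::shuffle and applying that order to both containers. The name shuffle_pairs is made up for illustration, and both vectors are assumed to have the same length:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Reorder two parallel vectors with one random permutation, so that
// element i of `a` still belongs to element i of `b` afterwards.
template <typename A, typename B>
void shuffle_pairs(std::vector<A>& a, std::vector<B>& b, std::mt19937& engine) {
    assert(a.size() == b.size());
    std::vector<std::size_t> order(a.size());
    std::iota(order.begin(), order.end(), 0);          // 0, 1, 2, ...
    std::shuffle(order.begin(), order.end(), engine);  // random permutation

    std::vector<A> a2;
    std::vector<B> b2;
    a2.reserve(a.size());
    b2.reserve(b.size());
    for (std::size_t idx : order) {
        a2.push_back(std::move(a[idx]));
        b2.push_back(std::move(b[idx]));
    }
    a.swap(a2);
    b.swap(b2);
}
```

The engine is created once by the caller and passed by reference, so successive calls, for example one per epoch, continue the same random sequence instead of restarting it.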
Insert this call into the training procedure; the other training parameters are left unchanged from the demo:
shuffle(train_images, train_labels);
std::cout << "start learning" << std::endl;
progress_display disp(train_images.size());
timer t;
int minibatch_size = 10;
optimizer.alpha *= std::sqrt(minibatch_size);
// create callback
auto on_enumerate_epoch = [&](){
std::cout << t.elapsed() << "s elapsed." << std::endl;
tiny_dnn::result res = nn.test(test_images, test_labels);
std::cout << res.num_success << "/" << res.num_total << std::endl;
shuffle(train_images, train_labels);
std::ofstream ofs("chi-weights");
ofs << nn;
disp.restart(train_images.size());
t.restart();
};
auto on_enumerate_minibatch = [&](){
disp += minibatch_size;
};
// training
nn.train<mse>(optimizer, train_images, train_labels, minibatch_size, 20,
on_enumerate_minibatch, on_enumerate_epoch);
std::cout << "end training." << std::endl;
// test and show results
nn.test(test_images, test_labels).print_detail(std::cout);
// save networks
std::ofstream ofs("numplusletter-weights");
ofs << nn;
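Run the code to obtain the training result:
As can be seen, the network converges after a single epoch and the recognition rate on the test set reaches 100%. This is because the training set is large, 110000 samples in total, and because the training and test sets are themselves very similar.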
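The recognition rate reported during training is the ratio res.num_success / res.num_total. As a rough illustration of how such a rate can be computed from raw network outputs, the sketch below assumes that the predicted class is the index of the largest of the 22 output values. It is a standalone example with hypothetical function names, not tiny-dnn's implementation:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Index of the largest output value, taken as the predicted class.
std::size_t argmax(const std::vector<double>& output) {
    assert(!output.empty());
    return static_cast<std::size_t>(
        std::distance(output.begin(), std::max_element(output.begin(), output.end())));
}

// Fraction of samples whose predicted class equals the true label.
double recognition_rate(const std::vector<std::vector<double>>& outputs,
                        const std::vector<std::size_t>& labels) {
    assert(outputs.size() == labels.size() && !outputs.empty());
    std::size_t correct = 0;
    for (std::size_t i = 0; i < outputs.size(); ++i) {
        if (argmax(outputs[i]) == labels[i]) {
            ++correct;
        }
    }
    return static_cast<double>(correct) / static_cast<double>(outputs.size());
}
```

A rate computed this way only says how well the network does on the given test images; when those images closely resemble the training images, as noted above, it can overstate performance on new data.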