Note: all figures in this article are taken from the original authors' papers and slides.

Basic Information

Title: fpgaConvNet: A Framework for Mapping Convolutional Neural Networks on FPGAs
Authors: Stylianos I. Venieris, Christos-Savvas Bouganis
Institution: Imperial College London
Year: 2016
Follow-up papers:
fpgaConvNet: A Toolflow for Mapping Diverse Convolutional Neural Networks on Embedded FPGAs (2017)
Latency-Driven Design for FPGA-based Convolutional Neural Networks (2017)
Project page: http://cas.ee.ic.ac.uk/people/sv1310/fpgaConvNet.html
Notes: the framework was developed as part of the author's PhD research.

Main Content

Core Ideas

Basic Idea

[Figure: Basic idea]

Keywords

domain-specific modelling framework
automated design methodology
design space exploration
synchronous data flow for capturing CNN workloads as streaming computations
domain-specific language

Key Contributions

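By treating CNN processing as a streaming architecture, the framework describes a CNN as an SDF (Synchronous Data Flow) model (see the figure), performs design space exploration over that model, and uses a library of transformations to map the CNN from its SDF form onto the FPGA, finally emitting a synthesizable Vivado HLS hardware design.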
[Figure: SDF modelling]
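The design space exploration step can be illustrated with a deliberately small sketch. It assumes a hypothetical cost model in which each layer's workload (`LAYERS`, multiply-accumulates per image) is spread over `unroll` parallel multipliers, each costing one DSP, and the streaming pipeline's cycles per image are set by its slowest stage. The brute-force search over the hypothetical `FACTORS` list stands in for fpgaConvNet's actual optimizer; it is not the authors' method.

```python
from itertools import product

# Hypothetical workloads: multiply-accumulate operations per image for each layer.
LAYERS = {"conv1": 1_000_000, "conv2": 4_000_000, "fc": 500_000}
DSP_BUDGET = 64                   # hypothetical device budget, one DSP per parallel MAC
FACTORS = [1, 2, 4, 8, 16, 32]    # hypothetical candidate unroll factors per layer


def pipeline_interval(workloads, unroll):
    """Cycles per image of a streaming pipeline: the slowest stage sets the pace."""
    return max(-(-w // u) for w, u in zip(workloads, unroll))  # ceiling division


def explore(layers, budget, factors):
    """Try every combination of unroll factors that fits the budget.

    Returns (cycles_per_image, dsps_used, unroll_per_layer) for the fastest design,
    breaking ties in favour of the design that uses fewer DSPs.
    """
    names, workloads = list(layers), list(layers.values())
    best = None
    for unroll in product(factors, repeat=len(names)):
        dsps = sum(unroll)
        if dsps > budget:  # this design does not fit on the device
            continue
        cycles = pipeline_interval(workloads, unroll)
        if best is None or (cycles, dsps) < best[:2]:
            best = (cycles, dsps, dict(zip(names, unroll)))
    return best


print(explore(LAYERS, DSP_BUDGET, FACTORS))
```

With these numbers the search prints `(125000, 44, {'conv1': 8, 'conv2': 32, 'fc': 4})`: parallelism is allocated in proportion to each layer's workload so that all three stages take the same time. Exhaustive search grows exponentially with the number of layers, so a real toolflow needs a smarter optimizer.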
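There are four configurable ways to map the SDF model onto hardware building blocks:
- Partition the SDFG (SDF graph) into subgraphs and implement each subgraph as its own customized design, fully reconfiguring the FPGA to switch between them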
[Figure: Reconfiguration]
- Parametrize the degree to which each network layer is unrolled (coarse-grained folding), as shown in the figure
[Figure: Coarse-grained folding]
[Figure: Fine-grained folding]
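- Parametrize the degree to which the dot products are unrolled (fine-grained folding), as shown in the figure above
- Weights reloading (introduced in the 2017 version)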
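The first transformation, partitioning the SDFG and reconfiguring the FPGA between subgraphs, can be sketched as follows. Everything here is an assumption for illustration: the per-layer resource costs and times in `LAYERS`, the `BUDGET` per bitstream, the `RECONFIG_MS` overhead, and the greedy cut rule in the hypothetical `partition` helper (the framework chooses partitions through its own optimization, not this rule). Layers inside one subgraph are treated as a pipeline, so each subgraph's time is that of its slowest layer.

```python
# Hypothetical layer chain: (name, resource cost, ms to stream one batch through the layer).
LAYERS = [("conv1", 40, 2.0), ("pool1", 5, 0.5), ("conv2", 60, 3.0), ("fc", 30, 1.0)]
BUDGET = 100         # hypothetical resources available to one bitstream
RECONFIG_MS = 50.0   # hypothetical time for one full FPGA reconfiguration


def partition(layers, budget):
    """Greedily cut the chain into consecutive subgraphs that each fit the budget."""
    parts, current, used = [], [], 0
    for name, cost, _ in layers:
        if cost > budget:
            raise ValueError(f"{name} alone exceeds the budget and would need folding")
        if used + cost > budget:  # close the current subgraph and start a new bitstream
            parts.append(current)
            current, used = [], 0
        current.append(name)
        used += cost
    if current:
        parts.append(current)
    return parts


def latency_ms(layers, parts, reconfig_ms):
    """Slowest layer of each pipelined subgraph, plus one reconfiguration between subgraphs."""
    times = {name: t for name, _, t in layers}
    compute = sum(max(times[n] for n in part) for part in parts)
    return compute + reconfig_ms * (len(parts) - 1)


parts = partition(LAYERS, BUDGET)
print(parts, latency_ms(LAYERS, parts, RECONFIG_MS))
```

This prints `[['conv1', 'pool1'], ['conv2', 'fc']] 55.0`. In this toy setting the 50 ms reconfiguration dwarfs the 5 ms of compute, which illustrates why reconfiguration pays off mainly when its cost is amortized over a large batch of inputs.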

Highlights of the Design

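Compared with earlier FPGA-based CNN designs and optimizations, the authors' optimization and mapping methods cover the convolutional, pooling and nonlinear layers of a CNN, and can incorporate the resource parameters of any target FPGA platform.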

Processing Flow of the Framework

[Figure: Processing flow]

Framework Structure

[Figure: Framework structure]

Updates to the Work

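The version published in 2016 contains the core of fpgaConvNet: SDF-based modelling and mapping, and the automated design flow. It offers three ways to map the SDF model to hardware. The first is SDFG partitioning with reconfiguration of the FPGA resources. The second is coarse-grained folding, implemented by parametrizing how far a layer is unrolled: if resources suffice, the layer can be fully unrolled and run in parallel, or only half of it can be unrolled and the work processed in two passes. The third is fine-grained folding, implemented by parametrizing the parallelism of the dot products, which again can be fully parallel or time-multiplexed. Note that, provided the resources suffice, the FPGA implementation performs best with no coarse- or fine-grained folding at all. At this stage fpgaConvNet mainly targeted high-throughput applications.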
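The effect of coarse- and fine-grained folding on a single convolutional layer can be sketched with a simple cycle-count model. The model and the hypothetical `conv_cycles` helper are assumptions, not the paper's performance model: `coarse` is how many output channels are computed in parallel (`n_out` means fully unrolled, `n_out // 2` means two passes), and `fine` is how many of the `k*k` multiplications in each dot product run in parallel.

```python
from math import ceil


def conv_cycles(out_pixels, n_in, n_out, k, coarse, fine):
    """Hypothetical cycle count of one conv layer under folding.

    coarse: output channels processed in parallel (coarse-grained folding)
    fine:   multipliers per k*k dot product (fine-grained folding)
    Returns (cycles, multipliers_used).
    """
    passes = ceil(n_out / coarse)         # time-multiplex output channels over `coarse` units
    cycles_per_dot = ceil(k * k / fine)   # time-multiplex the k*k products over `fine` multipliers
    cycles = out_pixels * n_in * passes * cycles_per_dot
    return cycles, coarse * fine


# Hypothetical layer: 10x10 output, 4 input channels, 8 output channels, 3x3 kernels.
layer = dict(out_pixels=100, n_in=4, n_out=8, k=3)
for coarse, fine in [(8, 9), (4, 9), (8, 3), (1, 1)]:
    print(coarse, fine, conv_cycles(**layer, coarse=coarse, fine=fine))
```

The fully unrolled design `(8, 9)` takes 400 cycles with 72 multipliers; halving `coarse` doubles the cycle count to 800 (the "unroll half, process in two passes" case), and the fully folded `(1, 1)` design needs a single multiplier but is 72 times slower.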
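The versions published in 2017 extend the framework with latency-oriented design optimizations and with the ability to optimize large networks such as AlexNet and VGG16.
A more important 2017 update is a new SDF transformation, weights reloading, which reduces latency without requiring the inputs to be batch-processed.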
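Weights reloading can be illustrated by splitting a layer's filters into groups whose weights fit on-chip and streaming each group in from off-chip memory in turn. The layer sizes, the `onchip_words` capacity, the `words_per_cycle` bandwidth and both helper functions are hypothetical; the sketch only shows why a wide layer no longer has to fit entirely in on-chip memory, at the price of extra weight-transfer cycles.

```python
from math import ceil


def reload_groups(n_filters, words_per_filter, onchip_words):
    """Hypothetical helper: split a layer's filters into groups whose weights fit on-chip."""
    per_group = onchip_words // words_per_filter
    if per_group == 0:
        raise ValueError("a single filter does not fit on-chip")
    groups, remaining = [], n_filters
    while remaining > 0:
        groups.append(min(per_group, remaining))
        remaining -= groups[-1]
    return groups


def reload_cycles(groups, words_per_filter, words_per_cycle):
    """Hypothetical helper: cycles spent streaming each group's weights from off-chip memory."""
    return sum(ceil(g * words_per_filter / words_per_cycle) for g in groups)


# Hypothetical wide layer: 256 filters of 3x3x256 weights, 200k words of on-chip weight memory.
groups = reload_groups(256, 3 * 3 * 256, 200_000)
print(groups, reload_cycles(groups, 3 * 3 * 256, 16))
```

This prints `[86, 86, 84] 36864`: the layer runs in three reload rounds instead of requiring all 589,824 weights on-chip at once.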

Motivation

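CNNs are computationally intensive machine learning algorithms, which hinders their deployment, especially in embedded AI applications.
FPGAs are reconfigurable devices that allow trade-offs among performance, power consumption and cost.
FPGA-based CNN designs depend on the FPGA's resources and size, the type and size of the CNN, and the changing requirements of the application, so a framework that abstracts away FPGA resources is needed to improve the portability and scalability of FPGA-based CNN designs.
The framework also aims to lower the barrier for deep learning experts to implement CNNs in hardware.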

Background

SDF

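An SDF model is drawn as a directed graph in which each node represents a computation and each edge represents a data stream. Nodes are data-driven: a node fires as soon as its input data are available. The advantages are that the computation can be scheduled statically and the buffer memory between nodes is bounded and predictable; the drawback is that conditional computation cannot be expressed. In addition, the mathematical properties of SDF can be exploited for analysis, as shown in the figures.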

[Figure: Hardware mapping]

[Figure: Workload mapping]
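One of the mathematical properties mentioned above is that an SDF graph can be scheduled statically by solving its balance equations: for every edge, the producer's firings times the tokens it produces must equal the consumer's firings times the tokens it consumes, i.e. Γq = 0 for the topology matrix Γ. The sketch below solves these textbook equations for a connected graph; the hypothetical `repetition_vector` helper and the simplified CNN-like rates (a pooling node consuming 4 tokens per output, an fc node consuming 16) are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction
from functools import reduce
from math import gcd, lcm


def repetition_vector(nodes, edges):
    """Smallest positive integer firing counts q satisfying the SDF balance equations.

    edges: (src, dst, tokens_produced_per_src_firing, tokens_consumed_per_dst_firing)
    """
    rate = {nodes[0]: Fraction(1)}
    changed = True
    while changed:  # propagate relative firing rates across edges in both directions
        changed = False
        for src, dst, prod, cons in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * prod / cons
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * cons / prod
                changed = True
    if len(rate) != len(nodes):
        raise ValueError("graph is not connected")
    scale = reduce(lcm, (r.denominator for r in rate.values()), 1)
    q = {n: int(rate[n] * scale) for n in nodes}
    common = reduce(gcd, q.values())
    q = {n: v // common for n, v in q.items()}
    for src, dst, prod, cons in edges:  # inconsistent rates would need unbounded buffers
        if q[src] * prod != q[dst] * cons:
            raise ValueError(f"inconsistent rates on edge {src}->{dst}")
    return q


nodes = ["input", "conv", "pool", "fc"]
edges = [("input", "conv", 1, 1), ("conv", "pool", 1, 4), ("pool", "fc", 1, 16)]
print(repetition_vector(nodes, edges))
```

This prints `{'input': 64, 'conv': 64, 'pool': 16, 'fc': 1}`: one iteration of the static schedule fires the input and conv nodes 64 times for every fc firing, and the buffer on each edge can be sized from these counts ahead of time.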

Experimental Results

Metrics of interest: performance density (performance per FPGA slice, in GOp/s/slice) and performance efficiency (performance per watt, in GOp/s/W).

[Figure: Benchmark comparison]
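Both metrics are simple ratios. The sketch below computes them for two made-up designs to show how normalizing by area or by power can reverse a ranking based on raw throughput; all numbers are invented for illustration and are not results from the paper.

```python
def performance_density(gops_per_s, slices):
    """Throughput per FPGA slice, in GOp/s/slice."""
    return gops_per_s / slices


def performance_efficiency(gops_per_s, watts):
    """Throughput per watt, in GOp/s/W."""
    return gops_per_s / watts


# Hypothetical designs: (throughput in GOp/s, slices used, power in W).
designs = {"A": (12.0, 6_000, 4.0), "B": (20.0, 16_000, 8.0)}
for name, (gops, slices, watts) in designs.items():
    print(name, performance_density(gops, slices), performance_efficiency(gops, watts))
```

Design B has the higher raw throughput, but design A is better on both normalized metrics (0.002 vs 0.00125 GOp/s/slice, 3.0 vs 2.5 GOp/s/W).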

Evolved Version

The evolved version adds two features:
- Support for Irregular Networks
fpgaConvNet offers support for a wide range of networks, including both conventional ConvNets with regular layer connectivity as well as compound modules, such as Inception modules, residual blocks and dense blocks.
- Support for large networks
fpgaConvNet makes no assumptions on the size of ConvNets and supports the mapping of deep and wide networks independently of the target FPGA resources. This is achieved by supporting (i) bitstream-level reconfiguration which allows the mapping of ConvNets of large depth and (ii) the weights reloading of a layer which allows ConvNets to have wide convolutional layers without being constrained by the available on-chip memory. Both the reconfiguration and weights reloading employed by the generated hardware architecture are parametrised and optimised by fpgaConvNet for the target ConvNet-FPGA pair.

Paper Review: fpgaConvNet: A Framework for Mapping Convolutional Neural Networks on FPGAs