Contents

Network
CRF
- Road Detection through CRF based LiDAR-Camera Fusion
- Road detection based on the fusion of Lidar and image data
Other
- Autonomous road detection and modeling for UGVs using vision-laser data fusion
Summary

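This post lists several LiDAR-camera fusion methods, mainly for the road detection problem. Road detection is really a semantic segmentation problem with only two classes: road and non-road. The mainstream approach today is CNNs, which need large amounts of training data. The other line of work uses CRFs (conditional random fields).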

Network

SNE-RoadSeg

Paper: SNE-RoadSeg: Incorporating Surface Normal Information into Semantic Segmentation for Accurate Freespace Detection
Venue: ECCV 2020

A detailed reading of this paper is in another post of mine. Here is a brief summary of the method.
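The network takes two inputs: an RGB image and a depth image. The paper does not feed the depth image in directly. Instead, it applies the SNE (surface normal estimator) transform to the depth image to extract pixel-wise surface normals. Two ResNets serve as encoders, and the decoder is built in a DenseNet-like fashion.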
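To make "pixel-wise normals from depth" concrete, here is a minimal sketch. It is not the SNE module from the paper, which has its own estimator. It uses a plain finite-difference approach instead. Each pixel is back-projected to 3-D with an assumed pinhole camera (intrinsics `fx`, `fy`, `cx`, `cy`). The normal is then the cross product of the tangent vectors along image columns and rows. All function names are hypothetical.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Lift a depth map (H, W) to camera-frame 3-D points (H, W, 3) with a pinhole model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)

def normals_from_depth(depth, fx, fy, cx, cy):
    """Per-pixel unit normals from the cross product of image-space tangent vectors.

    This is a simple finite-difference estimator, not the paper's SNE.
    """
    pts = backproject(depth, fx, fy, cx, cy)
    du = np.gradient(pts, axis=1)  # tangent along image columns
    dv = np.gradient(pts, axis=0)  # tangent along image rows
    n = np.cross(du, dv)
    n = n / np.maximum(np.linalg.norm(n, axis=-1, keepdims=True), 1e-12)
    # Orient every normal toward the camera (negative z in the camera frame).
    n[n[..., 2] > 0] *= -1
    return n
```

For a fronto-parallel plane (constant depth), every normal comes out as (0, 0, -1). The resulting 3-channel normal map is what would replace raw depth as the second encoder's input.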

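Strictly speaking, this paper is not LiDAR-camera fusion. However, KITTI does not provide depth images, so the depth must have been recovered from the LiDAR data somehow. The paper does not go into detail on this.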

LidCamNet

Paper: LIDAR–camera fusion for road detection using fully convolutional neural networks
Venue: Robotics and Autonomous Systems, 2019

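The method first converts the LiDAR point cloud into a depth image; Section 4 of the paper gives the details. The idea is again to align the LiDAR and camera data, project the LiDAR points onto the image plane, and then interpolate / fill holes to obtain a dense depth image. The paper cites the work this step follows: Pedestrian detection combining RGB and dense LiDAR data, in: Intelligent Robots and Systems (IROS 2014). The paper also shows examples of the resulting depth images.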
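The projection step (before any hole filling) can be sketched as follows. This assumes KITTI-style calibration: a 4x4 LiDAR-to-camera extrinsic `T_cam_lidar` and a 3x4 camera projection matrix `P`. The function name and the `min_depth` cutoff are hypothetical. Interpolation / densification, which the paper also performs, is not shown.

```python
import numpy as np

def lidar_to_depth_image(points, T_cam_lidar, P, height, width, min_depth=0.1):
    """Project LiDAR points (N, 3) to a sparse depth image (height, width).

    Pixels with no LiDAR return stay 0; if several points hit one pixel, the nearest wins.
    """
    n = points.shape[0]
    homo = np.hstack([points, np.ones((n, 1))])      # (N, 4) homogeneous coordinates
    cam = T_cam_lidar @ homo.T                       # (4, N) points in the camera frame
    cam = cam[:, cam[2] > min_depth]                 # drop points behind / too close to the camera
    uvw = P @ cam                                    # (3, N) homogeneous image coordinates
    u = np.round(uvw[0] / uvw[2]).astype(int)
    v = np.round(uvw[1] / uvw[2]).astype(int)
    z = cam[2]
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    depth = np.full((height, width), np.inf)
    # z-buffer: np.minimum.at handles repeated pixel indices correctly.
    np.minimum.at(depth, (v[inside], u[inside]), z[inside])
    depth[np.isinf(depth)] = 0.0
    return depth
```

`np.minimum.at` is used rather than plain fancy-index assignment. NumPy does not guarantee which value wins when the same index is assigned more than once.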
Next, the network, which is also very simple (the architecture figure is in the paper).

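The encoder and decoder form an FCN, in which layers L6 to L14 are 3x3 dilated convolution layers. The paper proposes cross fusion to combine the two modalities (see the corresponding figure in the paper).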

In other words, at every scale the RGB and depth feature maps are combined by pixel-wise addition with learned weights.
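Here is a minimal sketch of that idea. Before each layer, each branch adds the other branch's feature map scaled by a learned weight. Two assumptions are made: one scalar weight per layer and direction (`a[j]`, `b[j]`), and placeholder "layers" (a scaling followed by ReLU) standing in for the real dilated convolutions. All names are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def cross_fusion_forward(cam, lidar, layer_weights, a, b):
    """Run the camera and LiDAR branches through stand-in layers with cross fusion.

    Before layer j the camera branch receives cam + a[j] * lidar and the LiDAR branch
    receives lidar + b[j] * cam. In a trained network a[j], b[j] would be learned scalars.
    """
    for j, (w_cam, w_lidar) in enumerate(layer_weights):
        cam_in = cam + a[j] * lidar
        lidar_in = lidar + b[j] * cam
        cam, lidar = relu(w_cam * cam_in), relu(w_lidar * lidar_in)
    return cam, lidar
```

With all `a` and `b` at zero, the two branches run independently. Nonzero values let each branch borrow as much of the other modality as training finds useful.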

Road segmentation with image-LiDAR data fusion in deep neural network

Paper: Road segmentation with image-LiDAR data fusion in deep neural network
Venue: Multimedia Tools and Applications, 2019

This method again projects the LiDAR points onto the image. The overall network architecture is given as a figure in the paper.
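ResNet-50 serves as the encoder and produces feature maps at 1/4 to 1/32 of the input resolution. Several RFUs then fuse in the LiDAR data and perform upsampling. The paper details the RFU in a separate figure.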

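The RFU serves two purposes:
1) It fuses lower-resolution and higher-resolution feature maps.
2) It fuses image and LiDAR feature maps of the same resolution.

The image feature maps come from the encoder. The LiDAR input is obtained by projecting the LiDAR points onto the image and then rescaling. In effect, the projected LiDAR points form a depth image, and an image pyramid is built from that depth image.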
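Here is a sketch of the two ingredients, with the actual RFU layers replaced by simple concatenation:
- A depth pyramid built from the sparse projected depth image. The assumption is that each 2x2 block is averaged over valid (non-zero) pixels only, so empty pixels do not drag depths toward 0.
- A fusion step that upsamples the coarser decoder feature and stacks it with same-resolution image features and the matching depth level.

The paper's RFU uses learned layers that are not reproduced here. Function names are hypothetical.

```python
import numpy as np

def depth_pyramid(depth, levels):
    """Halve a sparse depth image `levels - 1` times, averaging only non-zero pixels per 2x2 block."""
    pyr = [depth]
    for _ in range(levels - 1):
        d = pyr[-1]
        h, w = d.shape[0] // 2 * 2, d.shape[1] // 2 * 2
        blocks = d[:h, :w].reshape(h // 2, 2, w // 2, 2)
        valid = (blocks > 0).sum(axis=(1, 3))
        total = blocks.sum(axis=(1, 3))
        pyr.append(np.where(valid > 0, total / np.maximum(valid, 1), 0.0))
    return pyr

def rfu_like_fusion(low_res_feat, image_feat, depth_level):
    """Hypothetical RFU stand-in: 2x nearest-neighbour upsampling of the coarser feature (C, H, W),
    then channel concatenation with image features and one depth-pyramid level."""
    up = low_res_feat.repeat(2, axis=1).repeat(2, axis=2)
    return np.concatenate([up, image_feat, depth_level[None]], axis=0)
```

In a full network, each decoder stage would take the pyramid level whose resolution matches the encoder feature it is fused with.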

What sets this paper apart from other methods is that it builds an image pyramid from the projected LiDAR image and feeds it into the decoder.

Fast Lidar-Camera Fusion for Road Detection by CNN and Spherical Coordinate Transformation

Paper: Fast Lidar - Camera Fusion for Road Detection by CNN and Spherical Coordinate Transformation
Venue: IEEE Intelligent Vehicles Symposium (IV), 2019

The paper claims the following as its contribution:
However, this method retains large quantities of space from the lidar data—the gaps caused by distance constraints. To solve these problems, we converged lidar and camera image into a spherical coordinate system.
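Instead of projecting the point cloud onto the image plane, the paper projects it into a spherical coordinate system. The image is then projected into the same coordinate system. The paper gives the image projection formula as a figure.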
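In that formula, $x_m^c$ and $y_m^c$ are the pixel coordinates and $f$ is the camera focal length. Each pixel has a fixed $(x_m^c, y_m^c)$, so it maps to a fixed pair of spherical coordinates. The pixel resolution does not change either, so it is unclear what this gains. In principle, the transform just turns uniformly spaced pixels into non-uniformly spaced angular samples, and I do not see a clear advantage in that.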
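The paper's exact formula is not reproduced here. The sketch below uses standard pinhole ray geometry instead: pixel coordinates centred on the principal point, and camera axes x right, y down, z forward. Under these assumptions, the pixel ray $(x, y, f)$ and a 3-D point that projects onto that pixel get the same azimuth and elevation. The sketch also shows the non-uniform angular spacing mentioned above. Function names are hypothetical.

```python
import numpy as np

def pixel_to_sphere(xc, yc, f):
    """(azimuth, elevation) of the viewing ray (xc, yc, f) through a centred pixel coordinate."""
    azimuth = np.arctan2(xc, f)
    elevation = np.arctan2(yc, np.hypot(xc, f))
    return azimuth, elevation

def point_to_sphere(x, y, z):
    """(azimuth, elevation) of a 3-D point already expressed in the camera frame."""
    azimuth = np.arctan2(x, z)
    elevation = np.arctan2(y, np.hypot(x, z))
    return azimuth, elevation

# Equal 100-pixel steps away from the image centre give shrinking angular steps.
az, _ = pixel_to_sphere(np.array([0.0, 100.0, 200.0, 300.0]), np.zeros(4), 700.0)
angular_steps = np.diff(az)
```

LiDAR points would first be transformed into the camera frame with the extrinsic calibration before calling `point_to_sphere`.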

The network is a plain encoder-decoder with nothing particularly new.

Multi-Stage Residual Fusion Network for LIDAR-Camera Road Detection

Paper: Multi-Stage Residual Fusion Network for LIDAR-Camera Road Detection
Venue: IEEE Intelligent Vehicles Symposium (IV), 2019

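This paper performs segmentation in the bird's-eye view (BEV). The point cloud is rasterized into a grid with height used as the channels, and the image is projected into the BEV as well. A ResNet then performs the segmentation.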
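Here is a minimal sketch of "rasterize the point cloud, with height as channels". The grid extent, cell size and height slices are assumptions, not the paper's settings. Each height slice becomes one binary occupancy channel. Projecting the image into the BEV is not shown. The function name is hypothetical.

```python
import numpy as np

def bev_height_grid(points, x_range, y_range, z_edges, cell):
    """Rasterize LiDAR points (N, 3) into a (num_slices, nx, ny) BEV occupancy tensor.

    Channel k is 1 in a cell if any point with z_edges[k] <= z < z_edges[k + 1] falls in it.
    """
    nx = int(round((x_range[1] - x_range[0]) / cell))
    ny = int(round((y_range[1] - y_range[0]) / cell))
    n_slices = len(z_edges) - 1
    grid = np.zeros((n_slices, nx, ny), dtype=np.float32)
    ix = np.floor((points[:, 0] - x_range[0]) / cell).astype(int)
    iy = np.floor((points[:, 1] - y_range[0]) / cell).astype(int)
    iz = np.digitize(points[:, 2], z_edges) - 1     # slice index, -1 or n_slices if out of range
    keep = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny) & (iz >= 0) & (iz < n_slices)
    grid[iz[keep], ix[keep], iy[keep]] = 1.0
    return grid
```

The result can be fed to a CNN like a multi-channel image, which is what makes a standard ResNet usable in the BEV.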

A fusion network for road detection via spatial propagation and spatial transformation

Paper: A fusion network for road detection via spatial propagation and spatial transformation
Venue: Pattern Recognition, 2020
An RNN-based method.

3-D LiDAR + Monocular Camera: An Inverse-Depth-Induced Fusion Framework for Urban Road Detection

Paper: 3-D LiDAR + Monocular Camera: An Inverse-Depth-Induced Fusion Framework for Urban Road Detection
Venue: IEEE Transactions on Intelligent Vehicles, 2018

It again uses a depth image, this time combined with a CRF.

CRF

Road Detection through CRF based LiDAR-Camera Fusion

Paper: Road Detection through CRF based LiDAR-Camera Fusion
Venue: ICRA 2019

This paper uses a CRF for fusion. Put simply, it runs two semantic segmenters, one on the LiDAR data and one on the camera data, and fuses their outputs with a CRF.

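The LiDAR segmenter first projects the LiDAR data along the x, y, and z directions into three 2-D arrays. It then segments them with the method of reference [2] in the paper. The segmentation result is projected into the image coordinate frame and upsampled to obtain a dense segmentation.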

The camera segmenter is DeepLab.

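The fusion CRF is also conceptually simple:
- The unary potential is a weighted sum of the two segmenters' outputs.
- The pairwise potential is a Gaussian kernel over the RGB image.

The novelty is that both the unary and the pairwise terms also account for whether the LiDAR and RGB segmenters agree. The paper gives these formulas as a figure.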

The paper models this agreement between the two segmentation results with a label compatibility function, which is defined in the paper.
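The sketch below evaluates a CRF energy of this general shape on a handful of pixels, by brute force. It simplifies the paper in three assumed ways:
- The unary is a weighted sum of each segmenter's negative log-probability.
- The pairwise term is a DenseCRF-style appearance kernel on pixel position and RGB.
- The label compatibility is plain Potts. The paper's compatibility function, which also depends on LiDAR/camera agreement, is not reproduced.

Weights, bandwidths and function names are hypothetical.

```python
import numpy as np

def unary(p_cam, p_lidar, labels, w_cam=0.5, w_lidar=0.5, eps=1e-9):
    """Weighted sum of both segmenters' -log p for the chosen labels.

    p_cam, p_lidar: (N, L) per-pixel class probabilities; labels: (N,) integer labels.
    """
    idx = np.arange(len(labels))
    return float(np.sum(-w_cam * np.log(p_cam[idx, labels] + eps)
                        - w_lidar * np.log(p_lidar[idx, labels] + eps)))

def pairwise(pos, rgb, labels, w=1.0, theta_alpha=10.0, theta_beta=20.0):
    """Potts compatibility times a Gaussian kernel on pixel position and RGB colour."""
    e = 0.0
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if labels[i] != labels[j]:  # Potts: cost only when labels differ
                d_pos = np.sum((pos[i] - pos[j]) ** 2)
                d_rgb = np.sum((rgb[i] - rgb[j]) ** 2)
                e += w * np.exp(-d_pos / (2 * theta_alpha ** 2) - d_rgb / (2 * theta_beta ** 2))
    return e

def energy(p_cam, p_lidar, pos, rgb, labels):
    return unary(p_cam, p_lidar, labels) + pairwise(pos, rgb, labels)
```

Splitting two nearby, similarly coloured pixels into different labels costs almost a full kernel weight. Separating dissimilar pixels costs close to nothing, which is how the RGB kernel sharpens road boundaries.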

Road detection based on the fusion of Lidar and image data

Paper: Road detection based on the fusion of Lidar and image data
Venue: International Journal of Advanced Robotic Systems, 2017

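The core of the paper is CRF-based fusion. The LiDAR point cloud is first projected onto the image and then converted into a height image with a bilateral filter.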
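One common way to turn a sparse projected height map into a dense one with a bilateral-style filter is a joint bilateral filter guided by the camera image. The sketch below assumes that variant; the paper's exact filter setup may differ. Only pixels with a LiDAR height (given by `mask`) contribute. Each contribution is weighted by spatial distance and by intensity similarity in a grayscale guide image. Parameters and the function name are hypothetical.

```python
import numpy as np

def joint_bilateral_fill(sparse, mask, guide, radius=2, sigma_s=1.5, sigma_r=10.0):
    """Densify a sparse height image using a grayscale guide image.

    sparse: (H, W) heights; mask: (H, W) bool, True where a LiDAR height exists;
    guide: (H, W) intensities. Pixels with no valid neighbour in the window stay 0.
    """
    h, w = sparse.shape
    guide = guide.astype(float)
    out = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            yy, xx = np.mgrid[y0:y1, x0:x1]
            w_s = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma_s ** 2))
            w_r = np.exp(-(guide[y0:y1, x0:x1] - guide[y, x]) ** 2 / (2 * sigma_r ** 2))
            wgt = w_s * w_r * mask[y0:y1, x0:x1]
            total = wgt.sum()
            if total > 0:
                out[y, x] = (wgt * sparse[y0:y1, x0:x1]).sum() / total
    return out
```

The range term keeps heights from leaking across strong image edges, such as a curb between road and sidewalk.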

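Unary potential: AdaBoost is used as the classifier. Each pixel's feature has three parts:
1) RGB features extracted with a filter bank.
2) Point-cloud features from the proposed LLD descriptor, which is essentially a distribution histogram over a neighborhood around each point.
3) The pixel position, used as a location feature.

For pixels that have a LiDAR point, AdaBoost gives the probability. For pixels without one, the probability is simply set to 0.5.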

Pairwise potential: Gaussian kernels on the RGB image and on the height image give two sets of pairwise potentials. Their weighted sum is the final pairwise potential.
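The two potentials described above can be sketched as follows. The AdaBoost classifier is not reproduced: `classifier_prob` simply stands in for its output. The kernel weights and bandwidths are assumptions, and the function names are hypothetical.

```python
import numpy as np

def road_probability(classifier_prob, has_lidar):
    """Road probability per pixel: classifier output where a LiDAR point exists, else 0.5."""
    return np.where(has_lidar, classifier_prob, 0.5)

def unary_potential(prob, eps=1e-9):
    """Unary costs for (non-road, road) as negative log-probabilities, shape (H, W, 2)."""
    return np.stack([-np.log(1.0 - prob + eps), -np.log(prob + eps)], axis=-1)

def combined_kernel(rgb_i, rgb_j, h_i, h_j, w_rgb=0.6, w_height=0.4, sigma_rgb=20.0, sigma_h=0.1):
    """Weighted sum of a Gaussian kernel on RGB difference and one on height difference."""
    k_rgb = np.exp(-np.sum((rgb_i - rgb_j) ** 2) / (2 * sigma_rgb ** 2))
    k_h = np.exp(-(h_i - h_j) ** 2 / (2 * sigma_h ** 2))
    return w_rgb * k_rgb + w_height * k_h
```

Setting missing pixels to 0.5 makes their unary cost equal for both labels. Their final label is then decided entirely by the pairwise terms, that is, by similar-looking and similar-height neighbors.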

The paper also compares the height image against an illumination-invariant image; the height image performs better.

Other

Autonomous road detection and modeling for UGVs using vision-laser data fusion

Paper: Autonomous road detection and modeling for UGVs using vision-laser data fusion
Venue: Neurocomputing, 2018

Summary

Every paper I have seen so far converts the point cloud into a 2-D representation before processing it, most commonly a depth image.

If you have other papers to recommend, feel free to share them in the comments.

[Paper Reading][Survey] LiDAR-Camera Fusion Methods for Road Detection