

Topic: Instance Segmentation in the Dark (low-light instance segmentation)


Authors: Linwei Chen · Ying Fu · Kaixuan Wei · Dezhi Zheng · Felix Heide





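The paper collects and builds a Low-light Instance Segmentation (LIS) dataset. It contains four groups of data, namely paired JPEG and RAW images under low-light and normal exposure, and provides instance-level pixel annotations for 8 classes, so it can be used for both instance segmentation and object detection. (New dataset!)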

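The paper observes that RAW images have more potential than JPEG images for reaching higher instance segmentation accuracy, and the authors' further analysis attributes this to the higher bit depth that RAW provides. (RAW is all you need!)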

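The paper observes that, in low light, image noise introduces high-frequency disturbances into the features of deep neural networks, which is a major reason why existing instance segmentation methods perform poorly in the dark. (Noise is the key!)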

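How well does it work? With the paper's framework, results based on Mask R-CNN with a ResNet-50 backbone still hold up well when compared with Segment Anything, a large model trained on massive amounts of data.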


Existing instance segmentation techniques are primarily tailored for high-visibility inputs, but their performance significantly deteriorates in extremely low-light environments. In this work, we take a deep look at instance segmentation in the dark and introduce several techniques that substantially boost the low-light inference accuracy. The proposed method is motivated by the observation that noise in low-light images introduces high-frequency disturbances to the feature maps of neural networks, thereby significantly degrading performance. To suppress this “feature noise”, we propose a novel learning method that relies on an adaptive weighted downsampling layer, a smooth-oriented convolutional block, and disturbance suppression learning. These components effectively reduce feature noise during downsampling and convolution operations, enabling the model to learn disturbance-invariant features. Furthermore, we discover that high-bit-depth RAW images can better preserve richer scene information in low-light conditions compared to typical camera sRGB outputs, thus supporting the use of RAW-input algorithms. Our analysis indicates that high bit-depth can be critical for low-light instance segmentation. To mitigate the scarcity of annotated RAW datasets, we leverage a low-light RAW synthetic pipeline to generate realistic low-light data. In addition, to facilitate further research in this direction, we capture a real-world low-light instance segmentation dataset comprising over two thousand paired low/normal-light images with instance-level pixel-wise annotations. Remarkably, without any image preprocessing, we achieve satisfactory performance on instance segmentation in very low light (4% AP higher than state-of-the-art competitors), meanwhile opening new opportunities for future research. Our code and dataset are publicly available to the community (https://github.com/Linwei-Chen/LIS).



Illustration of our key observations under dark regimes that drive our method design. (a) Degraded feature maps under low light. For clean normal-light images, the instance segmentation network is able to clearly capture the low-level (e.g., edges) and high-level (i.e., semantic responses) features of objects in shallow and deep layers, respectively. However, for noisy low-light images, shallow features can be corrupted and full of noise, and the deep features show lower semantic responses to objects. (b) Comparison between camera sRGB output and RAW image in the dark. Due to the significantly low SNR, the 8-bit camera output loses much of the scene information. For example, the seat backrest structure is barely discernible in the camera output, whereas it is still recognizable in the RAW counterpart (zoom in for better details).
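
The bit-depth argument can be illustrated with a toy calculation. The sketch below is only an illustration under simplifying assumptions: it quantizes a linear signal directly and ignores gamma encoding, JPEG compression, and noise, all of which affect a real camera output. The `quantize` helper and the 1% brightness range are made up for the example; 14 bits is used as a typical RAW depth, not a value taken from the paper.

```python
import numpy as np

def quantize(signal, bits):
    """Uniformly quantize a signal in [0, 1] to the given bit depth."""
    levels = 2 ** bits - 1
    return np.round(np.clip(signal, 0.0, 1.0) * levels) / levels

# A dark scene: all values lie in the lowest 1% of the sensor range
# (the 1% figure is an arbitrary choice for this illustration).
dark = np.linspace(0.0, 0.01, 1000)

levels_8 = np.unique(quantize(dark, 8)).size    # distinct codes at 8 bits
levels_14 = np.unique(quantize(dark, 14)).size  # distinct codes at 14 bits

print("distinct 8-bit levels:", levels_8)
print("distinct 14-bit levels:", levels_14)
```

Under these assumptions the 14-bit representation keeps far more distinct intensity levels in the dark range, which is the intuition behind preferring RAW input in low light.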



Low-Light RAW Synthetic Pipeline






Our low-light RAW synthetic pipeline consists of two steps, i.e., unprocessing and noise injection. We introduce them one by one.
Unprocessing. Collecting a large-scale RAW image dataset is expensive and time-consuming, hence we consider utilizing existing sRGB image datasets (Everingham et al., 2010; Lin et al., 2014). The sRGB image is obtained from RAW images by a series of image transformations of on-camera image signal processing (ISP), e.g., tone mapping, gamma correction, color correction, white balance, and demosaicking. With the help of the unprocessing operation (Brooks et al., 2019), we can invert these image processing transformations, and RAW images can be obtained. In this way, we can create a RAW dataset at zero cost.
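
To make the unprocessing idea concrete, here is a heavily simplified sketch. It is not the procedure of Brooks et al. (2019): it inverts only an approximate gamma curve and the white balance, then re-mosaics to an RGGB pattern, and it skips inverse tone mapping and inverse color correction. The function name, the gamma value of 2.2, and the white-balance gains are assumptions chosen for illustration.

```python
import numpy as np

def unprocess_simplified(srgb, wb_gains=(2.0, 1.0, 1.5), gamma=2.2):
    """Map an sRGB image (H, W, 3) in [0, 1] to a packed RGGB array (H/2, W/2, 4).

    Simplified stand-in for unprocessing: gamma expansion, inverse white
    balance, and mosaicking only. wb_gains and gamma are hypothetical values.
    """
    h, w, c = srgb.shape
    if c != 3 or h % 2 or w % 2:
        raise ValueError("expected an (H, W, 3) image with even H and W")
    # Approximate inverse of gamma correction (sRGB -> linear intensity).
    linear = np.clip(srgb, 0.0, 1.0) ** gamma
    # Invert white balance by dividing out the per-channel gains.
    linear = linear / np.asarray(wb_gains, dtype=float)
    # Mosaic: keep one color sample per pixel following an RGGB Bayer layout.
    r = linear[0::2, 0::2, 0]
    g1 = linear[0::2, 1::2, 1]
    g2 = linear[1::2, 0::2, 1]
    b = linear[1::2, 1::2, 2]
    return np.stack([r, g1, g2, b], axis=-1)

raw_like = unprocess_simplified(np.ones((4, 6, 3)))
print(raw_like.shape)
```

Packing the four Bayer samples into channels halves the spatial resolution, which is a common way to feed RAW data to a convolutional network.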
Noise injection. After obtaining clean RAW images by unprocessing, to simulate real noisy low-light images, we need to inject noise into RAW images. To yield more accurate results for real complex noise, we employ a recently proposed physics-based noise model (Wei et al., 2020, 2021), instead of the widely used Poissonian-Gaussian noise model (i.e., heteroscedastic Gaussian model (Foi et al., 2008)). It can accurately characterize the real noise structures by taking into account many noise sources, including photon shot noise, read noise, banding pattern noise, and quantization noise.
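
The four noise sources named above can be sketched as follows. This is a toy version, not the calibrated model of Wei et al.: read noise and banding noise are both drawn from Gaussians here, and every parameter (`ratio`, `full_scale_electrons`, `read_sigma`, `row_sigma`, `bits`) is a hypothetical value rather than one measured from a camera.

```python
import numpy as np

def inject_low_light_noise(raw, ratio=100.0, full_scale_electrons=1000.0,
                           read_sigma=0.002, row_sigma=0.001, bits=14, seed=0):
    """Darken a clean packed RAW image (H, W, 4) in [0, 1] and add noise.

    Toy model with assumed parameters: Poisson shot noise, Gaussian read
    noise, per-row Gaussian banding noise, and uniform quantization.
    """
    rng = np.random.default_rng(seed)
    # Simulate a short exposure by dividing the signal by the exposure ratio.
    dark = np.clip(raw, 0.0, 1.0) / ratio
    # Photon shot noise: Poisson statistics on the expected electron count.
    electrons = rng.poisson(dark * full_scale_electrons)
    signal = electrons / full_scale_electrons
    # Read noise: signal-independent, modeled as Gaussian in this sketch.
    signal = signal + rng.normal(0.0, read_sigma, size=raw.shape)
    # Banding pattern noise: one offset per row, shared along the row.
    signal = signal + rng.normal(0.0, row_sigma, size=(raw.shape[0], 1, 1))
    # Quantization: round to the sensor's discrete output levels.
    levels = 2 ** bits - 1
    return np.round(np.clip(signal, 0.0, 1.0) * levels) / levels

clean = np.full((4, 6, 4), 0.5)
noisy = inject_low_light_noise(clean)
print(noisy.shape, float(noisy.mean()))
```

Fixing the seed makes the synthetic data reproducible; in training one would normally draw fresh noise for every sample.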

Adaptive Weighted Downsampling Layer


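Existing networks usually contain several downsampling layers, so why not make full use of these downsampling steps? Experiments show that simply inserting a mean filter already gives an almost free gain in low-light instance segmentation. Although effective, a fixed filter such as the mean filter cannot adapt to the features and may therefore erase detail. To address this, the authors propose the Adaptive Weighted Downsampling (AWD) layer, which predicts a low-pass filter for the features per channel and per position, so that noisy regions are smoothed more strongly while detailed regions are smoothed less in order to preserve detail. Judging from the source code, the FC layer is replaced by a depth-wise convolution, which is equivalent in effect. The formulas are omitted here; interested readers can refer to the original paper.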
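
A minimal NumPy sketch of the idea follows. It is not the authors' implementation: the per-channel linear map `dw_weight` stands in for the depth-wise weight predictor, and normalizing the predicted weights with a softmax (so that they are non-negative and sum to one, i.e. low-pass) is an assumption made for this example. All function names are hypothetical.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def extract_windows(feat, k=2, stride=2):
    """Gather k x k windows of a (C, H, W) map into (C, H_out, W_out, k*k)."""
    _, h, w = feat.shape
    h_out = (h - k) // stride + 1
    w_out = (w - k) // stride + 1
    cols = [feat[:, i:i + stride * h_out:stride, j:j + stride * w_out:stride]
            for i in range(k) for j in range(k)]
    return np.stack(cols, axis=-1)

def adaptive_weighted_downsample(feat, dw_weight, k=2, stride=2):
    """Downsample with weights predicted per channel and per position.

    dw_weight has shape (C, k*k, k*k): a hypothetical per-channel
    (depth-wise) linear predictor mapping each window to k*k logits.
    """
    windows = extract_windows(feat, k, stride)
    logits = np.einsum("chwi,cij->chwj", windows, dw_weight)
    weights = softmax(logits, axis=-1)       # non-negative, sum to one
    return (windows * weights).sum(axis=-1)  # content-aware weighted average

rng = np.random.default_rng(0)
feat = rng.normal(size=(3, 4, 6))
# With an all-zero predictor every weight is 1/(k*k): a plain mean filter.
out = adaptive_weighted_downsample(feat, np.zeros((3, 4, 4)))
print(out.shape)
```

With an all-zero predictor the layer reduces to the fixed mean filter mentioned above, which shows that the mean filter is a special case of this kind of weighted downsampling.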

Smooth-Oriented Convolutional Block


Disturbance Suppression Learning
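
The training idea in this section is that the model sees a clean image together with its noisy counterpart and is penalized when the two sets of features differ. A minimal sketch of such an objective is given below, assuming a mean squared feature distance and two hypothetical weights `alpha` and `beta`; the exact form and weights used in the paper may differ.

```python
import numpy as np

def feature_consistency(feats_clean, feats_noisy):
    """Sum over feature levels of the mean squared clean/noisy distance."""
    return float(sum(np.mean((fc - fn) ** 2)
                     for fc, fn in zip(feats_clean, feats_noisy)))

def disturbance_suppression_objective(task_loss_clean, task_loss_noisy,
                                      feats_clean, feats_noisy,
                                      alpha=1.0, beta=0.01):
    """Combine both task losses with the feature-consistency penalty.

    alpha and beta are assumed weights for this sketch, not paper values.
    """
    consistency = feature_consistency(feats_clean, feats_noisy)
    return task_loss_clean + alpha * task_loss_noisy + beta * consistency

rng = np.random.default_rng(0)
clean_feats = [rng.normal(size=(8, 16, 16)), rng.normal(size=(16, 8, 8))]
noisy_feats = [f + 0.1 * rng.normal(size=f.shape) for f in clean_feats]
print(disturbance_suppression_objective(1.2, 1.5, clean_feats, noisy_feats))
```

Because both branches share the same network, no separate teacher model is needed; the clean-image features act as the target for the noisy ones.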

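The authors also adjust the training procedure: the model learns from clean and noisy images at the same time, and the features produced for a noisy input are constrained to stay close to those of the clean image. This is somewhat similar to knowledge distillation, but no teacher is required. As a result, the model is not only more robust in low light but also improves on normally lit images, which matches real deployments well, where a single model has to handle both day and night.

LIS Dataset

The dataset was captured with a Canon EOS 5D Mark IV and has the following properties: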

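  • Paired samples. In the LIS dataset, we provide images in both sRGB-JPEG (typical camera output) and RAW formats, and each format includes paired short-exposure low-light and corresponding long-exposure normal-light images. We refer to these four image types as sRGB-dark, sRGB-normal, RAW-dark, and RAW-normal. To ensure pixel-level alignment, we mount the camera on a sturdy tripod and control it remotely through a mobile app to avoid vibration.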

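  • Diverse scenes. The LIS dataset contains 2230 image pairs collected in a variety of scenes, both indoor and outdoor. To increase the diversity of low-light conditions, we capture the long-exposure reference images at a range of ISO levels (e.g., 800, 1600, 3200, 6400) and deliberately reduce the exposure time by a range of low-light factors (e.g., 10, 20, 30, 40, 50, 100) to capture short-exposure images that simulate very low-light conditions.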

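  • Instance-level pixel-wise labels. For each image pair, we provide precise instance-level pixel-wise labels covering instances of the 8 most common object classes in daily life (bicycle, car, motorcycle, bus, bottle, chair, dining table, TV). Note that LIS contains images captured in different scenes (indoor and outdoor) and under different illumination conditions. As Fig. 7 shows, object occlusion and densely distributed objects make LIS challenging beyond the low-light conditions themselves.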
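
As a small worked example of the capture protocol, the sketch below computes short-exposure times from a long-exposure reference, assuming that a low-light factor simply divides the exposure time. The 1-second reference exposure and the helper name are hypothetical; only the list of factors comes from the text above.

```python
LOW_LIGHT_FACTORS = (10, 20, 30, 40, 50, 100)  # example factors listed above

def short_exposure_time(long_exposure_s, factor):
    """Exposure time of the dark shot, assuming the factor divides the time."""
    if long_exposure_s <= 0 or factor <= 0:
        raise ValueError("exposure time and factor must be positive")
    return long_exposure_s / factor

# Hypothetical 1-second reference exposure.
times = {f: short_exposure_time(1.0, f) for f in LOW_LIGHT_FACTORS}
for factor, seconds in times.items():
    print(f"factor {factor:>3}: {seconds:.3f} s")
```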

Ablation

Main results

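The paper reports results for Mask R-CNN, PointRend, Mask2Former, and Faster R-CNN with ResNet-50, Swin-Transformer, and ConvNeXt backbones, demonstrating the method's effectiveness on both instance segmentation and object detection.

Visualization results

Takeaways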


