Paper

Title: Urban Change Detection for Multispectral Earth Observation Using Convolutional Neural Networks

Published in: IGARSS 2018

Code: https://github.com/rcdaudt/patch_based_change_detection

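The paper uses CNNs for change detection and compares two approaches. The first, Early Fusion (EF network), concatenates the patches from the two images along the channel dimension and feeds the result to a single network. The second, a Siamese network (Siam network), processes the two patches separately in two parallel branches with shared weights. In both cases, two fully connected layers then classify the pixel as change or no change. The authors extract patches with a large stride and use a voting system to predict a label for every pixel, which yields the changed regions of the whole image.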

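Abstract

The paper studies convolutional neural networks (CNNs) for urban change detection on multispectral images;
presents the Onera Satellite Change Detection (OSCD) dataset, a change detection dataset with pixel-level annotations;
proposes two network architectures for change detection, Siamese and Early Fusion, and compares the effect of using different numbers of spectral channels as input.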

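1 Introduction

Change detection compares a pair of co-registered images of the same area and identifies the parts that have changed. Each pixel is assigned a label: change or no change.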

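Problems

Because large amounts of labelled training data are lacking, several transfer-learning methods have been proposed to work around the problem, but they remain limited. The CNNs they reuse take 3-channel inputs, whereas multispectral images may have 13 channels, so most of the channel information goes unused. Most of these methods also produce a difference image that is then thresholded by hand, without end-to-end training, which limits their performance.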

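Contributions

An urban change detection dataset made of image pairs and pixel-level labels, the OSCD dataset.
Two different CNN architectures designed to learn change detection end to end from this dataset, in a fully supervised way.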

2 Dataset

The OSCD dataset focuses on urban areas. Only changes caused by urban development are labelled change; natural changes are labelled no change.

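Challenges and limitations

The satellite images have a relatively low resolution. Large buildings show up well, but small changes, such as the extension of an existing building or an addition to an existing road, are not obvious.

The change and no-change classes are highly imbalanced. Because the two images of a pair were acquired a relatively short time apart, far more pixels are labelled no change than change.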

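3 Change detection method

The goal is to apply supervised deep learning to the change detection problem. The networks are trained on the OSCD dataset only. Unlike earlier CNN-based methods, which build difference images that are later thresholded, this paper trains end to end to classify patches into two classes: change and no change.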

Patches are 15x15 pixels. From the values in its neighbourhood, the network performs a binary classification of the centre pixel of the patch: change or no change.
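As a rough illustration of what "a 15x15 patch around a centre pixel" means in array terms, here is a minimal NumPy sketch. The helper name `extract_patch_pair`, the (H, W, C) layout, and the choice to ignore pixels near the border are my assumptions, not something taken from the paper or its repository.

```python
import numpy as np

PATCH = 15
HALF = PATCH // 2  # 7 pixels on each side of the centre pixel

def extract_patch_pair(img1, img2, row, col):
    """Return the two 15x15xC patches centred on (row, col).

    Hypothetical helper: assumes both images are co-registered (H, W, C)
    arrays and that the centre is at least HALF pixels from every border.
    """
    window = (slice(row - HALF, row + HALF + 1),
              slice(col - HALF, col + HALF + 1))
    return img1[window], img2[window]

rng = np.random.default_rng(0)
img1 = rng.random((64, 64, 3))  # stand-in for the first acquisition
img2 = rng.random((64, 64, 3))  # stand-in for the second acquisition
p1, p2 = extract_patch_pair(img1, img2, row=20, col=30)
print(p1.shape, p2.shape)
```

The label attached to this pair during training would be the ground-truth label of pixel (20, 30) only.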

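This is much harder than simply computing the difference between the two images, because it involves a semantic interpretation of the changes.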

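Architecture

The architectures are inspired by "Learning to Compare Image Patches via Convolutional Neural Networks".

The network takes 15x15xC patches as input, where C is the number of spectral channels. For each pair of patches it outputs a pair of values that estimate the probability that the pair belongs to each class (a binary classification problem). Taking the larger of the two values predicts whether the centre pixel of the patch has changed.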

The decision threshold can also be set to a value other than 0.5, in case false positives or false negatives are more or less costly in a given application.
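To make the threshold idea concrete, here is a small sketch. With two softmax outputs, picking the larger one is the same as thresholding the change probability at 0.5 (ties aside), so moving the threshold trades false positives against false negatives. The function name and the example probabilities are made up for illustration.

```python
import numpy as np

def predict_change(p_change, threshold=0.5):
    """Label a pixel as change (1) when its estimated change probability
    reaches the threshold, otherwise as no change (0)."""
    return (np.asarray(p_change) >= threshold).astype(np.uint8)

# Illustrative change probabilities for four patches (not real results)
p_change = np.array([0.10, 0.35, 0.55, 0.90])

default_labels = predict_change(p_change)                   # threshold 0.5
sensitive_labels = predict_change(p_change, threshold=0.3)  # fewer missed changes

print(default_labels)    # [0 0 1 1]
print(sensitive_labels)  # [0 1 1 1]
```

Lowering the threshold catches more changes at the price of more false alarms. Which direction is preferable depends on the application.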

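Early Fusion (EF)

Early fusion: the two patches are first concatenated and used as the network input, which can be seen as a single 15x15x2C patch. This input is then processed by a sequence of 7 convolutional layers and 2 fully connected layers, the last of which is a softmax layer with two outputs (change and no change).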

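Siamese (Siam) network

Siamese network: each patch is processed by one of two parallel branches, each made of 4 convolutional layers, with weights shared between the branches. The outputs of the two branches are concatenated, and two fully connected layers produce the two output values.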
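The layer counts above line up with the 15x15 patch size once you trace the feature map sizes. The sketch below assumes unpadded 3x3 convolutions with stride 1 and 128 channels at the end of each Siamese branch, which is what the code listing at the end of this post uses. The helper names are mine.

```python
def conv_output_size(n, kernel_size=3, stride=1, padding=0):
    """Spatial size after one convolution (standard formula)."""
    return (n + 2 * padding - kernel_size) // stride + 1

def trace_sizes(n, num_convs):
    """Sizes before and after each of num_convs unpadded 3x3 convolutions."""
    sizes = [n]
    for _ in range(num_convs):
        sizes.append(conv_output_size(sizes[-1]))
    return sizes

ef_sizes = trace_sizes(15, num_convs=7)    # Early Fusion: 7 conv layers
siam_sizes = trace_sizes(15, num_convs=4)  # Siamese: 4 conv layers per branch

# Siamese: two branches, each ending in a 7x7 map with 128 channels
siam_fc_inputs = 2 * siam_sizes[-1] ** 2 * 128

print(ef_sizes)        # [15, 13, 11, 9, 7, 5, 3, 1]
print(siam_sizes)      # [15, 13, 11, 9, 7]
print(siam_fc_inputs)  # 12544
```

So the EF network shrinks the patch to a single 1x1 feature vector before the fully connected layers, while the Siamese network flattens two 7x7 maps, which explains the `2*7*7` factor in its first linear layer.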

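Full-image change maps

Once the network is trained, a change map for a full image can be generated by classifying patches of the test images individually.

Extracting a patch at every pixel of the image makes prediction slow, because the patch centred on each pixel needs its own forward pass.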

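The authors instead extract patches with a larger stride and use a voting system to predict the labels of all pixels in the image. Each classified patch votes on the labels of all the pixels it covers, with a weight given by the network output and by a 2D Gaussian centred on the patch's centre pixel. In other words, within a patch, the closer a pixel is to the centre, the larger its weight.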
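Here is how I picture the voting step, as a minimal NumPy sketch. These notes do not give the exact weighting formula or the Gaussian width, so the value sigma = 3.0, the "Gaussian weight times class probability" rule, and all helper names are assumptions for illustration only.

```python
import numpy as np

PATCH = 15

def gaussian_window(size=PATCH, sigma=3.0):
    """2D Gaussian weights peaking at the patch centre (sigma is assumed)."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    return np.outer(g, g)

def vote_change_map(height, width, top_lefts, p_change, sigma=3.0):
    """Accumulate weighted votes from classified patches into a label map.

    top_lefts: (row, col) of each patch's top-left corner.
    p_change:  change probability predicted for each patch.
    Pixels not covered by any patch default to no change (0).
    """
    window = gaussian_window(PATCH, sigma)
    votes = np.zeros((2, height, width))
    for (r, c), p in zip(top_lefts, p_change):
        votes[0, r:r + PATCH, c:c + PATCH] += window * (1.0 - p)  # no change
        votes[1, r:r + PATCH, c:c + PATCH] += window * p          # change
    return (votes[1] > votes[0]).astype(np.uint8)

# Toy example: one row of overlapping patches extracted with stride 5
top_lefts = [(0, 0), (0, 5), (0, 10), (0, 15)]
p_change = [0.1, 0.1, 0.9, 0.9]
labels = vote_change_map(15, 30, top_lefts, p_change)
print(labels[7])  # labels along the middle row
```

With stride 5 the network runs on roughly 1/25 of the patches needed for a dense per-pixel prediction, and the overlap between patches is what lets every pixel still receive several votes.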

4 Results

Setup

The dataset is split into two groups: 14 image pairs for training and 10 for testing.

Because the dataset is small, the training data is augmented with all possible flips and rotations.
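If the rotations are restricted to multiples of 90 degrees (my assumption), combining them with flips gives 8 distinct variants of each patch. A minimal sketch, with a helper name of my own choosing:

```python
import numpy as np

def dihedral_augmentations(patch):
    """Return the 8 flip/rotation variants of an (H, W, C) patch.

    Assumes rotations by multiples of 90 degrees, each with and
    without a horizontal flip.
    """
    variants = []
    for k in range(4):
        rotated = np.rot90(patch, k=k, axes=(0, 1))
        variants.append(rotated)
        variants.append(np.flip(rotated, axis=1))
    return variants

patch = np.arange(15 * 15 * 3, dtype=np.float32).reshape(15, 15, 3)
variants = dihedral_augmentations(patch)
print(len(variants), variants[0].shape)
```

The same transform has to be applied to both patches of a pair so that they stay aligned. The label of the centre pixel is unchanged, since the centre maps to itself under all 8 transforms.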

To deal with channels of different resolutions, channels coarser than 10 m are upsampled to 10 m so that all channels can be concatenated with aligned pixels.
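The notes above do not say which interpolation is used for the upsampling. As a sketch of the idea, the snippet below uses nearest-neighbour repetition (an assumption), where a 20 m band is upsampled by a factor of 2 and a 60 m band by a factor of 6 to reach the 10 m grid.

```python
import numpy as np

def upsample_nearest(band, factor):
    """Nearest-neighbour upsampling of a 2D band by an integer factor."""
    return np.repeat(np.repeat(band, factor, axis=0), factor, axis=1)

# Toy bands covering the same 600 m x 600 m area at three resolutions
band_10m = np.zeros((60, 60))
band_20m = np.arange(30 * 30, dtype=float).reshape(30, 30)
band_60m = np.arange(10 * 10, dtype=float).reshape(10, 10)

# After upsampling, all bands share the 10 m grid and can be stacked
stack = np.stack(
    [band_10m, upsample_nearest(band_20m, 2), upsample_nearest(band_60m, 6)],
    axis=-1,
)
print(stack.shape)  # (60, 60, 3)
```

Upsampling adds no new information to the coarse bands. It only makes the pixel grids line up so the channels can be concatenated.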

Far more pixels are labelled no change than change, so a higher penalization weight is applied to the change class during training.
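How the penalization weights are chosen is not stated here. One common option, used below purely as an assumption, is to weight each class by the inverse of its frequency and plug the weights into a weighted negative log-likelihood. Both function names are hypothetical.

```python
import numpy as np

def inverse_frequency_weights(labels):
    """Per-class weights inversely proportional to class frequency.

    Assumes binary labels (0 = no change, 1 = change) and that both
    classes are present at least once.
    """
    counts = np.bincount(np.asarray(labels).ravel(), minlength=2).astype(float)
    return counts.sum() / (2.0 * counts)

def weighted_nll(probs, labels, weights):
    """Weighted negative log-likelihood averaged over the sample weights."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    picked = probs[np.arange(len(labels)), labels]  # probability of the true class
    w = weights[labels]
    return float(np.sum(-w * np.log(picked)) / np.sum(w))

labels = np.array([0] * 9 + [1])      # 9 no-change pixels, 1 change pixel
weights = inverse_frequency_weights(labels)
probs = np.full((10, 2), 0.5)         # an uninformative classifier
loss = weighted_nll(probs, labels, weights)
print(weights, loss)
```

With this weighting a mistake on the rare change class costs 9 times as much as a mistake on the no-change class, so the network cannot reach a low loss by predicting no change everywhere.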

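To evaluate how the number of input channels affects classification, four cases are compared: colour images (RGB, 3 channels), the 10 m resolution layers (RGB + infrared, 4 channels), layers with resolution up to 20 m (10 channels), and layers with resolution up to 60 m (13 channels).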
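The four channel counts can be reproduced by grouping bands by native resolution. The sketch below assumes the OSCD images are Sentinel-2 acquisitions (not stated in these notes) and uses the standard Sentinel-2 band names. The dictionary and helper are mine.

```python
# Native resolution in metres of the 13 Sentinel-2 bands (assumed sensor)
BAND_RESOLUTION_M = {
    "B01": 60, "B02": 10, "B03": 10, "B04": 10, "B05": 20, "B06": 20,
    "B07": 20, "B08": 10, "B8A": 20, "B09": 60, "B10": 60, "B11": 20,
    "B12": 20,
}

RGB_BANDS = ["B04", "B03", "B02"]  # red, green, blue

def select_bands(max_resolution_m):
    """Bands whose native resolution is max_resolution_m or finer."""
    return [b for b, r in BAND_RESOLUTION_M.items() if r <= max_resolution_m]

cases = {
    "3ch": RGB_BANDS,
    "4ch": select_bands(10),   # RGB + near infrared (B08)
    "10ch": select_bands(20),
    "13ch": select_bands(60),
}
for name, bands in cases.items():
    print(name, len(bands), bands)
```

For the EF network the input depth is twice the number of selected bands, since the two patches are concatenated before the first convolution.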

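Experimental results

Eight different CNNs were evaluated (3ch denotes 3 input channels). The results show that:

the EF networks reach higher accuracy than the Siam networks on no-change regions, while on change regions the Siam networks do better than the EF networks as the number of channels increases. Overall, the EF networks outperform the Siam networks;

in general, adding spectral channels improves classification performance.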

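In the paper the EF network does better than the Siam network, although my own feeling was that the Siam network should come out ahead.

I do not understand the reason yet. If anyone reading this does, pointers are very welcome!

One possible explanation: the architecture is fairly simple, and the no-change regions in the images are far larger than the change regions. Since the task is only a binary classification of a pixel, concatenating the patches from the two images may not cause much interference between channels during convolution.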

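Figure 1: Example change maps produced by the two trained architectures with 3 input channels.

Figure 2: Change maps produced by the EF network with different numbers of input channels.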

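5 Conclusion

The paper presents the OSCD dataset for urban change detection;

it proposes two CNN approaches for detecting changes in the image pairs of this dataset.

This motivates the next step: using fully convolutional networks to generate labels for all pixels of an image automatically, which would reduce the patch effect of the proposed method.

Extending this work to semantic labelling of the changes would also be worthwhile, as it would further help interpret image pairs.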

Code

EF Network:

import torch
import torch.nn as nn

# Change detection network models
# Assumes 15x15 patches 
# Rodrigo Caye Daudt
# https://rcdaudt.github.io/

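class TwoChNet_15(nn.Module):
    """Early Fusion network: both patches are concatenated along the channel axis."""

    def __init__(self, n_in = 6):
        # n_in is the channel count AFTER concatenation (2*C, e.g. 6 for two RGB patches)
        super(TwoChNet_15, self).__init__()

        self.layer_depth = [n_in, 32, 32, 64, 64, 128, 128, 128, 8, 2]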

        self.cnn = nn.Sequential(# n = 15
            nn.Conv2d(self.layer_depth[0], self.layer_depth[1], kernel_size=3), # n=13
            nn.BatchNorm2d(self.layer_depth[1]), # n=13
            nn.ReLU(), # n=13
            nn.Dropout2d(p=0.2), # n=13
            nn.Conv2d(self.layer_depth[1], self.layer_depth[2], kernel_size=3), # n=11
            nn.BatchNorm2d(self.layer_depth[2]), # n=11
            nn.ReLU(), # n=11
            nn.Dropout2d(p=0.2), # n=11
            nn.Conv2d(self.layer_depth[2], self.layer_depth[3], kernel_size=3), # n=9
            nn.BatchNorm2d(self.layer_depth[3]), # n=9
            nn.ReLU(), # n=9
            nn.Dropout2d(p=0.2), # n=9
            nn.Conv2d(self.layer_depth[3], self.layer_depth[4], kernel_size=3), # n=7
            nn.BatchNorm2d(self.layer_depth[4]), # n=7
            nn.ReLU(), # n=7
            nn.Dropout2d(p=0.2), # n=7
            nn.Conv2d(self.layer_depth[4], self.layer_depth[5], kernel_size=3), # n=5
            nn.BatchNorm2d(self.layer_depth[5]), # n=5
            nn.ReLU(), # n=5
            nn.Dropout2d(p=0.2), # n=5
            nn.Conv2d(self.layer_depth[5], self.layer_depth[6], kernel_size=3), # n=3
            nn.BatchNorm2d(self.layer_depth[6]), # n=3
            nn.ReLU(), # n=3
            nn.Dropout2d(p=0.2), # n=3
            nn.Conv2d(self.layer_depth[6], self.layer_depth[7], kernel_size=3), # n=1
            nn.ReLU() # n=1
            )

        self.fc = nn.Sequential(
            nn.Linear(self.layer_depth[7], self.layer_depth[8]),
            nn.BatchNorm1d(self.layer_depth[8]),
            nn.ReLU(),
            nn.Dropout(p=0.2), # input is (N, features) here, so plain Dropout instead of Dropout2d
            nn.Linear(self.layer_depth[8], self.layer_depth[9]),
            nn.Softmax(dim=1) # normalise over the two classes
            )

    def forward(self, x1, x2):
        output = torch.cat((x1, x2), 1)
        output = self.cnn(output)
        output = output.view(output.size(0), -1)
        output = self.fc(output)
        return output

Siam Network:

import torch
import torch.nn as nn

# Change detection network models
# Assumes 15x15 patches 
# Rodrigo Caye Daudt
# https://rcdaudt.github.io/

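class SiamNet_15(nn.Module):
    """Siamese network: both patches go through the same CNN branch (shared weights)."""

    def __init__(self, n_in = 3):
        # n_in is the channel count of ONE patch (C, e.g. 3 for RGB)
        super(SiamNet_15, self).__init__()

        self.layer_depth = [n_in, 64, 64, 128, 128, 64, 2]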

        self.cnn = nn.Sequential(
            nn.Conv2d(self.layer_depth[0], self.layer_depth[1], kernel_size=3), # n=13
            nn.BatchNorm2d(self.layer_depth[1]), # n=13
            nn.ReLU(), # n=13
            nn.Dropout2d(p=0.2), # n=13
            nn.Conv2d(self.layer_depth[1], self.layer_depth[2], kernel_size=3), # n=11
            nn.BatchNorm2d(self.layer_depth[2]), # n=11
            nn.ReLU(), # n=11
            nn.Dropout2d(p=0.2), # n=11
            nn.Conv2d(self.layer_depth[2], self.layer_depth[3], kernel_size=3), # n=9
            nn.BatchNorm2d(self.layer_depth[3]), # n=9
            nn.ReLU(), # n=9
            nn.Dropout2d(p=0.2), # n=9
            nn.Conv2d(self.layer_depth[3], self.layer_depth[4], kernel_size=3), # n=7
            nn.BatchNorm2d(self.layer_depth[4]), # n=7
            nn.ReLU() # n=7
            )

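        self.fc = nn.Sequential(
            nn.Linear(2*7*7*self.layer_depth[4], self.layer_depth[5]), # two branches, each a 7x7 map
            nn.BatchNorm1d(self.layer_depth[5]),
            nn.ReLU(),
            nn.Dropout(p=0.2), # input is (N, features) here, so plain Dropout instead of Dropout2d
            nn.Linear(self.layer_depth[5], self.layer_depth[6]),
            nn.Softmax(dim=1) # normalise over the two classes
            )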

    def forward(self, x1, x2):
        output = torch.cat((self.cnn(x1), self.cnn(x2)), 1)
        output = output.view(output.size(0), -1)
        output = self.fc(output)
        return output

Reference blogs

Urban change detection for multispectral earth observation using convolution neural network (likyoo's blog, CSDN)

[Paper notes] Urban change detection for multispectral earth observation using convolution neural network
