R2D2: Matching Dense Points


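Build an image pair and feed each image through the network in turn. This gives, for every pixel of each image, a descriptor, a reliability score and a repeatability score; the loss is then computed from these outputs.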

Training command:

python train.py --save-path /path/to/model.pt 

What is the input data?

A dictionary with keys dict_keys(['img1', 'img2', 'aflow', 'mask'])

Shape of img1 and img2: torch.Size([3, 3, 192, 192]), i.e. a batch of 3 RGB images of 192×192 pixels.

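Shape of aflow: torch.Size([3, 2, 192, 192]). aflow is computed from a homography: for each pixel of the first image it stores the (x, y) coordinates of the corresponding point in the second image.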

Shape of mask: torch.Size([3, 192, 192])

Where the input data is computed: datasets/pair_dataset.py

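class SyntheticPairDataset (PairDataset):
# excerpt from the method that builds one pair; i is the image index, H and W are the image height and width
original_img = self.dataset.get_image(i)  # load the source image, from which img1 is obtained
# distort a (scaled) copy of the image; the distorted image becomes img2
# persp=(1,0,0,0,1,0,0,0) is the identity homography written as 8 parameters (h33 = 1 implied);
# the distortion updates this field with the warp it applies
scaled_and_distorted_image = self.distort(dict(img=scaled_image2, persp=(1,0,0,0,1,0,0,0)))
# from the homography of the warp, derive pixel-to-pixel correspondences and store them in aflow
# (for each pixel of img1, its position in img2)
trf = scaled_and_distorted_image['persp']  # the homography, as 8 parameters
xy = np.mgrid[0:H,0:W][::-1].reshape(2,H*W).T  # (H*W, 2) array of (x, y) pixel coordinates
aflow = np.float32(persp_apply(trf, xy).reshape(H,W,2))  # where each img1 pixel lands in img2, (H, W, 2)
meta['aflow'] = aflow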
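To make the aflow step concrete, here is a NumPy sketch of applying an 8-parameter homography to the pixel grid. apply_persp and make_aflow are hypothetical stand-ins for the repository's persp_apply; the parameter order assumes a row-major 3×3 matrix with h33 = 1, which is consistent with the identity tuple (1,0,0,0,1,0,0,0) used above.

```python
import numpy as np

def apply_persp(trf, xy):
    """Apply an 8-parameter homography (row-major 3x3, h33 = 1 assumed) to points.

    trf: (a, b, c, d, e, f, g, h); xy: (N, 2) array of (x, y). Returns (N, 2).
    """
    a, b, c, d, e, f, g, h = trf
    x = xy[:, 0].astype(np.float64)
    y = xy[:, 1].astype(np.float64)
    w = g * x + h * y + 1.0  # projective denominator
    return np.stack([(a * x + b * y + c) / w, (d * x + e * y + f) / w], axis=1)

def make_aflow(trf, H, W):
    """Absolute flow: aflow[y, x] = position in img2 of pixel (x, y) of img1."""
    xy = np.mgrid[0:H, 0:W][::-1].reshape(2, H * W).T  # (H*W, 2), columns = (x, y)
    return np.float32(apply_persp(trf, xy).reshape(H, W, 2))
```

For a pure translation by (5, -3), trf = (1,0,5,0,1,-3,0,0) and make_aflow(trf, H, W)[y, x] equals (x + 5, y - 3). The batch stores aflow channels-first as (B, 2, H, W), so each sample presumably goes through a transpose such as aflow.transpose(2, 0, 1) in the data loader.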


Network architecture:

output = self.net(imgs=[inputs.pop('img1'),inputs.pop('img2')])

Printing the network in pdb (p self.net) gives:
Quad_L2Net_ConfCFS(
  (ops): ModuleList(
    (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=False, track_running_stats=True)
    (2): ReLU(inplace)
    (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=False, track_running_stats=True)
    (5): ReLU(inplace)
    (6): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=False, track_running_stats=True)
    (8): ReLU(inplace)
    (9): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2))
    (10): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=False, track_running_stats=True)
    (11): ReLU(inplace)
    (12): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2))
    (13): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=True)
    (14): ReLU(inplace)
    (15): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(4, 4), dilation=(4, 4))
    (16): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=True)
    (17): ReLU(inplace)
    (18): Conv2d(128, 128, kernel_size=(2, 2), stride=(1, 1), padding=(2, 2), dilation=(4, 4))
    (19): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=True)
    (20): Conv2d(128, 128, kernel_size=(2, 2), stride=(1, 1), padding=(4, 4), dilation=(8, 8))
    (21): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=True)
    (22): Conv2d(128, 128, kernel_size=(2, 2), stride=(1, 1), padding=(8, 8), dilation=(16, 16))
  )
  (clf): Conv2d(128, 2, kernel_size=(1, 1), stride=(1, 1))
  (sal): Conv2d(128, 1, kernel_size=(1, 1), stride=(1, 1))
)
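Every conv layer above has stride 1 and its padding is matched to its dilation, so no layer shrinks the feature map; this is why the descriptors come out at the full 192×192 resolution. The pure-Python check below applies the standard Conv2d output-size formula to each layer (the (kernel, padding, dilation) triples are transcribed from the printout; CONVS, conv_out and trace are hypothetical helper names) and also accumulates the receptive field, which for stride-1 layers grows by dilation × (kernel - 1) per layer.

```python
# (kernel, padding, dilation) of each Conv2d in ops, transcribed from the printout;
# every stride is 1
CONVS = [(3, 1, 1), (3, 1, 1), (3, 1, 1),
         (3, 2, 2), (3, 2, 2), (3, 4, 4),
         (2, 2, 4), (2, 4, 8), (2, 8, 16)]

def conv_out(size, k, p, d, s=1):
    """Output length of a Conv2d along one axis (PyTorch size formula)."""
    return (size + 2 * p - d * (k - 1) - 1) // s + 1

def trace(size):
    """Return (output size, receptive field) after the whole conv stack."""
    rf = 1
    for k, p, d in CONVS:
        size = conv_out(size, k, p, d)
        rf += d * (k - 1)  # stride 1 everywhere, so each layer adds d*(k-1)
    return size, rf

print(trace(192))  # (192, 51)
```

A side effect of choosing padding = dilation / 2 for the last three 2×2 convs is that their taps sit symmetrically at ±padding around each output pixel, so the even-sized kernels do not shift the feature map.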

Forward pass inside the network: first the convolution stack (ops), then the clf and sal heads:
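for op in self.ops:  # conv stack; every layer keeps the 192x192 resolution
    x = op(x)
# compute the confidence maps from the squared features
ureliability = self.clf(x**2)    # 1x1 conv, 128 -> 2 channels (raw reliability)
urepeatability = self.sal(x**2)  # 1x1 conv, 128 -> 1 channel (raw repeatability)
# normalize the descriptors and turn both raw maps into confidences
return self.normalize(x, ureliability, urepeatability)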
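self.normalize is not shown here. Based on our reading of the released code, it L2-normalizes each pixel's 128-D descriptor and squashes the two raw head outputs into [0, 1] confidence maps: a softmax over the 2 channels of clf (keeping the second channel), and softplus(u) / (1 + softplus(u)) for the single sal channel. Treat these exact functions as assumptions to be checked against the repository. The NumPy sketch below works on one image without the batch dimension; l2_normalize, squash and normalize are illustrative names.

```python
import numpy as np

def l2_normalize(x, axis=0, eps=1e-12):
    """Scale each descriptor (vector along `axis`) to unit L2 norm."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)

def squash(u):
    """Map a raw head output of shape (C, H, W) to a (1, H, W) confidence map in [0, 1]."""
    if u.shape[0] == 2:
        # 2-channel head (clf): softmax over channels, keep the second channel
        e = np.exp(u - u.max(axis=0, keepdims=True))
        return (e / e.sum(axis=0, keepdims=True))[1:2]
    # 1-channel head (sal): softplus, then s / (1 + s)
    s = np.logaddexp(0.0, u)
    return s / (1.0 + s)

def normalize(x, ureliability, urepeatability):
    """x: (128, H, W) features; returns the three per-pixel outputs for one image."""
    return dict(descriptors=l2_normalize(x, axis=0),
                reliability=squash(ureliability),
                repeatability=squash(urepeatability))
```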

The output is a dict with keys dict_keys(['descriptors', 'repeatability', 'reliability', 'imgs'])

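'descriptors' holds the descriptor maps of the two images in the pair; per image (ignoring the batch dimension) the shape is (128, 192, 192), i.e. a 128-D descriptor at every pixel.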

'repeatability' is the repeatability map of each of the two images, with shape (1, 192, 192) per image.

'reliability' is likewise the reliability map of each of the two images, with shape (1, 192, 192) per image.

'imgs' appears to be the two original input images, each of size (3, 192, 192).

Loss function:

loss, details = self.loss_func(**allvars)

The inputs to loss_func are dict_keys(['aflow', 'mask', 'descriptors', 'repeatability', 'reliability', 'imgs'])

The loss is defined as:

MultiLoss(
  (losses): ModuleList(
    (0): ReliabilityLoss(
      (aploss): APLoss(
        (quantizer): Conv1d(1, 40, kernel_size=(1,), stride=(1,))
      )
      (sampler): NghSampler2()
    )
    (1): CosimLoss(
      (patches): Unfold(kernel_size=16, dilation=1, padding=0, stride=8)
    )
    (2): PeakyLoss(
      (preproc): AvgPool2d(kernel_size=3, stride=1, padding=1)
      (maxpool): MaxPool2d(kernel_size=17, stride=1, padding=8, dilation=1, ceil_mode=False)
      (avgpool): AvgPool2d(kernel_size=17, stride=1, padding=8)
    )
  )
)
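The printout shows that loss_func bundles three sub-losses. Assuming MultiLoss returns a weighted sum of them together with a dict of the individual values (the weights are not visible in the printout, so they are an assumption here), the call loss, details = self.loss_func(**allvars) behaves like the pure-Python stand-in below. WeightedMultiLoss is hypothetical; note how each sub-loss picks the keyword arguments it needs from the shared dict and swallows the rest via **kw, the same pattern as CosimLoss.forward(self, repeatability, aflow, **kw).

```python
class WeightedMultiLoss:
    """Hypothetical stand-in for MultiLoss: total = sum of weight_i * loss_i."""

    def __init__(self, *weights_and_losses):
        # arguments alternate: weight, loss_fn, weight, loss_fn, ...
        self.weights = weights_and_losses[0::2]
        self.losses = weights_and_losses[1::2]
        assert len(self.weights) == len(self.losses)

    def __call__(self, **variables):
        total, details = 0.0, {}
        for w, fn in zip(self.weights, self.losses):
            # each sub-loss takes only the keyword arguments it declares
            value = fn(**variables)
            details[fn.__name__] = value
            total += w * value
        details['loss'] = total
        return total, details
```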

ReliabilityLoss:

return 1 - ap*rel - (1-rel)*self.base

How ap is computed:

An AP is computed for every pixel, in three steps:

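(1) Pair a point with every point lying within 3 pixels of its correspondence in the other image: these pairs are positives, and all other pairs are negatives. Positive and negative pairs are then sampled (a sample is a pair of points, labeled by whether they correspond), giving the ground truth gt (1 = corresponding pair, 0 = non-corresponding pair).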

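(2) For each sampled pair, compute a score from its two descriptors (the descriptors are L2-normalized, so their dot product is a cosine similarity).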

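(3) Sweep the score threshold over a range of values, compute the precision and recall of the positive pairs at each threshold, and combine them into the AP of that point.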
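Steps (1) to (3) can be sketched for a single query point as follows, assuming the scores are similarities in [0, 1]. This version uses hard binning, so it is not differentiable; the printed APLoss instead uses a fixed Conv1d quantizer with 40 outputs, which as we read it assigns each score softly to 20 bins so that gradients reach the descriptors. binned_ap is a hypothetical name.

```python
import numpy as np

def binned_ap(scores, labels, nq=20):
    """AP of one query point, approximated by sweeping nq score thresholds.

    scores: (M,) similarities between the query descriptor and M sampled descriptors, in [0, 1]
    labels: (M,) ground truth, 1 = corresponding pair, 0 = non-corresponding
    """
    scores = np.asarray(scores, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.float64)
    # hard binning: bin 0 holds the highest scores, bin nq-1 the lowest
    idx = np.clip(((1.0 - scores) * nq).astype(int), 0, nq - 1)
    n_all = np.bincount(idx, minlength=nq)                  # samples per bin
    n_pos = np.bincount(idx, weights=labels, minlength=nq)  # positives per bin
    # lowering the threshold bin by bin = cumulative sums over the bins
    precision = np.cumsum(n_pos) / np.maximum(np.cumsum(n_all), 1e-16)
    recall_gain = n_pos / max(n_pos.sum(), 1e-16)           # recall added by each bin
    return float((precision * recall_gain).sum())
```

With perfectly separated scores, binned_ap([0.95, 0.9, 0.1, 0.05], [1, 1, 0, 0]) is 1.0; flipping the labels so that the positives get the lowest scores drops it to about 0.42.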

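R_ij is the predicted reliability of pixel (i,j). The more reliable the pixel, the larger R_ij and the more weight its AP carries in the loss. For a pixel that cannot be matched reliably, we want R_ij to be small so that its AP hardly matters. Put differently, the per-pixel loss 1 - AP·R_ij - (1 - R_ij)·base is linear in R_ij with slope (base - AP), so it is minimized by R_ij = 1 when AP > base and by R_ij = 0 when AP < base: base acts as the AP level above which a pixel is worth trusting.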
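A tiny numeric check of this trade-off, reusing the return line of ReliabilityLoss and assuming base = 0.5 (the value of self.base is not shown in the printout); reliability_loss is an illustrative name.

```python
def reliability_loss(ap, rel, base=0.5):
    """Per-pixel loss 1 - ap*rel - (1 - rel)*base, as in ReliabilityLoss's return line."""
    return 1 - ap * rel - (1 - rel) * base

for ap in (0.9, 0.2):  # a well-matched pixel and a poorly matched one
    losses = [round(reliability_loss(ap, rel), 2) for rel in (0.0, 0.5, 1.0)]
    print(f"ap={ap}: loss at rel=0, 0.5, 1 -> {losses}")
# ap=0.9 -> [0.5, 0.3, 0.1]: the loss falls as rel grows
# ap=0.2 -> [0.5, 0.65, 0.8]: the loss rises as rel grows
```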

Repeatability loss (CosimLoss):

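def forward(self, repeatability, aflow, **kw):
    B,two,H,W = aflow.shape
    assert two == 2

    sali1, sali2 = repeatability  # repeatability maps of img1 and img2, each (B, 1, H, W)
    # aflow (pixel coordinates) -> sampling grid normalized to [-1, 1] for grid_sample
    grid = FullSampler._aflow_to_grid(aflow)
    # warp img2's map into img1's frame, so sali2[y, x] now lines up with sali1[y, x]
    sali2 = F.grid_sample(sali2, grid, mode='bilinear', padding_mode='border')

    patches1 = self.extract_patches(sali1)  # unfold into 16x16 patches with stride 8 (see printout)
    patches2 = self.extract_patches(sali2)
    cosim = (patches1 * patches2).sum(dim=2)  # per-patch cosine, assuming extract_patches L2-normalizes
    return 1 - cosim.mean()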
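The key detail hidden in extract_patches is, as far as we can tell from the repository, an L2 normalization of each patch, which is what turns the sum of products into a cosine similarity. Below is a NumPy sketch on single-channel (H, W) maps, assuming sali2 has already been warped into img1's frame with grid_sample and using the Unfold settings from the printout (kernel 16, stride 8, no padding); extract_patches and cosim_loss here are illustrative re-implementations, not the repository's code.

```python
import numpy as np

def extract_patches(sal, N=16):
    """Unfold a (H, W) map into N x N patches with stride N//2, each L2-normalized.

    Returns an array of shape (num_patches, N*N).
    """
    H, W = sal.shape
    s = N // 2
    patches = [sal[y:y + N, x:x + N].ravel()
               for y in range(0, H - N + 1, s)
               for x in range(0, W - N + 1, s)]
    P = np.asarray(patches, dtype=np.float64)
    return P / np.maximum(np.linalg.norm(P, axis=1, keepdims=True), 1e-12)

def cosim_loss(sali1, sali2_warped, N=16):
    """1 - mean cosine similarity between co-located patches of the two maps."""
    p1 = extract_patches(sali1, N)
    p2 = extract_patches(sali2_warped, N)
    return 1.0 - float((p1 * p2).sum(axis=1).mean())
```

Because each patch is normalized, the loss only compares the shape of the two maps inside a patch, not their absolute level. A constant map would therefore satisfy this term trivially; in the printed MultiLoss, PeakyLoss is presumably what pushes the maps toward distinct local peaks instead.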

Inference:



Reposted from blog.csdn.net/qq_32425195/article/details/104811562