Paper notes (video): Unsupervised Learning of Multi-Frame Optical Flow with Occlusions

Accurate estimation of optical flow

Accurate optical flow estimation remains difficult. This can be attributed to the large degree of ambiguity inherent in this ill-posed problem, which can only be resolved using prior knowledge about the appearance and motion of image sequences.
Supervised learning requires large datasets, and obtaining ground truth for real images is challenging because labeling dense correspondences by hand is intractable.
Creating synthetic data from a distribution that resembles natural scenes is a hard problem in its own right.

The authors' solution

We estimate optical flow in both past and future direction together with an occlusion map within a temporal window of three frames.
The unsupervised loss evaluates the images warped from the past and the future, based on the estimated flow fields and occlusion map.
In addition to typical spatial smoothness constraints, we introduce a constant velocity constraint within the temporal window.

This allows reasoning about occlusions in a principled manner while leveraging temporal information for more accurate optical flow prediction in occluded regions.

The authors also perform an ablation study.
Summary:

We propose a novel unsupervised, multi-frame optical flow formulation which estimates past and future flow within a three-frame temporal window.
By explicitly reasoning about occlusions, we increase the fidelity of the photometric loss, resulting in sharper boundaries (Fig. 1(h)) in comparison to two-frame as well as three-frame formulations without occlusions.
We demonstrate that temporal constraints enable more accurate optical flow predictions in occluded regions compared to just spatial propagation, as in all existing unsupervised two-frame optical flow formulations.

Method

Notation

$\mathcal{I} = \{I_P, I_R, I_F\}$: three consecutive RGB frames, $I_t \in \mathbb{R}^{W \times H \times 3}$
$U_F \in \mathbb{R}^{W \times H \times 2}$: the optical flow from reference frame $I_R$ to future frame $I_F$, predicted while leveraging the past frame $I_P$

In this short temporal window, we assume the motion to be approximately linear.

The simplest way to enforce linear motion is a hard constraint: predict only one flow field, warp both images $I_P$ and $I_F$ to the reference image $I_R$ according to this flow field, and compute the photometric loss on the warped images.
Realistic scenes usually contain more complex motions which violate this hard constraint.
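
To make the hard constraint concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names, the bilinear sampler with border clamping, the plain L1 photometric error, and the convention that the past frame is sampled with the negated flow are all assumptions made for illustration.

```python
import numpy as np

def warp_bilinear(image, flow):
    """Backward-warp `image` (H, W, C) into the reference view.

    Output pixel (y, x) samples `image` at (y + flow[y, x, 1], x + flow[y, x, 0]).
    Sampling coordinates are clamped to the image border (an assumption).
    """
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    x = np.clip(xs + flow[..., 0], 0, w - 1)
    y = np.clip(ys + flow[..., 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (x - x0)[..., None]  # horizontal interpolation weight
    wy = (y - y0)[..., None]  # vertical interpolation weight
    top = (1 - wx) * image[y0, x0] + wx * image[y0, x1]
    bottom = (1 - wx) * image[y1, x0] + wx * image[y1, x1]
    return (1 - wy) * top + wy * bottom

def hard_constraint_photometric_loss(i_p, i_r, i_f, flow):
    """Photometric loss with a single flow field shared by both directions.

    Assumes exactly linear motion, so the past frame is sampled at p - U
    and the future frame at p + U. Plain L1 error is used for simplicity.
    """
    warped_f = warp_bilinear(i_f, flow)
    warped_p = warp_bilinear(i_p, -flow)
    return float(np.mean(np.abs(warped_f - i_r)) + np.mean(np.abs(warped_p - i_r)))

# Toy example: image content moves one pixel to the right per frame.
rng = np.random.default_rng(0)
i_r = rng.random((8, 8, 3))
i_f = np.roll(i_r, 1, axis=1)
i_p = np.roll(i_r, -1, axis=1)
flow = np.zeros((8, 8, 2))
flow[..., 0] = 1.0
print(hard_constraint_photometric_loss(i_p, i_r, i_f, flow))
```

Any motion that is not exactly linear leaves a residual in at least one of the two terms, which is why the hard constraint breaks down on realistic scenes.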

Instead, the authors formulate a soft constraint by predicting two optical flow fields and encouraging constant velocity between them.

$U_F$: flow field from the reference frame $I_R$ to the future frame $I_F$

$U_P$: flow field from the reference frame $I_R$ to the past frame $I_P$
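
Under constant velocity the two flow fields are opposite, $U_P(p) = -U_F(p)$, so a soft constraint can penalize the sum $U_F + U_P$. The sketch below uses a plain L1 penalty averaged over all pixels. The function name, the choice of penalty, and the lack of any occlusion masking are assumptions for illustration, not the paper's exact loss term.

```python
import numpy as np

def constant_velocity_penalty(u_f, u_p):
    """Mean L1 penalty on U_F + U_P, both of shape (H, W, 2).

    The penalty is zero when the past flow is exactly the negated future flow.
    """
    return float(np.mean(np.abs(u_f + u_p)))

u_f = np.ones((4, 4, 2))
print(constant_velocity_penalty(u_f, -u_f))              # linear motion
print(constant_velocity_penalty(u_f, np.zeros_like(u_f)))  # motion stops in the past
```

Because the term is only a penalty, the network can still predict $U_P \neq -U_F$ where the photometric evidence supports it.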

Occlusion variable $O \in [0,1]^{W \times H \times 2}$

This allows the photometric loss to be evaluated correctly by reducing the importance of occluded pixels.

$O(p) \in [0,1]^2$ with $\|O(p)\|_1 = 1$ denotes the occlusion at pixel $p$.

If $O(p) = (1, 0)$, pixel $p$ is backward occluded, i.e. occluded in the past frame. $(0, 1)$ means forward occluded, i.e. occluded in the future frame. $(0.5, 0.5)$ means the pixel is visible in both.
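
One way to read these values is as per-pixel weights on the two photometric terms. The sketch below assumes that the first channel of $O(p)$ weights the error against the future frame and the second channel weights the error against the past frame, which matches the semantics above: a backward-occluded pixel, $(1, 0)$, ignores the past frame entirely. The function name and the precomputed per-pixel error inputs are hypothetical.

```python
import numpy as np

def occlusion_weighted_photometric_loss(err_future, err_past, occ):
    """Combine per-pixel photometric errors using the occlusion map.

    err_future, err_past: (H, W) errors of the warped future/past frame
                          against the reference frame.
    occ: (H, W, 2) occlusion map with occ[..., 0] + occ[..., 1] == 1.
    """
    w_future = occ[..., 0]  # assumption: channel 0 weights the future term
    w_past = occ[..., 1]    # assumption: channel 1 weights the past term
    return float(np.sum(w_future * err_future + w_past * err_past))

# A single backward-occluded pixel: the large past-frame error is ignored.
occ = np.array([[[1.0, 0.0]]])
loss = occlusion_weighted_photometric_loss(np.array([[0.2]]), np.array([[5.0]]), occ)
print(loss)
```

For a pixel visible in both frames, $(0.5, 0.5)$, the two errors are simply averaged.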

The authors propose to estimate $U_F$, $U_P$ and $O$ jointly using a neural network, enforcing $\|O(p)\|_1 = 1$ with a softmax at the last layer of the network.
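
A softmax over the two occlusion channels guarantees that each $O(p)$ is non-negative and sums to one. A minimal NumPy sketch follows; the function name and the `(H, W, 2)` logit layout are assumptions, and a real network would apply the equivalent operation inside its framework.

```python
import numpy as np

def occlusion_softmax(logits):
    """Map raw network outputs of shape (H, W, 2) to an occlusion map.

    Each pixel's two values lie in [0, 1] and sum to 1.
    """
    # Subtract the per-pixel maximum for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

logits = np.random.default_rng(0).normal(size=(4, 6, 2))
occ = occlusion_softmax(logits)
print(occ.sum(axis=-1))  # every entry is 1 up to floating-point error
```

Equal logits at a pixel yield $(0.5, 0.5)$, the "visible in both" case.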

Network Architecture

Based on the PWC-Net architecture
Estimation proceeds in a coarse-to-fine manner
Three separate decoders for future flow, past flow and occlusion map

Loss function

$L = L_P + L_{S_F} + L_{S_P} + L_{S_O} + L_{CV} + L_O$