Accurate estimation of optical flow remains challenging
- The difficulty can be attributed to the large degree of ambiguity inherent in this ill-posed problem, which can only be resolved using prior knowledge about the appearance and motion of image sequences.
- Deep learning requires large datasets, and obtaining ground truth for real images is challenging because labeling dense correspondences by hand is intractable.
- Creating data from a distribution that resembles natural scenes is a hard problem on its own.
The authors' solution
- We estimate optical flow in both past and future direction together with an occlusion map within a temporal window of three frames.
- Unsupervised loss evaluates the warped images from the past and the future based on the estimated flow fields and occlusion map.
- In addition to typical spatial smoothness constraints, we introduce a constant velocity constraint within the temporal window.
This allows reasoning about occlusions in a principled manner while leveraging temporal information for more accurate optical flow prediction in occluded regions.
- Perform ablation study
Summarize:
- We propose a novel unsupervised, multi-frame optical flow formulation which estimates past and future flow within a three-frame temporal window.
- By explicitly reasoning about occlusions, we increase the fidelity of the photometric loss, resulting in sharper boundaries (Fig. 1(h)) in comparison to two-frame as well as three-frame formulations without occlusions.
- We demonstrate that temporal constraints enable more accurate optical flow predictions in occluded regions compared to just spatial propagation, as in all existing unsupervised two-frame optical flow formulations.
Method
Notation
\(L = \{I_P, I_R, I_F\}\)
three consecutive RGB frames \(I_t \in \mathbb{R}^{W \times H \times 3}\)
The optical flow \(U_F \in \mathbb{R}^{W \times H \times 2}\) is estimated
from reference frame \(I_R\) to future frame \(I_F\) while leveraging the past frame \(I_P\)
In this short temporal window, we assume the motion to be approximately linear.
The simplest way to enforce linear motion is a hard constraint: predict only one flow field and warp both images \(I_P\) and \(I_F\) to the reference image \(I_R\) according to this flow field when computing the photometric loss.
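The hard constraint can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the bilinear sampling with border clamping, the L1 photometric error, and the use of the negated flow for the past frame (which follows from assuming exactly linear motion) are all assumptions made here for illustration. Arrays are stored as (H, W, C), and flow channel 0 is the horizontal displacement.

```python
import numpy as np


def warp_bilinear(image, flow):
    """Sample `image` at p + flow(p) using bilinear interpolation.

    image: (H, W, C) array, flow: (H, W, 2) array with (dx, dy) per pixel.
    Sampling positions outside the image are clamped to the border.
    """
    h, w = flow.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    x = np.clip(xs + flow[..., 0], 0, w - 1)
    y = np.clip(ys + flow[..., 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (x - x0)[..., None]  # horizontal interpolation weight
    wy = (y - y0)[..., None]  # vertical interpolation weight
    top = (1 - wx) * image[y0, x0] + wx * image[y0, x1]
    bottom = (1 - wx) * image[y1, x0] + wx * image[y1, x1]
    return (1 - wy) * top + wy * bottom


def hard_constraint_photometric_loss(i_p, i_r, i_f, flow):
    """Photometric loss with a single flow field (hypothetical helper).

    Assumption: under exactly linear motion, the flow to the past frame is
    the negated flow to the future frame, so one field warps both images.
    """
    warped_f = warp_bilinear(i_f, flow)   # future frame warped to reference
    warped_p = warp_bilinear(i_p, -flow)  # past frame warped to reference
    return float(np.mean(np.abs(warped_f - i_r)) + np.mean(np.abs(warped_p - i_r)))
```

For a scene that translates by one pixel per frame, the loss under the true flow is lower than under zero flow; the remaining error comes only from the clamped border columns.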
Realistic scenes usually contain more complex motions which violate this hard constraint.
Formulate a soft constraint by predicting two optical flow fields and encouraging constant velocity
\(U_F\): flow field from reference frame to future frame \(I_F\)
\(U_P\): flow field from reference frame to past frame \(I_P\)
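As a rough sketch of what "encouraging constant velocity" could look like: if the motion is linear, the future and past flows at a pixel should be equal in magnitude and opposite in direction, so their sum should be close to zero. The function name and the plain L1 penalty below are assumptions for illustration; the notes do not specify the exact penalty used in the paper.

```python
import numpy as np


def constant_velocity_penalty(u_f, u_p):
    """Soft constant-velocity term (hypothetical helper).

    u_f, u_p: (H, W, 2) flow fields from the reference frame to the future
    and past frames. Under constant velocity, u_f(p) is approximately
    -u_p(p), so the residual u_f + u_p is penalized with an L1 norm
    (an assumed choice of penalty).
    """
    residual = u_f + u_p
    return float(np.mean(np.abs(residual)))
```

Because this is a penalty rather than a hard constraint, the network can still predict flows that deviate from linear motion when the photometric evidence supports it.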
Occlusion variable \(O \in [0,1]^{W \times H \times 2}\)
This allows the photometric loss to be evaluated correctly by reducing the importance of occluded pixels.
\(O(p)\) with \(\|O(p)\|_1 = 1\) denotes the occlusion at pixel \(p\)
If \(O(p) = (1,0)\), the pixel is backward occluded (occluded in the past frame); \((0,1)\) means forward occluded (occluded in the future frame); \((0.5,0.5)\) means visible in both frames.
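One way to read this convention is that the two components of \(O(p)\) act as per-pixel weights on the two photometric errors: a backward-occluded pixel \((1,0)\) relies only on the future frame, and a forward-occluded pixel \((0,1)\) relies only on the past frame. The sketch below assumes exactly that weighting, takes already-warped images as input, and uses an L1 error; the function name and these choices are illustrative assumptions, not the paper's definition.

```python
import numpy as np


def occlusion_weighted_photometric_loss(warped_f, warped_p, i_r, occ):
    """Occlusion-aware photometric loss (hypothetical helper).

    warped_f, warped_p: (H, W, 3) future/past frames warped to the reference.
    i_r: (H, W, 3) reference frame.
    occ: (H, W, 2) occlusion map whose two channels sum to 1 per pixel.
    Assumption: occ[..., 0] weights the future-frame error and occ[..., 1]
    weights the past-frame error, matching the (1,0)/(0,1) convention.
    """
    err_f = np.abs(warped_f - i_r).sum(axis=-1)  # per-pixel error vs. future
    err_p = np.abs(warped_p - i_r).sum(axis=-1)  # per-pixel error vs. past
    return float(np.mean(occ[..., 0] * err_f + occ[..., 1] * err_p))
```

With this weighting, a pixel that is hidden in the past frame no longer contributes a large, misleading error from the past image.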
Propose to estimate \(U_F\), \(U_P\) and \(O\) jointly using a neural network, enforcing \(\|O(p)\|_1 = 1\) with a softmax at the last layer of the network
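The softmax normalization can be sketched in a few lines of NumPy. The function name and the (H, W, 2) layout of the raw network outputs are assumptions for illustration; the point is only that a softmax over the two channels yields non-negative values that sum to 1 at every pixel.

```python
import numpy as np


def occlusion_softmax(logits):
    """Turn raw (H, W, 2) scores into an occlusion map with unit L1 norm.

    The maximum is subtracted before exponentiation for numerical stability;
    this does not change the result of the softmax.
    """
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)
```

Equal scores in both channels give \((0.5, 0.5)\), which corresponds to a pixel that is visible in both neighboring frames.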
Network Architecture
PWC-Net architecture
a coarse-to-fine manner
three separate decoders for future flow, past flow and occlusion map
Loss function
\(L = L_P + L_{S_F} + L_{S_P} + L_{S_O} + L_{C_V} + L_O\)