A simple survey on inpainting

From Wikipedia, inpainting is the process of reconstructing lost or deteriorated parts of images and videos. In my opinion, besides 2D images, 3D models or images with depth can also be inpainted. In some ways, 2D and 3D inpainting are closely related: for example, GANs, which learn semantic / perceptual / style information from 2D pixel-based images, can also be applied to 3D voxel-based models. In recent years, many CVPR / ICCV / SIGGRAPH papers have focused on 2D or 3D inpainting, so this week I studied their methods and considered what we could do in the future.

State-of-the-art Papers

SIGGRAPH 2017, Globally and Locally Consistent Image Completion [1]

This paper is about image completion, and its results look good.

Building on GAN, they make several improvements. First, they introduce dilated convolution layers to replace some of the conventional convolution layers, so that a larger receptive field can be obtained without changing the dimensions of each layer. Second, they extend the GAN discriminator into two discriminators: a global context discriminator and a local context discriminator. While the global discriminator aims to recognize global consistency, the local discriminator looks only at the completed region and judges its quality. Third, after the GAN generates the image, post-processing is done by blending the completed region with the colors of the surrounding pixels (the fast marching method, followed by Poisson image blending) to fix subtle color inconsistencies.
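To see why dilated convolutions help here, a rough back-of-the-envelope sketch (not the authors' code) of the receptive-field arithmetic: a stack of stride-1 layers grows the receptive field by the effective kernel extent of each layer, and dilation widens that extent without adding parameters. The dilation schedule 2, 4, 8, 16 below is an assumed doubling schedule for illustration.

```python
def effective_kernel(k, d):
    """Effective spatial extent of a k-wide kernel with dilation d."""
    return d * (k - 1) + 1

def receptive_field(layers):
    """Receptive field of a stack of stride-1 conv layers,
    each given as a (kernel_size, dilation) pair."""
    rf = 1
    for k, d in layers:
        rf += effective_kernel(k, d) - 1
    return rf

# Four 3-wide stride-1 layers with doubling dilations vs. the same
# four layers undilated: same parameter count, far wider context.
dilated = receptive_field([(3, d) for d in (2, 4, 8, 16)])  # 61 pixels
plain = receptive_field([(3, 1)] * 4)                       # 9 pixels
```

This is why the completion network can take distant context into account without extra downsampling or larger feature maps.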

ECCV 2018, Image Inpainting for Irregular Holes Using Partial Convolutions, [2]

This paper, which is also about image inpainting, comes from an NVIDIA group. NVIDIA released a very cool video showing the method's inpainting capability, claiming it as a new alternative to Photoshop. The video can be found under the title "Nvidia image inpainting".

In the introduction, [2] points out problems with current inpainting methods. Methods without DNNs, such as PatchMatch [3], can produce smooth results but have no notion of visual semantics. Methods like [1] require expensive post-processing to reduce visual artifacts, e.g. checkerboard artifacts and fish-scale artifacts. Other methods focus on rectangular holes and therefore support few applications.

This paper proposes partial convolutions with an automatic mask-update step, which can not only handle masks of arbitrary shape but also remove the need for post-processing. In addition, the paper releases an irregular mask dataset that can be used by other researchers.
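The core rule can be sketched in 1-D (a minimal illustration, not the paper's implementation): each output is computed only over valid pixels, renormalized by how many valid pixels the window saw, and the mask is updated so that the hole shrinks layer by layer.

```python
def partial_conv_1d(x, mask, w):
    """1-D sketch of the partial-convolution rule from [2].
    x: signal values, mask: 1 = valid pixel / 0 = hole, w: kernel."""
    k = len(w)
    out, new_mask = [], []
    for i in range(len(x) - k + 1):
        window, m = x[i:i + k], mask[i:i + k]
        valid = sum(m)
        if valid > 0:
            # convolve over valid pixels only, renormalize by k / valid
            s = sum(wj * xj * mj for wj, xj, mj in zip(w, window, m))
            out.append(s * k / valid)
            new_mask.append(1)  # this output is now valid: hole shrinks
        else:
            out.append(0.0)
            new_mask.append(0)
    return out, new_mask

x = [1.0, 2.0, 0.0, 0.0, 5.0]  # the zeros sit inside the hole
m = [1, 1, 0, 0, 1]
w = [1 / 3, 1 / 3, 1 / 3]      # simple box filter
y, m2 = partial_conv_1d(x, m, w)
```

After one pass the two-pixel hole has already closed (`m2` is all ones), which is why stacking enough partial-conv layers can fill arbitrary irregular holes without any post-processing.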

ICCV 2017, Shape Inpainting using 3D Generative Adversarial Network and Recurrent Convolutional Networks, [4]

This paper applies a GAN and an LRCN (a recurrent convolutional network combining CNN layers with an LSTM) to 3D model inpainting.

This paper uses low-resolution corrupted models as input to produce high-resolution complete objects. Since going from low resolution to high resolution with a GAN directly requires a large number of parameters, such a network is very hard to train. The paper therefore proposes a two-step method: in the first step, a GAN fills the holes in the corrupted model; in the second step, the authors view the 3D model as a 2D video, using a variation of the LSTM (the LRCN) to upsample the model from low resolution to high resolution.
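The "3D model as 2D video" reframing is simple to picture: slice the voxel grid along one axis so that each slice becomes one video frame, then let per-frame convolutions plus an LSTM across frames handle the sequence. A toy sketch of that slicing (illustrative only; the grid size of 32 is an assumption):

```python
# View a D x H x W voxel grid as a "video" of D frames of size H x W.
D = H = W = 32
voxels = [[[0 for _ in range(W)] for _ in range(H)] for _ in range(D)]

# One 2-D "frame" per slice along the depth axis; an LRCN would apply
# conv layers to each frame and an LSTM across the frame sequence.
frames = [voxels[z] for z in range(D)]
```

Each frame is a small 2-D problem, so the network avoids the parameter blow-up of a direct high-resolution 3-D generator.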

The paper also applies the method to several applications. It can be used in real-world scanning, predicting the whole object from a partially scanned surface. Part of the network also learns features of the model, so it can be reused for model classification.

ECCV 2018, Learning Shape Priors for Single-View 3D Completion and Reconstruction, [5]

This paper gives a method to turn a single-view image (an RGB image without depth, or a depth image without color information) into a natural 3D object.

In the image above, interpretation A is the generated result, while interpretation B is the ground truth. Because a single view admits many different plausible completions, this paper proposes a method to measure naturalness and chooses one result from all the reasonable candidates. However, the paper only trains on airplane, car, and chair models, so I am not sure how well it extends to other categories; still, the results are quite impressive.


Details

Generative Adversarial Nets (NIPS 2014) [6] first introduced the idea of GANs.

U-Net: Convolutional Networks for Biomedical Image Segmentation (2015) [7] first introduced U-Net, showing that convolution layers can be used not only to extract features from an image but also to generate images from features.

Context Encoders: Feature Learning by Inpainting (CVPR 2016) [8] first proposed an encoder-decoder pipeline for inpainting.

Conditional Generative Adversarial Nets (2014) [9] first introduced the conditional GAN, so that the generated result can be conditioned on some prior knowledge.

The network architecture in [2] is based on Image-to-Image Translation with Conditional Adversarial Networks (CVPR 2017) [10], which is in turn based on the cGAN.

For details about convolution / transposed convolution / dilated convolution filters, refer to A Guide to Convolution Arithmetic for Deep Learning (2016) [11].
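The key formulas from [11] fit in a few lines; here is a hedged sketch of the output-size arithmetic (the example sizes are arbitrary):

```python
def conv_out(i, k, s=1, p=0, d=1):
    """Output length of a convolution along one axis, per [11]:
    effective kernel k' = d*(k-1) + 1, then o = (i + 2p - k') // s + 1."""
    k_eff = d * (k - 1) + 1
    return (i + 2 * p - k_eff) // s + 1

def tconv_out(i, k, s=1, p=0):
    """Output length of the matching transposed convolution
    (without output padding): o = s*(i - 1) + k - 2p."""
    return s * (i - 1) + k - 2 * p

# A stride-2 conv shrinks a 7-pixel axis to 3; the transposed conv
# with the same k and s recovers 7.
a = conv_out(7, 3, s=2)        # 3
b = tconv_out(a, 3, s=2)       # 7
# A dilated 3-wide kernel with d=2 behaves like a 5-wide one, so
# padding p=2 keeps the size unchanged.
c = conv_out(32, 3, p=2, d=2)  # 32
```

The last line is exactly the property exploited in [1]: dilation enlarges the receptive field while the layer's output dimensions stay the same.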


Other Interesting Papers

ICCV 2017, High-Resolution Shape Completion Using Deep Neural Networks for Global Structure and Local Geometry Inference, [12]

CVPR 2017, Shape Completion using 3D-Encoder-Predictor CNNs and Shape Synthesis, [13]

CVPR 2017, Semantic Scene Completion from a Single Depth Image, [14]

References

[1] Iizuka S, Simo-Serra E, Ishikawa H. Globally and locally consistent image completion[J]. ACM Transactions on Graphics (TOG), 2017, 36(4): 107.
[2] Liu G, Reda F A, Shih K J, et al. Image inpainting for irregular holes using partial convolutions[J]. arXiv preprint arXiv:1804.07723, 2018.
[3] Barnes C, Shechtman E, Finkelstein A, et al. PatchMatch: A randomized correspondence algorithm for structural image editing[J]. ACM Transactions on Graphics (ToG), 2009, 28(3): 24.
[4] Wang W, Huang Q, You S, et al. Shape inpainting using 3d generative adversarial network and recurrent convolutional networks[J]. arXiv preprint arXiv:1711.06375, 2017.
[5] Wu J, Zhang C, Zhang X, et al. Learning Shape Priors for Single-View 3D Completion and Reconstruction[J]. arXiv preprint arXiv:1809.05068, 2018.
[6] Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets[C]//Advances in neural information processing systems. 2014: 2672-2680.
[7] Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation[C]//International Conference on Medical image computing and computer-assisted intervention. Springer, Cham, 2015: 234-241.
[8] Pathak D, Krahenbuhl P, Donahue J, et al. Context encoders: Feature learning by inpainting[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 2536-2544.
[9] Mirza M, Osindero S. Conditional generative adversarial nets[J]. arXiv preprint arXiv:1411.1784, 2014.
[10] Isola P, Zhu J Y, Zhou T, et al. Image-to-image translation with conditional adversarial networks[J]. arXiv preprint, 2017.
[11] Dumoulin V, Visin F. A guide to convolution arithmetic for deep learning[J]. arXiv preprint arXiv:1603.07285, 2016.
[12] Han X, Li Z, Huang H, et al. High-Resolution Shape Completion Using Deep Neural Networks for Global Structure and Local Geometry Inference[C]//Proceedings of IEEE International Conference on Computer Vision (ICCV). 2017.
[13] Dai A, Qi C R, Nießner M. Shape completion using 3d-encoder-predictor cnns and shape synthesis[C]//Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR). 2017, 3.
[14] Song S, Yu F, Zeng A, et al. Semantic scene completion from a single depth image[C]//Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on. IEEE, 2017: 190-198.

Reposted from blog.csdn.net/yucong96/article/details/83061729