Castle in the Sky

Dynamic Sky Replacement and Harmonization in Videos

Zhengxia Zou

University of Michigan, Ann Arbor


We propose a vision-based method for video sky replacement and harmonization that automatically generates realistic and dramatic sky backgrounds in videos with controllable styles. Unlike previous sky editing methods, which either focus on static photos or require inertial measurement units integrated in smartphones when shooting videos, our method is purely vision-based, places no requirements on the capturing device, and applies to both online and offline processing scenarios. Our method runs in real time and requires no user interaction. We decompose this artistic creation process into several proxy tasks, including sky matting, motion estimation, and image blending. Experiments on videos captured in the wild by handheld smartphones and dash cameras show high fidelity and good generalization of our method in both visual quality and lighting/motion dynamics.
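To make the blending step concrete, the following is a minimal sketch of standard alpha compositing with a soft sky matte. It is not our released implementation: the function name `blend_sky`, the matte convention (1 = sky, 0 = foreground), and the toy arrays are assumptions chosen for illustration, and the sketch omits the harmonization (recoloring and relighting) and motion-alignment stages entirely.

```python
import numpy as np

def blend_sky(frame, sky, matte):
    """Composite `sky` into `frame` using a soft sky matte (hypothetical helper).

    frame, sky: float arrays of shape (H, W, 3) with values in [0, 1]
    matte:      float array of shape (H, W); assumed 1 = sky, 0 = foreground
    """
    # Add a channel axis so the matte broadcasts over the RGB channels.
    alpha = np.clip(matte, 0.0, 1.0)[..., None]
    # Per-pixel convex combination of the new sky and the original frame.
    return alpha * sky + (1.0 - alpha) * frame

# Toy 2x2 example: a dark frame, a bright sky, and a matte with a soft edge.
frame = np.full((2, 2, 3), 0.2)
sky = np.full((2, 2, 3), 0.8)
matte = np.array([[1.0, 0.5],
                  [0.0, 0.0]])
out = blend_sky(frame, sky, matte)
```

In this toy case, pixels with matte 1 take the sky color, pixels with matte 0 keep the frame color, and the soft-edge pixel is an even mix of the two. In a video setting, `sky` would first be warped to follow the estimated camera motion before being composited.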

Our method produces vivid blending results with a high degree of realism and visual dynamics. With a single NVIDIA Titan XP GPU card, our method reaches real-time processing speed (24 fps) at an output resolution of 640 x 320 and near-real-time speed (15 fps) at 854 x 480. Below are several groups of our blending results on outdoor videos (floating castle, fire cloud, super moon, and galaxy night).
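As a rough reading of these numbers, the short sketch below converts each reported frame rate into a per-frame time budget and a pixel throughput. The helper `frame_stats` is a hypothetical name introduced here; only the resolutions and frame rates come from the figures above, and the derived values are simple arithmetic rather than additional measurements.

```python
def frame_stats(width, height, fps):
    """Return (time budget per frame in ms, output pixels processed per second)."""
    budget_ms = 1000.0 / fps
    pixels_per_second = width * height * fps
    return budget_ms, pixels_per_second

# Reported operating points: (width, height, fps)
settings = [(640, 320, 24), (854, 480, 15)]
stats = [frame_stats(w, h, fps) for (w, h, fps) in settings]

for (w, h, fps), (budget_ms, pps) in zip(settings, stats):
    print(f"{w} x {h} @ {fps} fps: {budget_ms:.1f} ms/frame, {pps:,} pixels/s")
```

At 24 fps each frame has about 41.7 ms, and at 15 fps about 66.7 ms. The higher-resolution setting outputs roughly twice as many pixels per frame, so its pixel throughput is actually somewhat higher despite the lower frame rate.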

As a by-product, our method can also be used for image weather and lighting translation. A potential application is data augmentation. The domain gap between datasets with limited samples and the complex real world poses great challenges for data-driven computer vision methods. For example, domain-sensitive visual perception models in self-driving may struggle at night or on rainy days because such conditions are underrepresented in the training data. We believe our method has great potential for improving the generalization ability of deep learning models in a variety of computer vision tasks such as detection, segmentation, and tracking. We leave this to future work.


Cloudy to sunny

Sunny to rainy

Cloudy to thunderstorm

    title={Castle in the Sky: Dynamic Sky Replacement and Harmonization in Videos},
    author={Zhengxia Zou},
    journal={arXiv preprint arXiv:2010.11800},