ViewExtrapolator: Novel View Extrapolation
with Video Diffusion Priors
arXiv 2024

Kunhao Liu
Nanyang Technological University
Ling Shao
UCAS-Terminus AI Lab, UCAS
Shijian Lu
Nanyang Technological University

Abstract

The field of novel view synthesis has made significant strides due to the development of radiance field methods. However, most radiance field techniques are far better at novel view interpolation than at novel view extrapolation, where the synthesized novel views lie far beyond the observed training views. We design ViewExtrapolator, a novel view synthesis approach that leverages the generative priors of Stable Video Diffusion (SVD) for realistic novel view extrapolation. By redesigning the SVD denoising process, ViewExtrapolator refines the artifact-prone views rendered by radiance fields, greatly enhancing the clarity and realism of the synthesized novel views. ViewExtrapolator is a generic novel view extrapolator that can work with different types of 3D rendering, such as views rendered from point clouds when only a single view or monocular video is available. Additionally, ViewExtrapolator requires no fine-tuning of SVD, making it both data-efficient and computation-efficient. Extensive experiments demonstrate the superiority of ViewExtrapolator in novel view extrapolation.

TL;DR

We introduce ViewExtrapolator, a novel approach that leverages the generative priors of Stable Video Diffusion for novel view extrapolation, where the novel views lie far beyond the range of the training views.


Background


(Left) Radiance fields perform well in novel view interpolation but face significant challenges in novel view extrapolation, where test novel views extend far beyond the range of the training views. In these extrapolation scenarios, radiance field rendering quality deteriorates notably, often introducing substantial artifacts.

(Right) Most existing benchmarks such as LLFF and Mip-NeRF 360 adopt an interpolation setting, as their test views are situated close to the training views. We therefore propose LLFF-Extra, a new dataset in which test novel views are placed well beyond the range of the training views, offering a more suitable evaluation of novel view extrapolation.


Overview


We render an artifact-prone video from the closest training view to an extrapolative novel view using radiance fields or point clouds. We then refine the rendered video by guiding SVD to preserve the original scene content while eliminating the artifacts, through guidance annealing and resampling annealing, as sketched below. Please refer to the paper for more technical details.
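The refinement loop can be sketched as follows. This is a minimal illustration rather than the authors' implementation: svd_denoise_step and add_noise are hypothetical stand-ins for the SVD sampler's denoising step and forward-noising function, and the linear annealing schedules (t_guide, n_resample) are assumptions; the exact guidance form and schedules are given in the paper.

import torch

@torch.no_grad()
def refine_artifact_video(z_render, sigmas, svd_denoise_step, add_noise,
                          t_guide=0.5, n_resample=3):
    # z_render: latents of the rendered (artifact-prone) video.
    # sigmas: descending noise levels of the SVD sampler.
    # svd_denoise_step(z, s_hi, s_lo): one denoising step (hypothetical).
    # add_noise(z, s): noise clean latents up to level s (hypothetical).
    z = add_noise(z_render, sigmas[0])           # start from a noised copy
    n_steps = len(sigmas) - 1
    for i in range(n_steps):
        frac = i / n_steps                       # denoising progress in [0, 1)
        z = svd_denoise_step(z, sigmas[i], sigmas[i + 1])

        # Guidance annealing: early steps are pulled toward the rendered
        # content; the weight decays to zero so SVD is free to remove
        # artifacts in later steps.
        if frac < t_guide:
            w = 1.0 - frac / t_guide             # linear decay (assumed)
            z = w * add_noise(z_render, sigmas[i + 1]) + (1.0 - w) * z

        # Resampling annealing: re-noise and re-denoise the same step a few
        # extra times early on, annealing the repeat count to zero.
        for _ in range(max(0, round(n_resample * (1.0 - frac)) - 1)):
            z = add_noise(z, sigmas[i])
            z = svd_denoise_step(z, sigmas[i], sigmas[i + 1])
    return z

The refined latents are then decoded back to video frames with the SVD decoder, yielding the artifact-free extrapolated views.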


Comparisons

Comparisons of ViewExtrapolator and 3D Gaussian Splatting on novel view extrapolation. Please refer to the paper for more qualitative and quantitative comparisons.


Applications

Applications of ViewExtrapolator to novel view extrapolation from single views and monocular videos, comparing raw point cloud renderings with our refined results.
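When only a single image or monocular video is available, the scene can be lifted to a colored point cloud using estimated depth and rendered along a camera path toward the extrapolated view, producing the artifact-prone video that SVD then refines. Below is a minimal sketch of the unprojection step under a pinhole camera model; the function name and the source of the depth map (e.g., a monocular depth estimator) are illustrative assumptions.

import numpy as np

def unproject_to_point_cloud(image, depth, K):
    # image: (H, W, 3) colors; depth: (H, W) per-pixel depth; K: 3x3 intrinsics.
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)  # homogeneous pixels
    rays = pix @ np.linalg.inv(K).T           # camera-space rays, one per pixel
    points = rays * depth.reshape(-1, 1)      # scale each ray by its depth
    colors = image.reshape(-1, 3)
    return points, colors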


Citation

Consider citing us if you find this project helpful.
@article{liu2024novel,
  title={Novel View Extrapolation with Video Diffusion Priors},
  author={Liu, Kunhao and Shao, Ling and Lu, Shijian},
  journal={arXiv preprint arXiv:2411.14208},
  year={2024}
}

Acknowledgements

Our work builds on Stable Video Diffusion and the gsplat implementation of 3D Gaussian Splatting. We thank the authors for their great work and for open-sourcing their code. We would also like to express our gratitude to Fangneng for his guidance and discussions.