Free3D: Consistent Novel View Synthesis without 3D Representation

Visual Geometry Group, University of Oxford

Free3D synthesizes consistent novel views without the need for explicit 3D representations.

Abstract

We introduce Free3D, a simple approach designed for open-set novel view synthesis (NVS) from a single image.

Similar to Zero-1-to-3, we start from a pre-trained 2D image generator for generalization and fine-tune it for NVS. Compared to recent and concurrent works, we obtain significant improvements without resorting to an explicit 3D representation, which is slow and memory-intensive.

We do so by better encoding the target camera pose via a new per-pixel ray conditioning normalization (RCN) layer. The latter injects camera pose information into the underlying 2D image generator by telling each pixel its specific viewing direction. We also improve multi-view consistency via a lightweight multi-view attention layer and multi-view noise sharing. We train Free3D on the Objaverse dataset and demonstrate excellent generalization to various new categories in several large new datasets, including OmniObject3D and Google Scanned Objects (GSO).
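To make the ray-conditioning idea concrete, here is a rough NumPy sketch (an illustration only, not the authors' implementation; the 6-D ray embedding, the learned projections `w_scale`/`w_shift`, and all shapes are assumptions): each pixel's viewing ray is embedded as direction plus moment, and a normalization layer is modulated per pixel by that embedding.

```python
import numpy as np

def pixel_ray_embedding(origin, dirs):
    """Per-pixel ray embedding: direction plus moment (Pluecker-style).

    origin: (3,) camera center; dirs: (H, W, 3) unit viewing directions.
    Returns an (H, W, 6) embedding telling each pixel its viewing ray.
    """
    moment = np.cross(np.broadcast_to(origin, dirs.shape), dirs)
    return np.concatenate([dirs, moment], axis=-1)

def ray_conditioned_norm(feat, ray_emb, w_scale, w_shift, eps=1e-5):
    """Normalize latent features, then modulate each pixel by its ray.

    feat: (H, W, C) latent features; ray_emb: (H, W, 6);
    w_scale, w_shift: (6, C) hypothetical learned projections that map
    the ray embedding to a per-pixel scale and shift.
    """
    mu = feat.mean(axis=(0, 1), keepdims=True)
    sd = feat.std(axis=(0, 1), keepdims=True)
    normed = (feat - mu) / (sd + eps)
    return normed * (1.0 + ray_emb @ w_scale) + ray_emb @ w_shift
```

In this sketch the conditioning acts like a per-pixel feature modulation: unlike a single global pose embedding, every latent pixel receives a scale/shift derived from its own viewing ray.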

Framework

The overall pipeline of Free3D. (a) Given a single source input image, the proposed architecture jointly predicts multiple target views, instead of processing them independently. To achieve consistent novel view synthesis without a 3D representation, (b) we first propose a novel ray conditioning normalization (RCN) layer, which uses per-pixel oriented camera rays to modulate the latent features, enabling the model to capture more precise viewpoints. (c) A memory-friendly pseudo-3D cross-attention module is introduced to efficiently bridge information across the multiple generated views. Note that here the similarity scores are computed only across views (the temporal axis) rather than spatially, resulting in minimal computational and memory cost.
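The view-axis attention in (c) can be sketched as follows (an illustrative re-implementation, not the released code; single-head, without learned projections): tokens from the V views attend to each other independently at every spatial location, so each similarity matrix is only V x V per pixel instead of one (V·H·W) x (V·H·W) matrix over all tokens.

```python
import numpy as np

def multiview_attention(x):
    """Attention across the view axis only.

    x: (V, N, C) latent tokens from V jointly generated views, N = H*W.
    At each of the N spatial locations, the V view tokens attend to one
    another, giving N small V x V similarity matrices rather than full
    attention over all V*N tokens.
    """
    V, N, C = x.shape
    xt = np.transpose(x, (1, 0, 2))               # (N, V, C)
    scores = xt @ np.transpose(xt, (0, 2, 1))     # (N, V, V)
    scores = scores / np.sqrt(C)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)      # softmax over views
    out = attn @ xt                               # (N, V, C)
    return np.transpose(out, (1, 0, 2))           # back to (V, N, C)
```

Because attention never crosses spatial locations, the cost grows linearly in H·W and quadratically only in the (small) number of views, which is what keeps the module memory-friendly.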

Video

Results

NVS for given camera viewpoint

Free3D significantly improves the pose accuracy of generated views compared to existing state-of-the-art methods on various datasets, including Objaverse (top two rows), OmniObject3D (middle two), and GSO (bottom two).

360-degree rendering along a circular path

Using Free3D, you can directly render a consistent 360-degree video without the need for an additional explicit 3D representation or network.
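One ingredient behind this cross-frame consistency, per the abstract, is multi-view noise sharing. A hedged sketch of one way such sharing could be realized (the blending scheme and the `weight` parameter are assumptions for illustration, not the paper's specification): start all view denoising trajectories from correlated initial latents.

```python
import numpy as np

def shared_initial_noise(n_views, shape, weight=0.7, seed=0):
    """Correlated initial noise for jointly denoised views.

    Blends one noise map shared by all views with independent per-view
    noise; 'weight' controls the cross-view correlation while keeping
    unit variance (weight**2 + (1 - weight**2) == 1).
    """
    rng = np.random.default_rng(seed)
    shared = rng.standard_normal(shape)
    per_view = rng.standard_normal((n_views,) + tuple(shape))
    return weight * shared + np.sqrt(1.0 - weight ** 2) * per_view
```

With correlated starting points, nearby views tend toward similar low-frequency content from the first denoising step, which helps adjacent frames of a rendered orbit agree.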

More rendered videos

Videos on Objaverse Dataset

Videos on OmniObject3D Dataset

Videos on GSO Dataset

Related Links

There's a lot of excellent work that was introduced around the same time as ours.

Stable Video Diffusion fine-tunes an image-to-video diffusion model for multi-view generation.

Efficient-3DiM fine-tunes Stable Diffusion with a stronger vision transformer, DINOv2.

Consistent-1-to-3 uses epipolar attention to extract coarse results for the diffusion model.

One-2-3-45 and One-2-3-45++ directly train an additional 3D network on the outputs of a multi-view generator.

MVDream, Consistent123, and Wonder3D also train multi-view diffusion models, yet still require post-processing for video rendering.

Some works incorporate a 3D representation into the latent diffusion model, such as SyncDreamer and ConsistNet.

Acknowledgements

Many thanks to Stanislaw Szymanowicz, Edgar Sucar, and Luke Melas-Kyriazi of VGG for insightful discussions and Ruining Li, Eldar Insafutdinov, and Yash Bhalgat of VGG for their helpful feedback. We would also like to thank the authors of Zero-1-to-3 and Objaverse-XL for their helpful discussions.

Citation

@article{zheng2023free3D,
  author  = {Zheng, Chuanxia and Vedaldi, Andrea},
  title   = {Free3D: Consistent Novel View Synthesis without 3D Representation},
  journal = {arXiv},
  year    = {2023},
}