NeO 360: Neural Fields for Sparse View Synthesis of Outdoor Scenes         

International Conference on Computer Vision (ICCV), 2023

1Georgia Tech, 2Toyota Research Institute

NeO 360 Dataset: Samples from our large-scale NeRDS 360 dataset. For each scene, we show different views and annotations, i.e., depth (top left), semantic segmentation (top right), instance segmentation (bottom left), and 3D OBBs (bottom right). We use Parallel Domain's synthetic data generation to render high-fidelity 360° scenes.

Abstract

Recent implicit neural representations have shown great results for novel-view synthesis. However, existing methods require expensive per-scene optimization from many views, which limits their application to real-world unbounded urban settings where the objects of interest or backgrounds are observed from very few views.

To mitigate this challenge, we introduce a new approach called NeO 360, Neural fields for sparse view synthesis of outdoor scenes. NeO 360 is a generalizable method that reconstructs 360° scenes from a single or a few posed RGB images. The essence of our approach is in capturing the distribution of complex real-world outdoor 3D scenes and using a hybrid image-conditional tri-planar representation that can be queried from any world point. Our representation combines the best of both voxel-based and bird's-eye-view (BEV) representations and is more effective and expressive than either. NeO 360's representation allows us to learn from a large collection of unbounded 3D scenes while offering generalizability to new views and novel scenes from as few as a single image during inference.

We demonstrate our approach on our proposed challenging 360° unbounded dataset, called NeRDS 360, and show that NeO 360 outperforms state-of-the-art generalizable methods for novel-view synthesis while also offering editing and composition capabilities.


NeRDS 360 Multi-View Dataset for Outdoor Scenes

NeRDS 360 ("NeRF for Reconstruction, Decomposition and Scene Synthesis of 360° outdoor scenes") is a dataset comprising 75 unbounded scenes with full multi-view annotations, providing diverse scenes for generalizable NeRF training and evaluation.

SF 6th and Mission Environment: 3 source cameras (shown in red), 100 evaluation cameras (shown in green).

SF Grant and California Environment: 5 source cameras (shown in red), 100 evaluation cameras (shown in green).

SF Van Ness Ave and Turk Environment: training camera distribution (shown in blue), evaluation camera distribution (shown in green).

SF 6th and Mission Environment: training camera distribution (shown in blue), evaluation camera distribution (shown in green).

Method

Overview: a) Given just a single or a few input images from a novel scene, our method reconstructs and renders new 360° views of complex unbounded outdoor scenes. b) We achieve this by constructing an image-conditional triplane representation to model the 3D surroundings from various perspectives. c) Our method generalizes across novel scenes and viewpoints for complex 360° outdoor scenes.


Architecture

NeO 360 Architecture: Our method effectively uses local features to infer an image-conditional triplanar representation for both backgrounds and foregrounds. These triplanar features are obtained by orthogonally projecting positions (x) onto each plane and bilinearly interpolating feature vectors. Dedicated NeRF decoder MLPs are then used to regress density and color for the foreground and background, respectively.
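The triplane query can be sketched conceptually as follows. This is a minimal PyTorch sketch, not the released implementation: it uses a learned plane parameter as a stand-in for the image-conditioned feature planes, and the feature sizes, aggregation (summation over planes), and coordinate normalization are all assumptions.

```python
# Minimal sketch of a triplane query: project a world point onto the three
# axis-aligned planes, bilinearly sample features, and decode with a small MLP.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TriplaneNeRFDecoder(nn.Module):
    def __init__(self, feat_dim=32, hidden=128, res=128):
        super().__init__()
        # Stand-in for image-conditioned feature planes (XY, XZ, YZ).
        self.planes = nn.Parameter(torch.randn(3, feat_dim, res, res) * 0.01)
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # (r, g, b, sigma)
        )

    def forward(self, x):
        # x: (N, 3) world points, assumed pre-normalized to [-1, 1]^3.
        coords = torch.stack([x[:, [0, 1]],   # projection onto XY plane
                              x[:, [0, 2]],   # projection onto XZ plane
                              x[:, [1, 2]]],  # projection onto YZ plane
                             dim=0)           # (3, N, 2)
        grid = coords.unsqueeze(2)            # (3, N, 1, 2)
        # Bilinear interpolation of each plane's feature map at the projected points.
        feats = F.grid_sample(self.planes, grid, align_corners=True)  # (3, C, N, 1)
        feats = feats.squeeze(-1).sum(dim=0).t()                      # (N, C)
        out = self.mlp(feats)
        rgb = torch.sigmoid(out[:, :3])
        sigma = F.relu(out[:, 3])
        return rgb, sigma

# Example: query 1024 random points in the normalized volume.
model = TriplaneNeRFDecoder()
pts = torch.rand(1024, 3) * 2 - 1
rgb, sigma = model(pts)
```

In the full model, separate decoders of this form would handle foreground and background; here a single decoder is shown for brevity.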


Video


Qualitative Novel-View Synthesis Results

Our method excels in novel-view synthesis from 3 source views, outperforming strong generalizable NeRF baselines. Vanilla NeRF struggles due to overfitting on these 3 views. MVSNeRF, although generalizable, is limited to nearby views as stated in the original paper, and thus struggles with distant views in this more challenging task, while PixelNeRF's renderings produce artifacts for far backgrounds.


Qualitative 3-view view synthesis results: Comparisons with baselines.


Qualitative 3-view view synthesis results: Close-up comparison with PixelNeRF.

Qualitative 360° comparison with PixelNeRF showing 5-view novel view synthesis (left) and 3-view view synthesis (right). Our method is able to render sharper scenes with more details and fewer artifacts.

Qualitative Scene Decomposition

NeO 360 scene decomposition qualitative results: showing individual objects decomposed from 3 source views, along with novel views, on the NeRDS 360 dataset. Our approach performs accurate decomposition by sampling inside the objects' 3D bounding boxes, giving full control over object editability from very few input views.
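The box-restricted sampling can be sketched as follows. This is a minimal sketch of one reasonable way to realize it, not the released code: it uses a standard slab-method ray/AABB intersection to bound each ray, then draws samples only inside that interval (an oriented box would first require transforming rays into the box frame).

```python
# Minimal sketch of restricting ray samples to an object's 3D bounding box.
import torch

def ray_aabb_intersect(origins, dirs, box_min, box_max, eps=1e-8):
    """origins, dirs: (N, 3); box_min, box_max: (3,). Returns (t_near, t_far, hit)."""
    inv_d = 1.0 / (dirs + eps)                     # eps avoids division by zero
    t0 = (box_min - origins) * inv_d
    t1 = (box_max - origins) * inv_d
    t_near = torch.minimum(t0, t1).amax(dim=-1)    # entry distance per ray
    t_far = torch.maximum(t0, t1).amin(dim=-1)     # exit distance per ray
    hit = t_far > torch.clamp(t_near, min=0.0)
    return t_near.clamp(min=0.0), t_far, hit

def sample_in_box(origins, dirs, box_min, box_max, n_samples=64):
    """Uniformly sample points along each ray inside the box interval."""
    t_near, t_far, hit = ray_aabb_intersect(origins, dirs, box_min, box_max)
    steps = torch.linspace(0.0, 1.0, n_samples)
    t = t_near[:, None] + (t_far - t_near)[:, None] * steps[None, :]   # (N, S)
    pts = origins[:, None, :] + t[..., None] * dirs[:, None, :]        # (N, S, 3)
    return pts, t, hit

# Example: 512 rays from a single origin against a unit box around the origin.
origins = torch.zeros(512, 3) - 2.0
dirs = torch.nn.functional.normalize(torch.rand(512, 3) - 0.5, dim=-1)
pts, t, hit = sample_in_box(origins, dirs,
                            torch.tensor([-1., -1., -1.]),
                            torch.tensor([1., 1., 1.]))
```

Querying the object NeRF only at these in-box points (and masking rays where `hit` is False) is what enables rendering and editing each object independently of the background.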


Overfitting Experiments

While our proposed technique is a generalizable method that works in a few-shot setting, for ease of reproducibility and to push the state of the art on single-scene novel-view synthesis of unbounded scenes, we provide scripts to overfit to single scenes given many images. For more details, please see our released PyTorch code.


Gaussian Splatting: overfitting experiments on a single NeRDS 360 scene using Nerfstudio.


Vanilla NeRF: overfitting experiments on a single NeRDS 360 scene.

BibTeX

If you find our paper, NeRDS 360 dataset, or PyTorch code repository useful, please consider citing:

@inproceedings{irshad2023neo360,
  title={NeO 360: Neural Fields for Sparse View Synthesis of Outdoor Scenes},
  author={Muhammad Zubair Irshad and Sergey Zakharov and Katherine Liu and Vitor Guizilini and Thomas Kollar and Adrien Gaidon and Zsolt Kira and Rares Ambrus},
  booktitle={International Conference on Computer Vision (ICCV)},
  year={2023},
  url={https://arxiv.org/abs/2308.12967},
}