PoseProbe

Generic Objects as Pose Probes for Few-Shot View Synthesis

Zhirui Gao Renjiao Yi Chenyang Zhu Ke Zhuang Wei Chen Kai Xu*
National University of Defense Technology

PoseProbe enables realistic view synthesis from few input images without any pose prior.

Abstract

Radiance fields, including NeRFs and 3D Gaussians, demonstrate great potential for high-fidelity rendering and scene reconstruction, but they require a substantial number of posed images as input. COLMAP is frequently employed to estimate poses during preprocessing, but it needs a large number of feature matches to operate effectively and struggles with scenes that have sparse features, large baselines between images, or few input images. We aim to tackle few-view NeRF reconstruction using only 3 to 6 unposed scene images. Traditional methods often rely on calibration boards, but these rarely appear in everyday images. We propose the novel idea of utilizing everyday objects, commonly found in both images and real life, as "pose probes". The probe object is automatically segmented by SAM, and its shape is initialized from a cube. We apply a dual-branch volume rendering optimization (object NeRF and scene NeRF) to constrain the pose optimization and jointly refine the geometry. Specifically, object poses of two views are first estimated by PnP matching against an SDF representation, which serves as the initial poses. PnP matching, which requires only a few features, is well suited to feature-sparse scenes. Additional views are then incrementally incorporated to refine the poses from preceding views. In experiments, PoseProbe achieves state-of-the-art performance in both pose estimation and novel view synthesis across multiple datasets. We demonstrate its effectiveness particularly in few-view and large-baseline scenes where COLMAP struggles. In ablations, using different objects in a scene yields comparable performance.
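For intuition, the snippet below is a minimal sketch of the kind of PnP initialization the pipeline relies on, using OpenCV's RANSAC PnP solver on a handful of 2D-3D matches between image keypoints and points on the probe surface. The correspondence arrays and intrinsics are placeholders, and this illustrates the general technique rather than the exact implementation used in the paper.

# Minimal sketch of a PnP-based initial pose step, assuming OpenCV and a
# hypothetical set of 2D-3D correspondences between image keypoints and
# points on the probe object's surface. Illustration only, not the
# paper's implementation.
import cv2
import numpy as np

def estimate_initial_pose(pts_3d, pts_2d, K):
    """Estimate a camera pose from a few 2D-3D matches via PnP + RANSAC.

    pts_3d: (N, 3) array of points on the probe object surface.
    pts_2d: (N, 2) array of their projections in the image.
    K:      (3, 3) camera intrinsics matrix.
    """
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts_3d.astype(np.float32),
        pts_2d.astype(np.float32),
        K.astype(np.float32),
        distCoeffs=None,
        flags=cv2.SOLVEPNP_EPNP,  # EPnP works with as few as 4 points
    )
    if not ok:
        raise RuntimeError("PnP failed: too few or degenerate matches")
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 rotation matrix
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, tvec.ravel()
    return T  # world-to-camera transform, used as an initial pose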

Method Overview


We leverage generic objects in the few-view input images as pose probes. The pose probe is automatically segmented by SAM with prompts, and its shape is initialized as a cube. The method introduces no extra capture burden, yet effectively facilitates pose estimation in feature-sparse scenes.
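As a rough illustration, probe segmentation with the public Segment Anything API might look like the sketch below. The checkpoint path and the single point prompt are placeholders, since the exact prompting details are an assumption here.

# A minimal sketch of prompt-based probe segmentation with Segment Anything
# (https://github.com/facebookresearch/segment-anything). The checkpoint
# path and the point prompt are placeholders; treat this as an assumed
# workflow, not the paper's exact setup.
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")  # placeholder path
predictor = SamPredictor(sam)

def segment_probe(image, click_xy):
    """Return a binary mask of the probe object from one foreground click."""
    predictor.set_image(image)  # image: HxWx3 uint8 RGB array
    masks, scores, _ = predictor.predict(
        point_coords=np.array([click_xy]),  # one point prompt (x, y)
        point_labels=np.array([1]),         # 1 = foreground
        multimask_output=True,
    )
    return masks[np.argmax(scores)]  # keep the highest-scoring mask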

Results

View Synthesis on Our Synthetic Dataset from 3 Input Views

This dataset is generated using BlenderProc with wide-baseline views. Only 3 input views are available, without any camera pose prior. Our method renders clearer details with fewer artifacts than other pose-free baselines! Note that the camera poses derived via PnP in our method serve as the initial poses for all NeRF baselines, for a fair comparison.

View Synthesis on DTU from 3 Input Views with Noisy Camera Poses

DTU consists of complex object-centric scenes, with wide-baseline views spanning a half-hemisphere. Only 3 input views are available, without any camera pose prior. As before, all baselines suffer from blurriness and inaccurate scene geometry, while our approach produces much higher-quality novel-view renderings thanks to the pose-probe constraint. Note that the camera poses derived via PnP in our method serve as the initial poses for all NeRF baselines, for a fair comparison.

Citation

If you want to cite our work, please use:

@article{gao2024generic,
  title={Generic Objects as Pose Probes for Few-Shot View Synthesis},
  author={Gao, Zhirui and Yi, Renjiao and Zhu, Chenyang and Zhuang, Ke and Chen, Wei and Xu, Kai},
  journal={arXiv preprint arXiv:2408.16690},
  year={2024}
}