Gen3R: 3D Scene Generation Meets Feed-Forward Reconstruction
Abstract
Gen3R combines foundational reconstruction models with video diffusion models to generate 3D scenes with RGB videos and geometric information through aligned latents.
We present Gen3R, a method that bridges the strong priors of foundational reconstruction models and video diffusion models for scene-level 3D generation. We repurpose the VGGT reconstruction model to produce geometric latents by training an adapter on its tokens, which are regularized to align with the appearance latents of pre-trained video diffusion models. By jointly generating these disentangled yet aligned latents, Gen3R produces both RGB videos and corresponding 3D geometry, including camera poses, depth maps, and global point clouds. Experiments demonstrate that our approach achieves state-of-the-art results in single- and multi-image conditioned 3D scene generation. Additionally, our method can enhance the robustness of reconstruction by leveraging generative priors, demonstrating the mutual benefit of tightly coupling reconstruction and generative models.
Community
We Introduce Gen3R โ create multi-quantity geometry with RGB from images.
๐ท Photorealistic Video
๐ Accurate 3D Scene Geometry
Arxiv: https://arxiv.org/abs/2601.04090
Project page: https://xdimlab.github.io/Gen3R/
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- GeoVideo: Introducing Geometric Regularization into Video Generation Model (2025)
- Splatent: Splatting Diffusion Latents for Novel View Synthesis (2025)
- WorldReel: 4D Video Generation with Consistent Geometry and Motion Modeling (2025)
- Blur2Sharp: Human Novel Pose and View Synthesis with Generative Prior Refinement (2025)
- GeoWorld: Unlocking the Potential of Geometry Models to Facilitate High-Fidelity 3D Scene Generation (2025)
- CloseUpShot: Close-up Novel View Synthesis from Sparse-views via Point-conditioned Diffusion Model. (2025)
- LumiTex: Towards High-Fidelity PBR Texture Generation with Illumination Context (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot
recommend
Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper
Collections including this paper 0
No Collection including this paper