Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding Paper • 2512.17532 • Published 8 days ago • 62
Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance Paper • 2512.08765 • Published 18 days ago • 125
Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance Paper • 2512.08765 • Published 18 days ago • 125
Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance Paper • 2512.08765 • Published 18 days ago • 125 • 5
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations Paper • 2510.23607 • Published Oct 27 • 177
AdaViewPlanner: Adapting Video Diffusion Models for Viewpoint Planning in 4D Scenes Paper • 2510.10670 • Published Oct 12 • 18
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs Paper • 2510.11696 • Published Oct 13 • 176
SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer Paper • 2509.24695 • Published Sep 29 • 44
Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search Paper • 2509.07969 • Published Sep 9 • 58
TTS-VAR: A Test-Time Scaling Framework for Visual Auto-Regressive Generation Paper • 2507.18537 • Published Jul 24 • 17
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning Paper • 2507.13348 • Published Jul 17 • 77
DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation Paper • 2307.01831 • Published Jul 4, 2023 • 8
Mask-Attention-Free Transformer for 3D Instance Segmentation Paper • 2309.01692 • Published Sep 4, 2023 • 1
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models Paper • 2403.18814 • Published Mar 27, 2024 • 47