Next-Embedding Prediction Makes Strong Vision Learners Paper • 2512.16922 • Published 2 days ago • 54
N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models Paper • 2512.16561 • Published 2 days ago • 17
DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models Paper • 2512.15713 • Published 3 days ago • 13
Fast and Accurate Causal Parallel Decoding using Jacobi Forcing Paper • 2512.14681 • Published 4 days ago • 39
Vector Prism: Animating Vector Graphics by Stratifying Semantic Structure Paper • 2512.14336 • Published 4 days ago • 27
WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling Paper • 2512.14614 • Published 4 days ago • 60
RoboTracer: Mastering Spatial Trace with Reasoning in Vision-Language Models for Robotics Paper • 2512.13660 • Published 5 days ago • 36
Toward Ambulatory Vision: Learning Visually-Grounded Active View Selection Paper • 2512.13250 • Published 5 days ago • 9
EgoX: Egocentric Video Generation from a Single Exocentric Video Paper • 2512.08269 • Published 12 days ago • 100
X-Humanoid: Robotize Human Videos to Generate Humanoid Videos at Scale Paper • 2512.04537 • Published 17 days ago • 6
Evaluating Gemini Robotics Policies in a Veo World Simulator Paper • 2512.10675 • Published 9 days ago • 15
google/siglip-so400m-patch14-384 Zero-Shot Image Classification • 0.9B • Updated Sep 26, 2024 • 6.48M • 632