Next-Embedding Prediction Makes Strong Vision Learners Paper • 2512.16922 • Published 2 days ago • 52
FrameDiffuser: G-Buffer-Conditioned Diffusion for Neural Forward Frame Rendering Paper • 2512.16670 • Published 2 days ago • 3
VenusBench-GD: A Comprehensive Multi-Platform GUI Benchmark for Diverse Grounding Tasks Paper • 2512.16501 • Published 2 days ago • 8
Multimodal RewardBench 2: Evaluating Omni Reward Models for Interleaved Text and Image Paper • 2512.16899 • Published 2 days ago • 9
The World is Your Canvas: Painting Promptable Events with Reference Images, Trajectories, and Text Paper • 2512.16924 • Published 2 days ago • 19
LLaDA2.0: Scaling Up Diffusion Language Models to 100B Paper • 2512.15745 • Published 10 days ago • 55
The Open Evaluation Standard: Benchmarking NVIDIA Nemotron 3 Nano with NeMo Evaluator Article • 3 days ago • 30
SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning Paper • 2512.13874 • Published 5 days ago • 15
End-to-End Training for Autoregressive Video Diffusion via Self-Resampling Paper • 2512.15702 • Published 3 days ago • 11
Is Nano Banana Pro a Low-Level Vision All-Rounder? A Comprehensive Evaluation on 14 Tasks and 40 Datasets Paper • 2512.15110 • Published 4 days ago • 6
Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition Paper • 2512.15603 • Published 3 days ago • 41
Skyra: AI-Generated Video Detection via Grounded Artifact Reasoning Paper • 2512.15693 • Published 3 days ago • 16
MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives Paper • 2512.14699 • Published 4 days ago • 25
Sparse-LaViDa: Sparse Multimodal Discrete Diffusion Language Models Paper • 2512.14008 • Published 5 days ago • 8