Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training Paper • 2509.26625 • Published Sep 30, 2025 • 43
Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations Paper • 2506.18898 • Published Jun 23, 2025 • 33
Seedance 1.0: Exploring the Boundaries of Video Generation Models Paper • 2506.09113 • Published Jun 10, 2025 • 105
Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation Paper • 2506.09350 • Published Jun 11, 2025 • 48
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset Paper • 2505.09568 • Published May 14, 2025 • 98
DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion Paper • 2504.04010 • Published Apr 5, 2025 • 9
X-Dancer: Expressive Music to Human Dance Video Generation Paper • 2502.17414 • Published Feb 24, 2025 • 14
X-Dyna: Expressive Dynamic Human Image Animation Paper • 2501.10021 • Published Jan 17, 2025 • 14