Qwen/Qwen-Image-Bench
Image-Text-to-Text • 27B • Updated • 28.2k • 64
Native Active Perception as Reasoning for Omni-Modal Understanding
Unified Multimodal Autoregressive Modeling with Shared Context-Visual Tokenizer is Key to Unification