kak

Kaowai

AI & ML interests

None yet

Organizations

None yet

upvoted 3 papers 6 days ago

X-Stream: Exploring MLLMs as Multiplexers for Multi-Stream Understanding

Paper • 2606.02482 • Published 8 days ago • 35

Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs

Paper • 2605.30611 • Published 12 days ago • 192

Which Pretraining Paradigm Better Serves Spatial Intelligence? An Empirical Comparison of Vision-Language and Video Generation Models

Paper • 2605.28132 • Published 13 days ago • 25

upvoted a paper 17 days ago

SpaceDG: Benchmarking Spatial Intelligence under Visual Degradation

Paper • 2605.22536 • Published 19 days ago • 28

upvoted a paper 18 days ago

WorldKV: Efficient World Memory with World Retrieval and Compression

Paper • 2605.22718 • Published 19 days ago • 41

upvoted 5 papers 20 days ago

Stop When Reasoning Converges: Semantic-Preserving Early Exit for Reasoning Models

Paper • 2605.17672 • Published 23 days ago • 22

Model-Adaptive Tool Necessity Reveals the Knowing-Doing Gap in LLM Tool Use

Paper • 2605.14038 • Published 27 days ago • 15

Lance: Unified Multimodal Modeling by Multi-Task Synergy

Paper • 2605.18678 • Published 22 days ago • 78

KVPO: ODE-Native GRPO for Autoregressive Video Alignment via KV Semantic Exploration

Paper • 2605.14278 • Published 26 days ago • 37

LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation

Paper • 2605.18739 • Published 22 days ago • 112

upvoted 4 papers 25 days ago

Qwen-Image-VAE-2.0 Technical Report

Paper • 2605.13565 • Published 27 days ago • 60

Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling

Paper • 2605.13062 • Published 27 days ago • 33

MinT: Managed Infrastructure for Training and Serving Millions of LLMs

Paper • 2605.13779 • Published 27 days ago • 219

AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation

Paper • 2605.13724 • Published 27 days ago • 101

liked a model 2 months ago

robbyant/lingbot-world-fast

Image-to-Video • Updated Apr 2 • 7.61k • 16

upvoted 5 papers 3 months ago

Qianfan-OCR: A Unified End-to-End Model for Document Intelligence

Paper • 2603.13398 • Published Mar 11 • 155

Kinema4D: Kinematic 4D World Modeling for Spatiotemporal Embodied Simulation

Paper • 2603.16669 • Published Mar 17 • 70

Demystifing Video Reasoning

Paper • 2603.16870 • Published Mar 17 • 373

TRUST-SQL: Tool-Integrated Multi-Turn Reinforcement Learning for Text-to-SQL over Unknown Schemas

Paper • 2603.16448 • Published Mar 17 • 58

Deep Forcing: Training-Free Long Video Generation with Deep Sink and Participative Compression

Paper • 2512.05081 • Published Dec 4, 2025 • 33