Collections including paper arxiv:2604.10098
-
Visual Spatial Tuning
Paper • 2511.05491 • Published • 53
Adam's Law: Textual Frequency Law on Large Language Models
Paper • 2604.02176 • Published • 506
Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation
Paper • 2604.10098 • Published • 82
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
Paper • 2604.13016 • Published • 110
-
MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper • 2501.08313 • Published • 305
Lizard: An Efficient Linearization Framework for Large Language Models
Paper • 2507.09025 • Published • 19
On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective
Paper • 2507.23632 • Published • 6
Causal Attention with Lookahead Keys
Paper • 2509.07301 • Published • 21
-
LTX-2: Efficient Joint Audio-Visual Foundation Model
Paper • 2601.03233 • Published • 181
MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head
Paper • 2601.07832 • Published • 53
Motion Attribution for Video Generation
Paper • 2601.08828 • Published • 72
Post-LayerNorm Is Back: Stable, ExpressivE, and Deep
Paper • 2601.19895 • Published • 27
-
GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
Paper • 2503.14734 • Published • 8
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation
Paper • 2401.02117 • Published • 33
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
Paper • 2506.01844 • Published • 161
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding
Paper • 2506.16035 • Published • 89
-
Depth Anything V2
Paper • 2406.09414 • Published • 105
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
Paper • 2406.09415 • Published • 52
Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion
Paper • 2406.04338 • Published • 39
SAM 2: Segment Anything in Images and Videos
Paper • 2408.00714 • Published • 123