Hybrid Architectures for Language Models: Systematic Analysis and Design Insights Paper • 2510.04800 • Published Oct 6 • 36
Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data Paper • 2510.03264 • Published Sep 26 • 23