SPICE: Self-Play In Corpus Environments Improves Reasoning Paper • 2510.24684 • Published Oct 28, 2025 • 17
Jointly Reinforcing Diversity and Quality in Language Model Generations Paper • 2509.02534 • Published Sep 2, 2025 • 24
Adaptive Decoding via Latent Preference Optimization Paper • 2411.09661 • Published Nov 14, 2024 • 10