Article Ultra-Long Sequence Parallelism: Ulysses + Ring-Attention Technical Principles and Implementation Sep 16 • 17
👩‍💻 OlympicCoder Collection Reasoning datasets and models for competitive coding • 6 items • Updated 21 days ago • 20
Post 1992 Mistral's new SOTA coding models, Devstral 2, can now be run locally! (25GB RAM) 🐱 We fixed the chat template, so performance should be much better now! 24B: unsloth/Devstral-Small-2-24B-Instruct-2512-GGUF 123B: unsloth/Devstral-2-123B-Instruct-2512-GGUF 🧡 Step-by-step Guide: https://docs.unsloth.ai/models/devstral-2
T-pro 2.0: An Efficient Russian Hybrid-Reasoning Model and Playground Paper • 2512.10430 • Published 17 days ago • 112
Article Transformers v5: Simple model definitions powering the AI ecosystem +2 28 days ago • 256
Iterative Self-Training for Code Generation via Reinforced Re-Ranking Paper • 2504.09643 • Published Apr 13 • 34
Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders Paper • 2503.03601 • Published Mar 5 • 232