The list of hands-on notebooks (some beginner-friendly!) to get started with fine-tuning using TRL keeps growing!!
• SFT
• GRPO
• Tool calling & agents
• RL environments with OpenEnv
• LLMs and VLMs ✨
Many run on FREE Colab, making it super easy to get started fast! (Minimal GRPO sketch below.)
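For a taste of what the notebooks cover, here is a minimal GRPO sketch along the lines of the TRL quickstart. The small model, dataset, and toy reward are illustrative choices, not the exact contents of any one notebook:

```python
# Minimal GRPO fine-tuning sketch with TRL.
# The model, dataset, and reward here are illustrative placeholders.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# GRPO expects a dataset with a "prompt" column.
dataset = load_dataset("trl-lib/tldr", split="train")

# Toy reward: prefer completions close to 20 characters.
def reward_len(completions, **kwargs):
    return [-abs(20 - len(completion)) for completion in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",
    reward_funcs=reward_len,
    args=GRPOConfig(output_dir="Qwen2-0.5B-GRPO"),
    train_dataset=dataset,
)
trainer.train()
```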
The Christmas holidays are here! 🎄 Thinking about learning something new in AI?
@huggingface offers 12 FREE courses covering all the relevant topics, for every level of experience. A great challenge for the holidays (and worth saving for later 🙄)
Muon has gone from an experiment to a mainstream optimizer, but does it hold up for fine‑tuning? We ran head‑to‑head tests on Qwen3‑4B (10k+ high‑quality instruction rows) to find out.
Short story: Pure Muon converged fastest at the start, but its gradient‑norm spikes made training unstable. MuonClip (Kimi K2’s clipping variant) stabilizes long pretraining runs, yet in our small‑scale fine‑tune it underperformed: lower token accuracy and slower convergence. The winner was the hybrid: Muon for 2D layers + AdamW for 1D layers (sketched below). It delivered the best balance of stability and final performance, and it even beat vanilla AdamW.
Takeaway: for small-scale fine-tuning, hybrid = practical and reliable.
Next step: scale to larger models and datasets to see whether Muon’s spikes become catastrophic or clipping wins out.
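The hybrid split itself is simple to wire up. A minimal sketch, assuming a Muon implementation such as the one from the KellerJordan/Muon repo (the import path and constructor arguments vary by implementation, and the learning rates are illustrative):

```python
# Hybrid optimizer: Muon for 2D weight matrices, AdamW for 1D params.
import torch
from muon import Muon  # assumption: import depends on the Muon implementation you install

def build_hybrid_optimizers(model: torch.nn.Module):
    params = [p for p in model.parameters() if p.requires_grad]
    # 2D+ parameters (linear-layer weight matrices) go to Muon.
    muon_params = [p for p in params if p.ndim >= 2]
    # 1D parameters (biases, norm scales) go to AdamW.
    adamw_params = [p for p in params if p.ndim < 2]
    return [
        Muon(muon_params, lr=2e-2, momentum=0.95),  # illustrative hyperparameters
        torch.optim.AdamW(adamw_params, lr=2e-5, weight_decay=0.01),
    ]

# In the training loop, both optimizers step on every update:
#   for opt in opts: opt.zero_grad()
#   loss.backward()
#   for opt in opts: opt.step()
```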
ICYMI, you can fine-tune open LLMs using Claude Code
just tell it: “Fine-tune Qwen3-0.6B on open-r1/codeforces-cots”
and Claude submits a real training job on HF GPUs using TRL.
it handles everything:
> dataset validation
> GPU selection
> training + Trackio monitoring
> job submission + cost estimation
when it’s done, your model is on the Hub, ready to use (script sketch below)
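The generated script isn't shown in the post, but the job it submits boils down to a short TRL run. A sketch under that assumption (hyperparameters are illustrative, not what Claude Code actually produces, and the dataset subset/split may need adjusting):

```python
# Roughly the kind of TRL SFT script such a job runs
# (a sketch, not the exact script Claude Code generates).
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Subset/split names may need adjusting for this dataset.
dataset = load_dataset("open-r1/codeforces-cots", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen3-0.6B",
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="qwen3-0.6b-codeforces",
        push_to_hub=True,               # the finished model lands on the Hub
        num_train_epochs=1,             # illustrative hyperparameters
        per_device_train_batch_size=4,
    ),
)
trainer.train()
```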
The latest TRL release comes packed with updates:
> Agent training with tools in GRPO
> New CISPO & SAPO losses + reasoning rewards
> vLLM quantization in colocate mode
> Dataset shuffling in SFT
> Lots of NEW examples
> Tons of fixes and documentation improvements
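Some of those updates surface directly in GRPOConfig. A minimal sketch: `use_vllm` and `vllm_mode` are existing GRPOConfig fields, while the `"cispo"` loss_type value is an assumption based on the release notes:

```python
# Opting into new GRPO features via config (sketch).
from trl import GRPOConfig

args = GRPOConfig(
    output_dir="grpo-new-features",
    loss_type="cispo",      # new CISPO loss (value assumed from the release notes)
    use_vllm=True,          # generate completions with vLLM...
    vllm_mode="colocate",   # ...on the same GPUs as training
)
```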
The LLM by @karpathy is officially in the library, and we wrote a blog covering how we ported the model, how it differs from the original, and how to run or train it.
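Assuming "the library" here is transformers, running the ported model looks like any other causal LM. The checkpoint id below is a placeholder; grab the real one from the blog post:

```python
# Running the ported model with transformers (sketch).
from transformers import pipeline

# "your-org/ported-model" is a hypothetical id; see the blog for the real one.
generator = pipeline("text-generation", model="your-org/ported-model")
print(generator("Why is the sky blue?", max_new_tokens=50)[0]["generated_text"])
```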