m2mKD: Module-to-Module Knowledge Distillation for Modular Transformers Paper • 2402.16918 • Published Feb 26, 2024
MuPT: A Generative Symbolic Music Pretrained Transformer Paper • 2404.06393 • Published Apr 9, 2024
A Closer Look into Mixture-of-Experts in Large Language Models Paper • 2406.18219 • Published Jun 26, 2024