[to-read]
A Survey of Small Language Models (arXiv:2410.20011)
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters (arXiv:2410.23168)
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective (arXiv:2410.23743)
GPT or BERT: why not both? (arXiv:2410.24159)
Physics in Next-token Prediction (arXiv:2411.00660)
PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance (arXiv:2411.02327)
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models (arXiv:2411.04905)
Hymba: A Hybrid-head Architecture for Small Language Models (arXiv:2411.13676)
arXiv:2410.21276
Transformers without Normalization (arXiv:2503.10622)