Hugging Face
🔄
In a Training Loop
102764.1
TFLOPS
Lewis Tunstall
PRO
lewtun
1,450 followers · 137 following
https://lewtun.github.io/blog/
_lewtun
lewtun
AI & ML interests
LLMs, LLMs, LLMs
Recent Activity
published a bucket about 24 hours ago: lewtun/trl-internal-testing
updated a bucket 1 day ago: lewtun/trl-internal-testing
updated a Space 1 day ago: lewtun/sft-static-9f352b
lewtun's activity
New activity in attention-wiki/knowledge-base, 4 days ago
Process arXiv:2310.01889 - Ring Attention
4
#19 opened 5 days ago by lewtun
Process arXiv:2309.17453 - StreamingLLM
2
#10 opened 5 days ago by lewtun
Add source: Retrieval Head Mechanistically Explains Long-Context Factuality (arxiv:2404.15574)
1
#31 opened 5 days ago by lvwerra
Add source: H2O — Heavy-Hitter KV-cache eviction (arxiv:2306.14048)
2
#29 opened 5 days ago by lvwerra
Add source: NoPE — positional encoding & length generalization (arxiv:2305.19466)
2
#33 opened 5 days ago by lvwerra
Add sources: the 'attention as explanation' debate — Jain&Wallace + Wiegreffe&Pinter
2
#32 opened 5 days ago by lvwerra
Add source: In-context Learning and Induction Heads (arxiv:2209.11895)
2
#30 opened 5 days ago by lvwerra
Add sources: T5, DeBERTa, TUPE — relative & disentangled positional encoding
2
#26 opened 5 days ago by lvwerra
Add source: GQA — Grouped-Query Attention (arxiv:2305.13245)
3
#21 opened 5 days ago by lvwerra
Add source: Shaw et al. — Self-Attention with Relative Position Representations
2
#20 opened 5 days ago by lvwerra
New activity in attention-wiki/knowledge-base, 5 days ago
Process arXiv:2309.06180 - PagedAttention
2
#9 opened 5 days ago by lewtun
Process arXiv:2309.00071 - YaRN
2
#8 opened 5 days ago by lewtun
Process arXiv:2307.03172 - Lost in the Middle
2
#7 opened 5 days ago by lewtun
Process arXiv:2306.15595 - Position Interpolation
2
#6 opened 5 days ago by lewtun
Process arXiv:2108.12409 - ALiBi
2
#5 opened 5 days ago by lewtun
Process arXiv:1911.02150 - Multi-query attention
2
#4 opened 5 days ago by lewtun
Process arXiv:1901.02860 - Transformer-XL
2
#3 opened 5 days ago by lewtun
Process arXiv:2104.09864 - RoFormer/RoPE
3
#2 opened 5 days ago by lewtun
New activity in smolagents/ml-intern, 17 days ago
Billing Issue: Pro Membership not recognized for HF Jobs
2
#34 opened about 2 months ago by rajvivan
ml-intern-models page is 404
1
#47 opened 21 days ago by rogermt