🏗️ Building on HF
8.4 TFLOPS
Mr Munk
GODELEV
34
2
28
14 followers · 21 following
AI & ML interests
High schooler by day, LLM builder by night. Driven by a deep love for both Physics and AI. Currently spending my runtime building on Hugging Face, experimenting with transformer architectures, and training custom LLMs.
Recent Activity
reacted to Banaxi-Tech's post with 👀 about 17 hours ago
Today we are releasing BananaMind-KV1-8M-2Bit-Experimental, a model trained to be KV-cache-aware: it stores its generation KV cache in 2-bit precision instead of the usual 16-bit precision. Result: a 5.33x smaller KV cache than FP16, with a mean KLD of 0.0916 against a 16-bit KV cache reference on WikiText-2.

Model: https://huggingface.co/BananaMind/BananaMind-KV1-8M-2Bit-Experimental

The important part: this is not just post-training KV cache quantization. Instead, we take the BitNet approach: KV1 is trained with a 2-bit-aware K/V path. Rather than training a normal model and quantizing the cache afterwards, the model learns during training to operate under the low-bit KV constraint, closer in spirit to the BitNet idea of training for the low-bit regime.

During generation, each K/V vector is quantized into 4 affine levels and packed into uint8 tensors, with four 2-bit values stored per byte.

WikiText-2 eval vs 16-bit KV cache reference:
Mean KLD: 0.0916 nats/token
Mean KLD: 0.1322 bits/token
Average KV cache shrink vs FP16: 5.33x
Evaluated positions: 372,675

If this gets used in models like Qwen or Gemma, it may become possible to run 128K or even 256K context on a normal machine.

Try it here: https://huggingface.co/BananaMind/BananaMind-KV1-8M-2Bit-Experimental
Code: https://github.com/Banaxi-Tech/kv1
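The packing step described in the post (4 affine levels per vector, four 2-bit codes per byte) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the KV1 implementation: the function names, the per-vector min/max choice of scale and offset, and the low-bits-first packing order are all assumptions, and the real code operates on uint8 tensors rather than Python lists.

```python
def quantize_2bit(vec):
    """Map floats to codes 0..3 using a per-vector affine grid (assumed min/max)."""
    lo, hi = min(vec), max(vec)
    # Three steps separate the four levels; guard against a constant vector.
    scale = (hi - lo) / 3 if hi > lo else 1.0
    codes = [min(3, max(0, round((x - lo) / scale))) for x in vec]
    return codes, scale, lo


def pack_2bit(codes):
    """Pack 2-bit codes into bytes, four per byte, lowest bits first (assumed order)."""
    padded = codes + [0] * (-len(codes) % 4)  # pad to a multiple of four
    out = bytearray()
    for i in range(0, len(padded), 4):
        a, b, c, d = padded[i:i + 4]
        out.append(a | (b << 2) | (c << 4) | (d << 6))
    return bytes(out)


def unpack_2bit(packed, n):
    """Recover the first n 2-bit codes from the packed bytes."""
    codes = []
    for byte in packed:
        for shift in (0, 2, 4, 6):
            codes.append((byte >> shift) & 0b11)
    return codes[:n]


def dequantize_2bit(codes, scale, zero):
    """Map codes back to floats on the affine grid zero + code * scale."""
    return [zero + c * scale for c in codes]
```

Two arithmetic notes on the reported numbers: 0.0916 nats divided by ln 2 gives about 0.1322 bits, which matches the two KLD figures. Going from 16 bits to 2 bits would ideally shrink the cache 8x; the reported 5.33x corresponds to roughly 3 effective bits per value, which would be consistent with storing per-vector metadata such as a scale and offset, although the post does not say how that metadata is stored.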
reacted to Quazim0t0's post with 👀 about 20 hours ago
Created a research language model whose channel-mixing block is not an MLP. It is a differentiable Neighbour-Sensing fungal-colony-growth model: each token is expanded into a colony of hyphal tips that grow in a bounded latent region, sense a shared density field, and steer their own growth. The "MLP" is replaced by a few differentiable steps of colony growth, read back out into the hidden state.
https://huggingface.co/Quazim0t0/Mycel-LM-79M

Also the original SpikeWhale project, the one that sparked all the other SpikeWhale-related projects. Every spiking primitive here is hand-written in plain PyTorch: the leaky integrate-and-fire (LIF) neuron dynamics, the fast-sigmoid surrogate gradient, and the backprop-through-time training loop. No snntorch, no spikingjelly, no norse, no bindsnet: the network is a genuine from-scratch SNN.
https://huggingface.co/Quazim0t0/SpikeWhale-SNN-216M
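For readers unfamiliar with the two spiking primitives named in the post, here is a minimal sketch of a single LIF neuron and a fast-sigmoid surrogate gradient. It is not the SpikeWhale code: it uses plain Python rather than PyTorch to stay self-contained, it omits the backprop-through-time loop, and the function names, the decay `beta`, the `threshold`, the `slope`, and the reset-by-subtraction rule are illustrative assumptions.

```python
def lif_run(currents, beta=0.9, threshold=1.0):
    """Simulate one leaky integrate-and-fire neuron over a list of input currents.

    Returns the spike train and the membrane potential before each reset.
    """
    v = 0.0
    spikes, potentials = [], []
    for current in currents:
        v = beta * v + current            # leak, then integrate the input
        spike = 1 if v >= threshold else 0
        potentials.append(v)
        spikes.append(spike)
        v -= spike * threshold            # soft reset (assumed): subtract threshold
    return spikes, potentials


def fast_sigmoid_grad(v, threshold=1.0, slope=25.0):
    """Surrogate derivative of the spike with respect to the membrane potential.

    The forward pass uses a hard step, whose true derivative is zero almost
    everywhere; training substitutes this smooth bump, which peaks at the
    threshold and decays as 1 / (1 + slope * |v - threshold|)^2.
    """
    return 1.0 / (1.0 + slope * abs(v - threshold)) ** 2
```

In a full training loop, the surrogate value would replace the step function's derivative at every time step while gradients are propagated backwards through the recurrence on `v`.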
liked a Space 2 days ago
StentorLabs/SLM_Arena
Organizations
None yet
buckets 1
GODELEV/PretrainingDatasets • 0 Bytes
models 12
GODELEV/Archaea-74M-V1.1 • Text Generation • 74M • Updated 12 days ago • 30 • 2
GODELEV/Exp-1 • 9.85M • Updated 15 days ago • 532 • 1
GODELEV/TOK-16K • Updated 19 days ago
GODELEV/TOK-8K • Updated 21 days ago
GODELEV/TOK-32K • Updated 23 days ago
GODELEV/Ant-10M • Text Generation • 9.9M • Updated 24 days ago • 141 • 4
GODELEV/Archaea-74M • Text Generation • 74M • Updated 27 days ago • 307 • 4
GODELEV/TOK-4K • Updated 28 days ago
GODELEV/Ant-5M • Text Generation • 4.71M • Updated 29 days ago • 682 • 2
GODELEV/Test-1-4000 • Text Generation • 0.2B • Updated May 9 • 101
datasets 20
GODELEV/Arithmetic-XL • Viewer • Updated 5 days ago • 24M • 71 • 1
GODELEV/Arithmetic-Large • Preview • Updated 8 days ago • 44
GODELEV/D1-8 • Viewer • Updated 20 days ago • 9.37M • 142
GODELEV/D1-8-Lite • Viewer • Updated 21 days ago • 6.3M • 90
GODELEV/D1-32 • Viewer • Updated 22 days ago • 11.1M • 181
GODELEV/D1-32-Lite • Viewer • Updated 22 days ago • 5.66M • 74
GODELEV/Ant-5M-V2-TE • Viewer • Updated 26 days ago • 2.5M • 68
GODELEV/Ant-5M-V2-T • Viewer • Updated 27 days ago • 3.01M • 103
GODELEV/12345 • Viewer • Updated 28 days ago • 777k • 31
GODELEV/Arithmetic-1.5M • Viewer • Updated 29 days ago • 1.5M • 80