🏗️ Building on HF

Mr Munk

GODELEV

AI & ML interests

High schooler by day, LLM builder by night. Driven by a deep love for both Physics and AI. Currently spending my runtime building on Hugging Face, experimenting with transformer architectures, and training custom LLMs.

Recent Activity

reacted to Banaxi-Tech's post with 👀 about 17 hours ago

Today we are releasing BananaMind-KV1-8M-2Bit-Experimental, a model trained to be KV-cache-aware: it stores its generation KV cache in 2-bit precision instead of the usual 16-bit precision. Result: a 5.33x smaller KV cache than FP16, with 0.0916 mean KLD against a 16-bit KV cache reference on WikiText-2.

Model: https://huggingface.co/BananaMind/BananaMind-KV1-8M-2Bit-Experimental

The important part: this is not just post-training KV cache quantization. KV1 is trained with a 2-bit-aware K/V path. Instead of training a normal model and quantizing the cache afterwards, the model learns during training to operate under the low-bit KV constraint, closer in spirit to the BitNet idea of training for the low-bit regime. During generation, each K/V vector is quantized into 4 affine levels and packed into uint8 tensors, with four 2-bit values stored per byte.

WikiText-2 eval vs 16-bit KV cache reference:
Mean KLD: 0.0916 nats/token (0.1322 bits/token)
Average KV cache shrink vs FP16: 5.33x
Evaluated positions: 372,675

If this approach carries over to models like Qwen or Gemma, it may become possible to run 128K or even 256K context on an ordinary machine.

Code: https://github.com/Banaxi-Tech/kv1
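The packing step described in the post (4 affine levels per K/V vector, four 2-bit codes per byte) can be sketched roughly as follows. This is a minimal pure-Python illustration, not the KV1 code: the function names, the min/max choice of affine range, and the bit order inside each byte are all assumptions. The `compression_ratio` helper is likewise hypothetical; the post does not state the head dimension or how the per-vector scale and offset are stored.

```python
def quantize_2bit(vec):
    """Quantize one vector to 4 affine levels and pack four 2-bit codes per byte.

    Assumption: the affine range is the vector's own min/max.
    """
    if not vec or len(vec) % 4 != 0:
        raise ValueError("vector length must be a non-zero multiple of 4")
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 3 or 1.0  # 3 steps between 4 levels; 1.0 for constant vectors
    codes = [min(3, max(0, round((v - lo) / scale))) for v in vec]
    packed = bytearray()
    for i in range(0, len(codes), 4):
        a, b, c, d = codes[i:i + 4]
        # Assumed bit order: first code in the lowest two bits.
        packed.append(a | (b << 2) | (c << 4) | (d << 6))
    return bytes(packed), scale, lo


def dequantize_2bit(packed, scale, lo):
    """Unpack four 2-bit codes per byte and map them back to floats."""
    out = []
    for byte in packed:
        for shift in (0, 2, 4, 6):
            out.append(((byte >> shift) & 3) * scale + lo)
    return out


def compression_ratio(dim, meta_bits, ref_bits=16, code_bits=2):
    """Size of a ref_bits-per-value vector divided by the packed size,
    where meta_bits is the per-vector overhead for scale and offset."""
    return (ref_bits * dim) / (code_bits * dim + meta_bits)
```

With no metadata the ratio would be 8x. As one purely illustrative configuration, `compression_ratio(32, 32)` (32 values per vector plus 32 bits of scale and offset) gives 16/3, about 5.33x, which matches the reported figure, though the post does not confirm that this is where the number comes from.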

reacted to Quazim0t0's post with 👀 about 20 hours ago

Created a research language model whose channel-mixing block is not an MLP. It is a differentiable Neighbour-Sensing fungal-colony-growth model: each token is expanded into a colony of hyphal tips that grow in a bounded latent region, sense a shared density field, and steer their own growth. The "MLP" is replaced by a few differentiable steps of colony growth, read back out into the hidden state. https://huggingface.co/Quazim0t0/Mycel-LM-79M

Also the original SpikeWhale project, the one that sparked all the other SpikeWhale-related projects. Every spiking primitive here is hand-written in plain PyTorch: the leaky integrate-and-fire (LIF) neuron dynamics, the fast-sigmoid surrogate gradient, and the backprop-through-time training loop. No snntorch, no spikingjelly, no norse, no bindsnet: the network is a genuine from-scratch SNN. https://huggingface.co/Quazim0t0/SpikeWhale-SNN-216M
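For readers unfamiliar with the spiking primitives named in the post, here is a minimal sketch of a single LIF neuron with a fast-sigmoid surrogate derivative. It is written in plain Python rather than PyTorch and is not the SpikeWhale code: the decay `beta`, the `threshold`, the `slope` value, and the subtractive reset are all assumed for illustration.

```python
def fast_sigmoid_grad(u, slope=25.0):
    """Surrogate derivative of the spike with respect to u = v - threshold.

    The forward pass uses a hard step, whose true derivative is zero almost
    everywhere; the derivative of the fast sigmoid u / (1 + slope*|u|),
    which is 1 / (1 + slope*|u|)**2, is used in its place.
    """
    return 1.0 / (1.0 + slope * abs(u)) ** 2


def lif_forward(inputs, beta=0.9, threshold=1.0):
    """Run one leaky integrate-and-fire neuron over a sequence of input currents.

    Returns the spike train and the surrogate derivative at each step.
    """
    v = 0.0
    spikes, surrogate = [], []
    for x in inputs:
        v = beta * v + x                  # leaky integration of the input
        u = v - threshold
        s = 1.0 if u > 0 else 0.0         # hard threshold in the forward pass
        spikes.append(s)
        surrogate.append(fast_sigmoid_grad(u))
        v -= s * threshold                # subtractive reset after a spike (assumed)
    return spikes, surrogate
```

In an autograd framework, the surrogate derivative would typically be supplied as the custom backward of the spike function, and backprop-through-time would chain these per-step derivatives across the sequence.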

liked a Space 2 days ago

StentorLabs/SLM_Arena


Organizations

None yet

buckets 1

GODELEV/PretrainingDatasets

0 Bytes

models 12

datasets 20

GODELEV/Arithmetic-XL

Viewer • Updated 5 days ago • 24M • 71 • 1

GODELEV/Arithmetic-Large

Preview • Updated 8 days ago • 44

GODELEV/D1-8

Viewer • Updated 20 days ago • 9.37M • 142

GODELEV/D1-8-Lite

Viewer • Updated 21 days ago • 6.3M • 90

GODELEV/D1-32

Viewer • Updated 22 days ago • 11.1M • 181

GODELEV/D1-32-Lite

Viewer • Updated 22 days ago • 5.66M • 74

GODELEV/Ant-5M-V2-TE

Viewer • Updated 26 days ago • 2.5M • 68

GODELEV/Ant-5M-V2-T

Viewer • Updated 27 days ago • 3.01M • 103

GODELEV/12345

Viewer • Updated 28 days ago • 777k • 31

GODELEV/Arithmetic-1.5M

Viewer • Updated 29 days ago • 1.5M • 80
