Kokoro-82M MLX 4-bit

MLX 4-bit quantization of hexgrad/Kokoro-82M, produced with mlx-audio on Apple Silicon.

Provenance

Converted from mlx-community/Kokoro-82M-bf16, which in turn is a safetensors export of the original kokoro-v1_0.pth. The original weights remain bundled alongside the quantized model.safetensors so callers can fall back to bf16 where audio quality demands it.

Quantization notes

Kokoro is a StyleTTS2-derived architecture with many small LSTMs, iSTFTNet convolution blocks, and normalization layers that are not eligible for MLX quantization at default thresholds. The "bits per weight" reported after conversion was ~27.6, far above the nominal 4: most parameters remained bf16. Only a subset of the large linear projections was actually quantized to 4 bits.
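
The eligibility rule described above can be illustrated with a small, dependency-free sketch. It assumes a simplified predicate: only linear and embedding layers whose last weight dimension is divisible by the quantization group size (64 here) are quantized, and everything else stays bf16. The layer names and shapes are hypothetical, chosen for illustration rather than read from the Kokoro checkpoint, and the actual mlx-audio conversion logic may apply different rules.

```python
from dataclasses import dataclass

GROUP_SIZE = 64  # assumed quantization group size, for illustration only
QUANTIZABLE_KINDS = {"linear", "embedding"}  # assumed eligible layer types


@dataclass(frozen=True)
class Layer:
    name: str     # hypothetical layer name
    kind: str     # layer type, e.g. "linear", "lstm", "conv1d"
    shape: tuple  # weight shape; the last axis is the input dimension


def is_quantizable(layer, group_size=GROUP_SIZE):
    """Return True if this simplified rule would quantize the layer."""
    if layer.kind not in QUANTIZABLE_KINDS:
        return False
    # Group-wise quantization needs the last axis to split into whole groups.
    return layer.shape[-1] % group_size == 0


# Hypothetical layers, loosely modeled on the component types named above.
LAYERS = [
    Layer("text_encoder.lstm", "lstm", (1024, 512)),
    Layer("decoder.conv1", "conv1d", (512, 3, 512)),
    Layer("decoder.norm", "layernorm", (512,)),
    Layer("bert.proj", "linear", (512, 768)),
    Layer("duration.proj", "linear", (50, 512)),
    Layer("odd.proj", "linear", (512, 80)),  # 80 is not a multiple of 64
]

quantized = [layer.name for layer in LAYERS if is_quantizable(layer)]
skipped = [layer.name for layer in LAYERS if not is_quantizable(layer)]
print("quantized:", quantized)
print("kept in bf16:", skipped)
```

Under these assumptions only two of the six layers are quantized, which mirrors how a nominally 4-bit conversion can leave most of the model in bf16.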

Practical takeaways:

Savings vs bf16: ~13% on disk (270 MB for the quantized weights file vs 312 MB for the bf16 weights).
Audio quality: indistinguishable from bf16 in casual testing; the quantized layers are not on the critical synthesis path.
If you want maximal disk savings, use the ONNX INT8 or GGUF variants from other authors.
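
The disk-savings figure follows directly from the two file sizes quoted above. The short sketch below reproduces the arithmetic and, for contrast, computes the savings a model would see if every weight went from 16 bits to 4 bits; that idealized figure ignores the per-group scales and biases that quantized formats also store.

```python
BF16_MB = 312       # bf16 weights file size quoted above
QUANTIZED_MB = 270  # quantized weights file size quoted above


def disk_savings(original, reduced):
    """Fraction of the original size saved by the reduced version."""
    return (original - reduced) / original


savings = disk_savings(BF16_MB, QUANTIZED_MB)
# Idealized case: all weights stored at 4 bits instead of 16.
ideal_savings = disk_savings(16, 4)

print(f"observed savings: {savings:.1%}")       # 13.5%
print(f"idealized savings: {ideal_savings:.0%}")  # 75%
```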

Quickstart

from mlx_audio.tts import load

model = load("majentik/Kokoro-82M-MLX-4bit")
audio = model.generate(
    text="Hello, this is a test of Kokoro 82M at 4-bit MLX.",
    voice="af_heart",   # one of the bundled voices in voices/
)
# audio is a numpy array at 24 kHz
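
To listen to the result, the waveform can be written to a WAV file. The sketch below uses only the Python standard library and assumes, as the comment above states, that the audio is a one-dimensional sequence of float samples in [-1, 1] at 24 kHz. `save_wav` is a hypothetical helper, not part of mlx-audio, and a synthetic sine tone stands in for the model output so the snippet runs without downloading the model.

```python
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE = 24_000  # sample rate stated in the quickstart comment


def save_wav(samples, path, sample_rate=SAMPLE_RATE):
    """Write mono float samples in [-1, 1] as 16-bit PCM; return frame count."""
    frames = bytearray()
    for sample in samples:
        # Clip to the valid range, then scale to signed 16-bit integers.
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack("<h", int(clipped * 32767))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return len(frames) // 2


# Stand-in for the model output: one second of a 440 Hz tone.
tone = [
    0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
    for n in range(SAMPLE_RATE)
]
out_path = os.path.join(tempfile.gettempdir(), "kokoro_test.wav")
written = save_wav(tone, out_path)
print(f"wrote {written} frames to {out_path}")
```

With real output, pass the generated samples in place of `tone`; if the library returns a different container, convert it to a flat sequence of floats first.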

Files

File	Purpose
`model.safetensors`	Quantized weights (mlx format)
`kokoro-v1_0.safetensors`	Original bf16 weights (preserved)
`config.json`	Model config with `model_type: kokoro`
`voices/*.pt`	Voice embeddings (54 voices bundled)

License

Apache 2.0, inherited from the upstream model. See the base model for training details and attribution.
