Kokoro-82M MLX 4-bit

MLX 4-bit quantization of hexgrad/Kokoro-82M, produced with mlx-audio on Apple Silicon.

Provenance

Converted from mlx-community/Kokoro-82M-bf16, which in turn is a safetensors export of the original kokoro-v1_0.pth. The original weights are still bundled (as kokoro-v1_0.safetensors) alongside the quantized model.safetensors, so callers can fall back to bf16 where audio quality demands it.

Quantization notes

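Kokoro is a StyleTTS2-derived architecture with many small LSTMs, iSTFTNet convolution blocks, and normalization layers that are not eligible for MLX quantization at default thresholds. The conversion reported ~27.6 bits per weight, meaning most parameters were left unquantized. Because 16-bit weights would cap that average at 16, a figure this high implies the unquantized tensors in model.safetensors are stored at 32-bit precision. Only a subset of the large linear projections was actually quantized to 4 bits.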
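The bits-per-weight figure can be read as a weighted mean of quantized and unquantized parameters, which makes it possible to estimate how much of the model was actually quantized. The sketch below solves for that fraction. The per-weight widths are assumptions, not values from the conversion log: roughly 4.5 to 5 bits for a 4-bit weight once per-group scale and bias overhead is included, and 32 bits for tensors left at full precision.

```python
def quantized_fraction(avg_bpw: float, quant_bpw: float, full_bpw: float = 32.0) -> float:
    """Fraction of parameters that must be quantized for the mean to equal avg_bpw.

    Solves avg_bpw = f * quant_bpw + (1 - f) * full_bpw for f.
    """
    if not quant_bpw < avg_bpw <= full_bpw:
        raise ValueError("average must lie between the quantized and full widths")
    return (full_bpw - avg_bpw) / (full_bpw - quant_bpw)


# Assumed widths for a quantized weight, including per-group scale/bias overhead.
for quant_bpw in (4.5, 5.0):
    f = quantized_fraction(27.6, quant_bpw)
    print(f"{quant_bpw} bits per quantized weight -> {f:.1%} of parameters quantized")
```

Under either assumption, only about 16% of parameters come out as quantized. This is consistent with the small on-disk savings.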

Practical takeaway:

  • Savings vs bf16: ~13% on disk (270 MB for the quantized model.safetensors vs 312 MB for the bf16 weights).
  • Audio quality: indistinguishable from bf16 in casual testing, since the quantized layers are not on the critical synthesis path.
  • For maximal disk savings, use the ONNX INT8 or GGUF variants published by other authors.
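As a sanity check, the quoted sizes can be compared against the reported bits per weight. This is a minimal sketch. It assumes the nominal 82M parameter count from the model name, treats "MB" as MiB (2^20 bytes), and ignores safetensors header bytes.

```python
BYTES_PER_MIB = 1024 ** 2


def weights_file_mib(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of a weights file in MiB, ignoring header/metadata bytes."""
    return num_params * bits_per_weight / 8 / BYTES_PER_MIB


def disk_savings(quantized_mb: float, baseline_mb: float) -> float:
    """Fractional size reduction of the quantized file relative to the baseline."""
    return 1 - quantized_mb / baseline_mb


# Assumption: nominal 82M parameters, taken from the model name.
estimate = weights_file_mib(82e6, 27.6)  # reported bits per weight
savings = disk_savings(270, 312)         # sizes quoted in the list above
print(f"estimated quantized file: ~{estimate:.0f} MiB; savings: {savings:.1%}")
```

The estimate comes out at about 270 MiB, matching the quoted size. The quoted sizes give a 13.5% reduction, which the list rounds to ~13%.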

Quickstart

```python
from mlx_audio.tts import load

model = load("majentik/Kokoro-82M-MLX-4bit")
audio = model.generate(
    text="Hello, this is a test of Kokoro 82M at 4-bit MLX.",
    voice="af_heart",   # one of the bundled voices in voices/
)
# audio is a numpy array at 24 kHz
```
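To listen to the result, the samples can be written to a WAV file with the standard library. This is a minimal sketch that assumes `audio` is a NumPy float array in [-1, 1] at 24 kHz, as the comment above states. If your mlx-audio version returns something else, convert it to NumPy first. The helper name `write_wav_pcm16` is hypothetical.

```python
import wave

import numpy as np


def write_wav_pcm16(path: str, samples: np.ndarray, sample_rate: int = 24_000) -> None:
    """Write mono float samples in [-1, 1] to a 16-bit PCM WAV file."""
    mono = np.asarray(samples, dtype=np.float32).reshape(-1)
    # Clip to the valid range, then scale to little-endian int16 as WAV expects.
    pcm = (np.clip(mono, -1.0, 1.0) * 32767).astype("<i2")
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())
```

Usage, with the `audio` from the Quickstart: `write_wav_pcm16("hello.wav", audio)`. Clipping before scaling avoids integer wraparound if a sample slightly overshoots ±1.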

Files

| File | Purpose |
|---|---|
| model.safetensors | Quantized weights (MLX format) |
| kokoro-v1_0.safetensors | Original bf16 weights (preserved) |
| config.json | Model config with model_type: kokoro |
| voices/*.pt | Voice embeddings (54 voices bundled) |
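Because the Quickstart's `voice` argument names one of the bundled voices, it can help to list what a local copy actually contains. This is a minimal sketch with several assumptions. It assumes the repository has already been downloaded to a local directory (for example with `huggingface_hub.snapshot_download`). It also assumes each `voices/<name>.pt` file corresponds to `voice="<name>"`. The helper name `list_voices` is hypothetical.

```python
from pathlib import Path


def list_voices(model_dir: str) -> list[str]:
    """Return the voice names found under <model_dir>/voices/, sorted.

    Assumes each voices/<name>.pt file holds one voice embedding and that
    <name> (e.g. "af_heart") is the value passed as voice=...
    """
    voices_dir = Path(model_dir) / "voices"
    return sorted(p.stem for p in voices_dir.glob("*.pt"))
```

For example, `list_voices("/path/to/local/snapshot")` (a placeholder path) should include `"af_heart"` if the assumption above holds.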

License

Apache 2.0, inherited from the upstream model. See the base model for training details and attribution.

See also
