PhoneticXeus

PhoneticXeus is a multilingual phone recognizer that transcribes speech into IPA phones. It is built on the XEUS speech encoder and trained on 70+ languages (IPAPack++).

Usage

pip install torch torchaudio transformers huggingface_hub safetensors soundfile numpy pyyaml typeguard

import torchaudio
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "changelinglab/PhoneticXeus", trust_remote_code=True
).eval()

wav, sr = torchaudio.load("audio.wav")
wav = wav.mean(0)                                    # mono, shape (samples,)
if sr != 16000:
    wav = torchaudio.functional.resample(wav, sr, 16000)

print(model.transcribe(wav, sampling_rate=16000)[0]["processed_transcript"])
# e.g. "aɪhædðætkʰjʊɹiɑsətipɪsaɪd…"

model.transcribe(...) returns a list of dicts, each with processed_transcript (joined IPA) and predicted_transcript (slash-separated phones). Calling model(input_values) returns frame-level CTC logits of shape (batch, frames, 428) for custom decoding.
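
For custom decoding, the simplest option is greedy CTC: take the highest-scoring label in each frame, merge consecutive repeats, then drop the blank label. The sketch below is not the repository's decoder. It works on plain nested lists so it runs without the model, and the function names are hypothetical. The blank index (blank_id=0) is an assumption to check against the model's vocabulary. With real logits you would first convert them, for example with logits.tolist().

```python
from itertools import groupby
from typing import List, Sequence


def ctc_greedy_collapse(frame_ids: Sequence[int], blank_id: int = 0) -> List[int]:
    """Merge consecutive repeated labels, then remove blanks.

    blank_id=0 is an assumption; check the model's vocabulary for the real index.
    """
    return [label for label, _ in groupby(frame_ids) if label != blank_id]


def ctc_greedy_decode(
    logits: Sequence[Sequence[Sequence[float]]], blank_id: int = 0
) -> List[List[int]]:
    """Greedy CTC decoding over nested lists shaped (batch, frames, vocab)."""
    decoded = []
    for utterance in logits:
        # Index of the highest-scoring label in each frame.
        frame_ids = [
            max(range(len(frame)), key=frame.__getitem__) for frame in utterance
        ]
        decoded.append(ctc_greedy_collapse(frame_ids, blank_id))
    return decoded


# Toy input: one utterance, five frames, three labels (label 0 acts as blank).
toy_logits = [[
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.8, 0.1, 0.1],
    [0.1, 0.2, 0.7],
]]
print(ctc_greedy_decode(toy_logits))  # [[1, 2]]
```

The result is a list of label-id sequences, one per utterance. Turning ids into IPA phones needs the model's own vocabulary, which is not reproduced here.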

Audio must be mono and sampled at 16 kHz. Loading the model runs custom code from the repository, so you must allow it by passing trust_remote_code=True.

Citation

@misc{pxeus26,
      title={An Empirical Recipe for Universal Phone Recognition},
      author={Shikhar Bharadwaj and Chin-Jou Li and Kwanghee Choi and Eunjung Yeo and William Chen and Shinji Watanabe and David R. Mortensen},
      year={2026},
      eprint={2603.29042},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2603.29042},
}