# AFClap - Audio Flamingo CLAP

The CLAP model (`epoch_16.pt`) from Audio-Flamingo-2.

## Installation

```bash
pip install laion-clap librosa soundfile
```

## Usage

```python
import torch
from laion_clap import CLAP_Module

def load_afclap(ckpt_path):
    """Load the AFClap checkpoint into a fused HTSAT/T5 CLAP module."""
    model = CLAP_Module(
        enable_fusion=True,
        amodel='HTSAT-afclap',
        tmodel='t5'
    ).cuda()
    model.load_afclap_ckpt(ckpt=ckpt_path, verbose=True)
    return model

# Load model
model = load_afclap("epoch_16.pt")

# Get audio embeddings (returned as torch tensors since use_tensor=True)
audio_embed = model.get_audio_embedding_from_filelist(
    ["audio1.wav", "audio2.wav"],
    sr=16000,
    use_tensor=True
)

# Get text embeddings
text_embed = model.get_text_embedding(
    ["This is a classical song.", "This is a rock song."],
    use_tensor=True
)

# Compute audio-text similarity; with use_tensor=True the embeddings
# are already tensors, so no torch.tensor() re-wrapping is needed
similarities = audio_embed @ text_embed.t()
print(similarities)
```
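The similarity matrix above can be turned into zero-shot label predictions. A minimal sketch, independent of the model itself: `rank_labels` is a hypothetical helper (not part of laion-clap) that assumes both embedding matrices are L2-normalized, so the dot product is cosine similarity; a CLIP-style temperature-scaled softmax then converts each audio row into a probability over the candidate labels. Dummy embeddings stand in for real model output here.

```python
import torch
import torch.nn.functional as F

def rank_labels(audio_embed, text_embed, labels):
    """Pick the best text label for each audio clip by cosine similarity.

    Assumes audio_embed (n_audio, d) and text_embed (n_text, d) are
    L2-normalized, so the matrix product gives cosine similarities.
    """
    sims = audio_embed @ text_embed.t()       # (n_audio, n_text)
    probs = F.softmax(sims * 100.0, dim=-1)   # CLIP-style temperature scaling
    best = probs.argmax(dim=-1).tolist()
    return [(labels[i], float(probs[row, i])) for row, i in enumerate(best)]

# Dummy, pre-normalized embeddings standing in for CLAP output
audio = F.normalize(torch.tensor([[1.0, 0.1], [0.1, 1.0]]), dim=-1)
text = F.normalize(torch.tensor([[1.0, 0.0], [0.0, 1.0]]), dim=-1)
print(rank_labels(audio, text, ["classical", "rock"]))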