lemm-test-100 / DATASET_README.md
Gamahea

metadata
title: LEMM Training Data & LoRA Storage
tags:
  - music-generation
  - audio
  - lora
  - training-data
  - diffrhythm2
license: mit

LEMM Dataset Storage

This dataset repository stores training data and LoRA adapters for LEMM (Let Everyone Make Music) - an advanced AI music generation system.

🎯 Purpose

This repository serves as persistent storage for:

  • LoRA Adapters: Fine-tuned music generation models
  • Prepared Datasets: Training data extracted from various music datasets
  • Cross-rebuild Persistence: Data survives HuggingFace Space rebuilds

📁 Repository Structure

lemm-dataset/
├── loras/                    # LoRA adapter storage
│   ├── {lora_name}/          # Each LoRA in its own folder
│   │   ├── final_model.pt    # Trained LoRA weights
│   │   └── config.yaml       # Training configuration
│   └── ...
│
└── datasets/                 # Prepared training datasets
    ├── {dataset_key}/        # Each dataset in its own folder
    │   ├── train/            # Training samples
    │   ├── val/              # Validation samples
    │   └── metadata.json     # Dataset metadata
    └── ...
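
To check that a local copy follows this layout, something like the sketch below may help. It only looks for the file and folder names shown in the tree; the `inventory` helper and its result keys are hypothetical and not part of LEMM.

```python
from pathlib import Path


def inventory(root):
    """Report which expected files exist for each LoRA and dataset folder.

    `root` is a local copy of the repository, i.e. the folder that
    contains `loras/` and `datasets/`. Hypothetical helper, not LEMM code.
    """
    root = Path(root)
    loras = {}
    datasets = {}

    loras_dir = root / "loras"
    if loras_dir.is_dir():
        for folder in sorted(p for p in loras_dir.iterdir() if p.is_dir()):
            loras[folder.name] = {
                "has_weights": (folder / "final_model.pt").is_file(),
                "has_config": (folder / "config.yaml").is_file(),
            }

    datasets_dir = root / "datasets"
    if datasets_dir.is_dir():
        for folder in sorted(p for p in datasets_dir.iterdir() if p.is_dir()):
            datasets[folder.name] = {
                "has_train": (folder / "train").is_dir(),
                "has_val": (folder / "val").is_dir(),
                "has_metadata": (folder / "metadata.json").is_file(),
            }

    return {"loras": loras, "datasets": datasets}
```

Calling `inventory("path/to/local/copy")` returns a nested dictionary keyed by folder name, which makes incomplete uploads easy to spot.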

🔄 Automatic Sync

The LEMM Space automatically:

  • Downloads all LoRAs and datasets on startup
  • Uploads newly trained LoRAs after training completes
  • Uploads newly prepared datasets after preparation

🔐 Access Control

  • Visibility: Public (anyone can view)
  • Access Requests: Enabled with automatic approval
  • Purpose: Allows LEMM Space to read/write data

🚀 Usage

From LEMM Space

Data syncs automatically - no manual intervention needed.

From Your Own Code

from huggingface_hub import snapshot_download

# Download a specific LoRA.
# The return value is the local snapshot root; the files are
# under <lora_path>/loras/your_lora_name/.
lora_path = snapshot_download(
    repo_id="Gamahea/lemm-dataset",
    repo_type="dataset",
    allow_patterns="loras/your_lora_name/*",
)

# Download all datasets (placed under <datasets_path>/datasets/)
datasets_path = snapshot_download(
    repo_id="Gamahea/lemm-dataset",
    repo_type="dataset",
    allow_patterns="datasets/*",
)
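
`snapshot_download` returns the root of the local snapshot, so downloaded files keep their repository-relative paths beneath it. The sketch below resolves those paths using the file names from the repository structure; the helper names are hypothetical, and the contents of `metadata.json` are not documented here, so it is returned as a plain dictionary.

```python
import json
from pathlib import Path


def lora_files(snapshot_root, lora_name):
    """Return (weights_path, config_path) for one LoRA in a local snapshot."""
    folder = Path(snapshot_root) / "loras" / lora_name
    weights = folder / "final_model.pt"
    config = folder / "config.yaml"
    if not weights.is_file():
        raise FileNotFoundError(f"No LoRA weights found at {weights}")
    return weights, config


def dataset_metadata(snapshot_root, dataset_key):
    """Load metadata.json for one prepared dataset as a dict."""
    path = Path(snapshot_root) / "datasets" / dataset_key / "metadata.json"
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

For example, `lora_files(lora_path, "your_lora_name")` gives the paths to hand to whatever loader your training or inference code uses.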

📊 Supported Datasets

LEMM can prepare and train on:

  • GTZAN: Music genre classification dataset
  • MusicCaps: Google's music captioning dataset
  • Free Music Archive (FMA): Large-scale music dataset
  • Custom datasets: Upload your own music collections

🎡 LoRA Training

LoRA (Low-Rank Adaptation) allows efficient fine-tuning of DiffRhythm2 for:

  • Specific music styles
  • Genre specialization
  • Artist emulation
  • Custom sound aesthetics
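
To illustrate why LoRA is efficient: instead of updating a full weight matrix of shape (d_out, d_in), it trains two small matrices of shapes (d_out, r) and (r, d_in) whose product is added to the frozen weights. The sketch below compares the trainable parameter counts. The layer size and rank are illustrative assumptions, not DiffRhythm2's actual dimensions or LEMM's training settings.

```python
def full_param_count(d_out, d_in):
    """Trainable parameters when fine-tuning the whole weight matrix."""
    return d_out * d_in


def lora_param_count(d_out, d_in, rank):
    """Trainable parameters for a rank-`rank` LoRA update B @ A,
    with B of shape (d_out, rank) and A of shape (rank, d_in)."""
    return rank * (d_out + d_in)


# Hypothetical 1024 x 1024 layer adapted at rank 8
full = full_param_count(1024, 1024)      # 1048576
lora = lora_param_count(1024, 1024, 8)   # 16384
print(f"LoRA trains {lora / full:.2%} of the layer's parameters")
```

Smaller adapters also mean smaller `final_model.pt` files to store and sync.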

🛠️ Related Projects

📝 License

MIT License - Feel free to use and modify

🤝 Contributing

This is a storage repository. To contribute to LEMM:

  1. Visit the LEMM Space
  2. Train your own LoRAs
  3. Share your results with the community

⚠️ Notes

  • Data is organized for LEMM's automatic sync system
  • Manual edits may be overwritten by Space operations
  • Each LoRA/dataset includes configuration metadata
  • Storage persists across Space rebuilds