lemm-test-100 / DATASET_README.md

Gamahea

Add Dataset Card for lemm-dataset repo

4a7ff7a 8 days ago

metadata

title: LEMM Training Data & LoRA Storage
tags:
  - music-generation
  - audio
  - lora
  - training-data
  - diffrhythm2
license: mit

LEMM Dataset Storage

This dataset repository stores training data and LoRA adapters for LEMM (Let Everyone Make Music) - an advanced AI music generation system.

🎯 Purpose

This repository serves as persistent storage for:

LoRA Adapters: Fine-tuned music generation models
Prepared Datasets: Training data extracted from various music datasets
Cross-rebuild Persistence: Data survives HuggingFace Space rebuilds

📁 Repository Structure

lemm-dataset/
├── loras/                    # LoRA adapter storage
│   ├── {lora_name}/         # Each LoRA in its own folder
│   │   ├── final_model.pt   # Trained LoRA weights
│   │   └── config.yaml      # Training configuration
│   └── ...
│
└── datasets/                 # Prepared training datasets
    ├── {dataset_key}/       # Each dataset in its own folder
    │   ├── train/           # Training samples
    │   ├── val/             # Validation samples
    │   └── metadata.json    # Dataset metadata
    └── ...
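
The layout above can be traversed with a few lines of standard-library Python. The sketch below assumes a local copy of the repository (for example, the directory returned by `snapshot_download`) and relies only on the folder and file names shown in the tree. The helper names `list_loras` and `list_datasets` are illustrative and are not part of LEMM.

```python
import json
from pathlib import Path


def list_loras(root):
    """Return names of LoRA folders under loras/ that contain final_model.pt."""
    loras_dir = Path(root) / "loras"
    if not loras_dir.is_dir():
        return []
    return sorted(
        p.name for p in loras_dir.iterdir()
        if p.is_dir() and (p / "final_model.pt").is_file()
    )


def list_datasets(root):
    """Map each dataset key to its parsed metadata.json (None if the file is absent)."""
    datasets_dir = Path(root) / "datasets"
    result = {}
    if not datasets_dir.is_dir():
        return result
    for p in sorted(datasets_dir.iterdir()):
        if not p.is_dir():
            continue
        meta_file = p / "metadata.json"
        # The metadata schema is not documented in this card, so it is returned as-is.
        result[p.name] = (
            json.loads(meta_file.read_text(encoding="utf-8"))
            if meta_file.is_file() else None
        )
    return result
```

Calling `list_loras(snapshot_root)` on a downloaded snapshot would then give the LoRA names that can be substituted for `{lora_name}` in the tree.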

🔄 Automatic Sync

The LEMM Space automatically:

Downloads all LoRAs and datasets on startup
Uploads newly trained LoRAs after training completes
Uploads newly prepared datasets after preparation
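
The Space's sync code is not shown in this card. As a rough sketch of how the behaviour listed above could be reproduced with `huggingface_hub`, the example below downloads the `loras/` and `datasets/` trees and uploads a trained LoRA folder into `loras/{lora_name}/`. The function names, the `HF_TOKEN` environment variable, and the commit message are assumptions made for illustration, not LEMM's actual implementation.

```python
import os

REPO_ID = "Gamahea/lemm-dataset"  # repository name as used in this card's usage example


def repo_path_for_lora(lora_name):
    """Path inside the repo where a LoRA folder lives, following the documented layout."""
    return f"loras/{lora_name}"


def download_all(local_dir=None):
    """Fetch every LoRA and prepared dataset; returns the local snapshot root."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",
        allow_patterns=["loras/*", "datasets/*"],
        local_dir=local_dir,
        token=os.environ.get("HF_TOKEN"),  # assumed variable name
    )


def upload_lora(lora_name, folder_path):
    """Upload a locally trained LoRA folder (final_model.pt, config.yaml)."""
    from huggingface_hub import HfApi

    api = HfApi(token=os.environ.get("HF_TOKEN"))  # requires a token with write access
    return api.upload_folder(
        repo_id=REPO_ID,
        repo_type="dataset",
        folder_path=folder_path,
        path_in_repo=repo_path_for_lora(lora_name),
        commit_message=f"Add LoRA {lora_name}",
    )
```

Uploading a prepared dataset would follow the same pattern with `path_in_repo` set to `datasets/{dataset_key}`.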

🔐 Access Control

Visibility: Public (anyone can view)
Access Requests: Enabled with automatic approval
Purpose: Allows LEMM Space to read/write data

🚀 Usage

From LEMM Space

Data syncs automatically - no manual intervention needed.

From Your Own Code

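from pathlib import Path

from huggingface_hub import snapshot_download

# Download a specific LoRA (replace "your_lora_name" with a folder under loras/).
# snapshot_download returns the local snapshot root, not the LoRA folder itself.
snapshot_root = snapshot_download(
    repo_id="Gamahea/lemm-dataset",
    repo_type="dataset",
    allow_patterns="loras/your_lora_name/*"
)
lora_path = Path(snapshot_root) / "loras" / "your_lora_name"

# Download all prepared datasets
datasets_root = snapshot_download(
    repo_id="Gamahea/lemm-dataset",
    repo_type="dataset",
    allow_patterns="datasets/*"
)
datasets_path = Path(datasets_root) / "datasets"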

📊 Supported Datasets

LEMM can prepare and train on:

GTZAN: Music genre classification dataset
MusicCaps: Google's music captioning dataset
Free Music Archive (FMA): Large-scale music dataset
Custom datasets: Upload your own music collections

🎵 LoRA Training

LoRA (Low-Rank Adaptation) allows efficient fine-tuning of DiffRhythm2 for:

Specific music styles
Genre specialization
Artist emulation
Custom sound aesthetics
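
For readers unfamiliar with the technique, the standard LoRA formulation can be written as follows. This is the general method as it is commonly defined, not a statement about how LEMM or DiffRhythm2 configures it; the rank $r$ and scaling factor $\alpha$ are hyperparameters whose values are not given in this card.

```latex
h = W_0 x + \frac{\alpha}{r} B A x,
\qquad W_0 \in \mathbb{R}^{d \times k},\;
B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)
```

Only $A$ and $B$ are trained while $W_0$ stays frozen, so the number of trainable parameters per adapted matrix drops from $dk$ to $r(d + k)$. This is why an adapter file can be much smaller than the base model it modifies.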

🛠️ Related Projects

LEMM Space: Gamahea/lemm-test-100
DiffRhythm2: Advanced music generation with built-in vocals

📝 License

MIT License - Feel free to use and modify

🤝 Contributing

This is a storage repository. To contribute to LEMM:

Visit the LEMM Space
Train your own LoRAs
Share your results with the community

⚠️ Notes

Data is organized for LEMM's automatic sync system
Manual edits may be overwritten by Space operations
Each LoRA/dataset includes configuration metadata
Storage persists across Space rebuilds