CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
LongCat-Image is a text-to-image generation model built on diffusion transformers, deployed as a Hugging Face Space with a Gradio interface. The model is based on the Flux architecture and supports both text-to-image generation and image editing.
Running the Application
# Install dependencies
pip install -r requirements.txt
# Run the Gradio app locally
python app.py
The app launches with the MCP server enabled on the default Gradio port.
Architecture
Core Components
Transformer Model (longcat_image/models/longcat_image_dit.py):
- LongCatImageTransformer2DModel: DiT-based transformer using Flux architecture
- Uses FluxTransformerBlock (19 layers) and FluxSingleTransformerBlock (38 layers)
- Supports gradient checkpointing for memory efficiency
- Position embeddings via FluxPosEmbed with RoPE
Pipelines (longcat_image/pipelines/):
- LongCatImagePipeline: Text-to-image generation with optional prompt rewriting
- LongCatImageEditPipeline: Image editing with vision-language conditioning
- Both pipelines inherit from DiffusionPipeline and support LoRA, CFG renorm, and VAE tiling/slicing
Text Encoding:
- Uses Qwen-based text encoder with chat template formatting
- Prompt template wraps user input between <|im_start|> and <|im_end|> tokens
- Maximum token length: 512
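As a rough illustration of the wrapping described above, the sketch below builds a ChatML-style user turn and caps a token list at 512 entries. The helper names `wrap_prompt` and `truncate_tokens` are hypothetical, and the exact role labels and newlines are an assumption based on the common Qwen chat layout; the template in the repository may differ (for example, by adding a system message).

```python
from typing import List

# Delimiters named in the text above.
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"
MAX_TOKENS = 512  # maximum token length stated above


def wrap_prompt(user_prompt: str) -> str:
    """Wrap raw user text in a ChatML-style user turn (assumed layout)."""
    return f"{IM_START}user\n{user_prompt.strip()}{IM_END}\n{IM_START}assistant\n"


def truncate_tokens(token_ids: List[int], max_length: int = MAX_TOKENS) -> List[int]:
    """Keep at most max_length token ids (hypothetical helper)."""
    return token_ids[:max_length]
```

Whether the 512-token cap applies to the user text alone or to the full templated sequence is not stated here, so check the pipeline code before relying on either reading.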
Key Configuration
- VAE scale factor: 8 (with 2x2 patch packing, effective 16x)
- Default sample size: 128 (1024px at 8x scale)
- Latent channels: 16
- Image dimensions must be divisible by 32
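The arithmetic behind these values can be sketched as follows. The function `latent_layout` is a hypothetical helper, not part of the repository. The packed channel count (16 x 2 x 2 = 64) assumes Flux-style packing of each 2x2 latent patch into one token, which the figures above suggest but do not state outright.

```python
VAE_SCALE_FACTOR = 8     # pixels per latent cell along each axis
PATCH_SIZE = 2           # 2x2 latent cells packed into one token
LATENT_CHANNELS = 16
REQUIRED_MULTIPLE = 32   # divisibility constraint listed above


def latent_layout(height: int, width: int) -> dict:
    """Derive latent and token shapes for an image size (illustrative only)."""
    if height % REQUIRED_MULTIPLE or width % REQUIRED_MULTIPLE:
        raise ValueError(
            f"height and width must be divisible by {REQUIRED_MULTIPLE}, "
            f"got {height}x{width}"
        )
    latent_h = height // VAE_SCALE_FACTOR
    latent_w = width // VAE_SCALE_FACTOR
    return {
        "latent_shape": (LATENT_CHANNELS, latent_h, latent_w),
        # Each token covers a 2x2 patch of latent cells (16x16 pixels).
        "num_tokens": (latent_h // PATCH_SIZE) * (latent_w // PATCH_SIZE),
        # Assumption: the patch is flattened into the channel dimension.
        "packed_channels": LATENT_CHANNELS * PATCH_SIZE * PATCH_SIZE,
    }
```

For a 1024x1024 image this gives a 128x128 latent grid, matching the default sample size, and 64 x 64 = 4096 tokens.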
Prompt Rewriting
The pipeline includes built-in prompt engineering via rewire_prompt() that uses the text encoder to expand simple prompts into detailed descriptions. This can be disabled with enable_prompt_rewrite=False.
External prompt polishing is also available via utils/prompt_utils.py, which uses the Hugging Face Inference API (requires HF_TOKEN).
Model Loading
import torch

from longcat_image.models import LongCatImageTransformer2DModel
from longcat_image.pipelines import LongCatImagePipeline

MODEL_REPO = "meituan-longcat/LongCat-Image"

# Load the transformer in bfloat16, then pass it to the pipeline.
transformer = LongCatImageTransformer2DModel.from_pretrained(
    MODEL_REPO, subfolder='transformer', torch_dtype=torch.bfloat16
)
pipe = LongCatImagePipeline.from_pretrained(MODEL_REPO, transformer=transformer)
Environment Variables
HF_TOKEN: Required for prompt polishing via external API
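One way to guard the external polishing step is to check for the token before calling out. The sketch below is an assumption about how such a check could look; `get_hf_token` and `can_polish_prompt` are hypothetical names and do not come from utils/prompt_utils.py.

```python
import os
from typing import Mapping, Optional


def get_hf_token(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the HF_TOKEN value, or None if it is unset or blank."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    return token or None


def can_polish_prompt(env: Optional[Mapping[str, str]] = None) -> bool:
    """True when external prompt polishing has a token to authenticate with."""
    return get_hf_token(env) is not None
```

A caller could fall back to the unpolished prompt when `can_polish_prompt()` returns False rather than failing the request.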