CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

LongCat-Image is a text-to-image generation model built on diffusion transformers, deployed as a Hugging Face Space with a Gradio interface. The model is based on the Flux architecture and supports both text-to-image generation and image editing.

Running the Application

```shell
# Install dependencies
pip install -r requirements.txt

# Run the Gradio app locally
python app.py
```

The app launches with the MCP server enabled on the default Gradio port (7860).
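The launch step can be sketched as below. This is a minimal illustration, not the repo's actual `app.py`: the helper name is invented, and it assumes a Gradio version whose `launch()` accepts `mcp_server=True`.

```python
# Sketch of the launch configuration (assumes gradio >= 5, where
# launch() accepts mcp_server=True; helper name is illustrative).
def launch_kwargs(port: int = 7860, mcp: bool = True) -> dict:
    """Keyword arguments that would be passed to demo.launch()."""
    return {"server_name": "0.0.0.0", "server_port": port, "mcp_server": mcp}

# In app.py (not executed here):
#   demo.launch(**launch_kwargs())
```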

Architecture

Core Components

Transformer Model (longcat_image/models/longcat_image_dit.py):

  • LongCatImageTransformer2DModel: DiT-based transformer using Flux architecture
  • Uses FluxTransformerBlock (19 layers) and FluxSingleTransformerBlock (38 layers)
  • Supports gradient checkpointing for memory efficiency
  • Position embeddings via FluxPosEmbed with RoPE
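The block layout above can be summarized as a config sketch. The key names mirror diffusers' Flux-style conventions but are assumptions here, not LongCat-Image's actual config schema.

```python
# Hypothetical config mirroring the Flux-style layout described above
# (19 joint blocks + 38 single blocks); key names are illustrative.
FLUX_STYLE_CONFIG = {
    "num_layers": 19,         # FluxTransformerBlock count
    "num_single_layers": 38,  # FluxSingleTransformerBlock count
    "in_channels": 64,        # 16 latent channels x 2x2 patch packing
}

def total_blocks(cfg: dict) -> int:
    """Total transformer blocks in the stack."""
    return cfg["num_layers"] + cfg["num_single_layers"]
```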

Pipelines (longcat_image/pipelines/):

  • LongCatImagePipeline: Text-to-image generation with optional prompt rewriting
  • LongCatImageEditPipeline: Image editing with vision-language conditioning
  • Both pipelines inherit from DiffusionPipeline and support LoRA, CFG renorm, and VAE tiling/slicing
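A typical way to wire up those shared features is sketched below. `enable_vae_tiling`, `enable_vae_slicing`, and `load_lora_weights` are standard diffusers pipeline methods; whether LongCat's pipelines expose all of them exactly this way is an assumption.

```python
# Hedged sketch of the memory/LoRA toggles mentioned above (assumes
# standard diffusers pipeline methods are available on these pipelines).
def configure_pipeline(pipe, lora_path=None):
    pipe.enable_vae_tiling()   # decode large latents tile by tile
    pipe.enable_vae_slicing()  # decode batch elements one at a time
    if lora_path is not None:
        pipe.load_lora_weights(lora_path)
    return pipe
```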

Text Encoding:

  • Uses Qwen-based text encoder with chat template formatting
  • Prompt template wraps user input between <|im_start|> and <|im_end|> tokens
  • Maximum token length: 512
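The template wrapping can be illustrated as follows. The role layout is a guess at a generic Qwen-style chat template; the exact template text used by LongCat-Image may differ.

```python
# Illustrative Qwen-style chat-template wrapping; the exact roles and
# template text used by LongCat-Image are assumptions.
MAX_TOKENS = 512

def wrap_prompt(user_prompt: str) -> str:
    """Wrap a raw prompt between <|im_start|> / <|im_end|> tokens."""
    return (
        "<|im_start|>user\n"
        f"{user_prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
```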

Key Configuration

  • VAE scale factor: 8 (with 2x2 patch packing, effective 16x)
  • Default sample size: 128 (1024px at 8x scale)
  • Latent channels: 16
  • Image dimensions must be divisible by 32
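The numbers above fit together like this: 8x VAE downsampling plus 2x2 patch packing gives an effective 16x reduction, and the extra factor of 2 in each axis is why pixel dimensions must be divisible by 32. A worked sketch (helper names are illustrative):

```python
# Worked example of the configuration above: 1024px -> 128 latent grid
# (8x VAE) -> 64x64 packed tokens with 16 * 2 * 2 = 64 channels.
VAE_SCALE = 8
PATCH = 2

def packed_latent_shape(height: int, width: int, channels: int = 16):
    """Return (channels, h, w) of the packed latent for a pixel-space size."""
    if height % 32 or width % 32:
        raise ValueError("height and width must be divisible by 32")
    h, w = height // VAE_SCALE, width // VAE_SCALE
    return channels * PATCH * PATCH, h // PATCH, w // PATCH
```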

Prompt Rewriting

The pipeline includes built-in prompt engineering via rewire_prompt() that uses the text encoder to expand simple prompts into detailed descriptions. This can be disabled with enable_prompt_rewrite=False.

External prompt polishing is also available via utils/prompt_utils.py using Hugging Face Inference API (requires HF_TOKEN).
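The external path could look roughly like the sketch below. `utils/prompt_utils.py` is the real implementation; the instruction text and message layout here are placeholders, and only the `huggingface_hub.InferenceClient.chat_completion` call is a known API.

```python
import os

# Hedged sketch of prompt polishing via the Hugging Face Inference API;
# the instruction text and message layout are illustrative placeholders.
POLISH_INSTRUCTION = (
    "Expand the following image prompt into a detailed description. "
    "Reply with the expanded prompt only."
)

def build_polish_messages(prompt: str) -> list[dict]:
    """Chat messages for the polishing request."""
    return [
        {"role": "system", "content": POLISH_INSTRUCTION},
        {"role": "user", "content": prompt},
    ]

def polish_prompt(prompt: str) -> str:
    from huggingface_hub import InferenceClient  # requires HF_TOKEN
    client = InferenceClient(token=os.environ["HF_TOKEN"])
    out = client.chat_completion(messages=build_polish_messages(prompt))
    return out.choices[0].message.content
```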

Model Loading

```python
import torch

from longcat_image.models import LongCatImageTransformer2DModel
from longcat_image.pipelines import LongCatImagePipeline

MODEL_REPO = "meituan-longcat/LongCat-Image"

transformer = LongCatImageTransformer2DModel.from_pretrained(
    MODEL_REPO, subfolder='transformer', torch_dtype=torch.bfloat16
)
pipe = LongCatImagePipeline.from_pretrained(MODEL_REPO, transformer=transformer)
```

Environment Variables

  • HF_TOKEN: Required for prompt polishing via external API