This repository hosts custom implementations of the LTX2.3 video AI model, refined specifically for high-fidelity generation using the Sulphur 2 architecture. It has been converted to Apple's MLX framework and quantized to Q4 to maximize memory efficiency. It has been tested on an M5 Apple silicon Mac with 32GB of RAM, which is the minimum recommended for this model.

If you are not on a Mac, this model variant is not for you.

Model Overview

This implementation represents a highly optimized workflow built around the LTX2.3 core.

  • Base Model: LTX2.3
  • Refinement Applied: Sulphur 2
  • Fusion Detail: The Sulphur 2 refinements have been successfully fused into the transformer-distilled.safetensors checkpoint, providing a unified generation experience.
  • Implementation: MLX Conversion
  • Quantization: FP4 (Optimized for performance and memory footprint)
  • Target Pipeline: 8/3 Pipeline (Optimized for generation workflow)
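
As a rough illustration of why 4-bit quantization reduces the memory footprint, the arithmetic can be sketched as below. This is only a back-of-the-envelope estimate: the grouped layout (64 weights per group, 32 bits of scale/bias metadata per group) is an assumption and not taken from this repo, and the parameter count used in the example is a placeholder, not the size of this model.

```python
def estimate_weight_gb(num_params, bits=4, group_size=64, overhead_bits_per_group=32):
    """Rough weight-storage estimate in decimal gigabytes.

    group_size and overhead_bits_per_group are assumed values for a grouped
    quantization layout; they are not read from this repository.
    """
    bits_per_weight = bits + overhead_bits_per_group / group_size
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    placeholder_params = 1_000_000_000  # illustrative only, NOT this model's size
    q4 = estimate_weight_gb(placeholder_params)
    # 16-bit weights with no per-group metadata, for comparison
    fp16 = estimate_weight_gb(placeholder_params, bits=16, group_size=1,
                              overhead_bits_per_group=0)
    print(f"4-bit: {q4:.4f} GB, 16-bit: {fp16:.4f} GB per billion parameters")
```

Activations, the text encoder, and the VAE add to this figure, so the estimate covers weights only.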

Usage Guide

Core Workflow (Recommended)

For the best results and fastest generation times, users should rely on the integrated 8/3 pipeline.

  • Primary Generation: Use the fused transformer-distilled.safetensors checkpoint to access the Sulphur 2 quality enhancements baked into the LTX2.3 base.
  • LoRAs: No external LoRAs are required to get Sulphur 2 quality from the fused model, but they are included in this repo for convenience.
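
To confirm that a downloaded checkpoint is intact before running a generation, its header can be inspected with the standard library alone. The sketch below relies only on the public safetensors file layout (an 8-byte little-endian header length followed by a JSON header); the local path is whatever you pass on the command line, and the tensor names and dtypes it reports are not documented in this README.

```python
import json
import struct
import sys


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file as a dict."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len).decode("utf-8"))


def summarize(header):
    """Count tensors per dtype, skipping the optional __metadata__ entry."""
    counts = {}
    for name, info in header.items():
        if name == "__metadata__":
            continue
        counts[info["dtype"]] = counts.get(info["dtype"], 0) + 1
    return counts


if __name__ == "__main__":
    # Usage: python inspect_checkpoint.py /path/to/transformer-distilled.safetensors
    if len(sys.argv) > 1:
        print(summarize(read_safetensors_header(sys.argv[1])))
```

Reading only the header avoids loading the weights into memory, which keeps the check cheap even on a machine close to the RAM minimum.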

Hardware & Compute Notes

  • Primary Platform: Optimized for macOS compute environments on Apple silicon M-series SoCs.
  • AI Engine: Built around the MLX framework integration.
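
A quick pre-flight check for these requirements might look like the following sketch. The 32 GB threshold comes from the tested configuration described at the top of this README; the helper names are hypothetical, and the RAM detection is best-effort and may not work on every system.

```python
import os
import platform

MIN_RAM_GB = 32  # lowest recommended RAM stated in this README


def is_supported(system, machine, ram_gb):
    """True for macOS on Apple silicon with at least MIN_RAM_GB of RAM."""
    return system == "Darwin" and machine == "arm64" and ram_gb >= MIN_RAM_GB


def detect_ram_gb():
    """Best-effort physical RAM in GiB; returns 0.0 if it cannot be determined."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 2**30
    except (ValueError, OSError, AttributeError):
        return 0.0


if __name__ == "__main__":
    ok = is_supported(platform.system(), platform.machine(), detect_ram_gb())
    print("supported" if ok else "not a recommended configuration")
```

Note that a Python interpreter running under Rosetta reports `x86_64` rather than `arm64`, so the check should be run with a native arm64 Python.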

Prompting Guidelines (LTX Specific)

To achieve optimal generation quality with this model, adhere strictly to the following prompting conventions:

  1. Structure: Aim for a single, flowing paragraph.
  2. Tense: Use present tense verbs for all actions and movements.
  3. Detail Level: Match the level of descriptive detail to the intended shot scale (e.g., high detail for close-ups, broader strokes for wide shots).
  4. Flow: Describe the camera movement relative to the subject matter.
  5. Length Target: Aim for 4–8 descriptive sentences to maintain focus and coherence.
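
The structural guidelines above (single paragraph, 4–8 sentences) can be checked mechanically before a prompt is sent to the model. The following is a minimal sketch with a hypothetical helper, `check_prompt`; it uses a naive punctuation-based sentence split and does not attempt to verify tense, detail level, or camera description. The example prompt is purely illustrative.

```python
import re


def check_prompt(prompt, min_sentences=4, max_sentences=8):
    """Return a list of warnings for the structural prompting guidelines."""
    warnings = []
    text = prompt.strip()
    if "\n" in text:
        warnings.append("prompt spans multiple lines; use a single paragraph")
    # Naive split: sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    if not min_sentences <= len(sentences) <= max_sentences:
        warnings.append(
            f"found {len(sentences)} sentences; aim for {min_sentences}-{max_sentences}"
        )
    return warnings


EXAMPLE_PROMPT = (
    "A woman in a red coat walks along a rain-soaked street at night. "
    "Neon signs reflect in the puddles around her boots. "
    "She pauses and looks up as a tram passes behind her. "
    "The camera tracks slowly alongside her at shoulder height."
)

if __name__ == "__main__":
    print(check_prompt(EXAMPLE_PROMPT) or "prompt structure looks fine")
```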

Note: Coherence degrades quickly (and body horror increases) in clips longer than ~17 seconds. Test with shorter clips first.
