Instructions to use prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX")
model = AutoModelForImageTextToText.from_pretrained("prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
Use Docker
docker model run hf.co/prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX
- SGLang
How to use prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
- Docker Model Runner
How to use prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX with Docker Model Runner:
docker model run hf.co/prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX
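The curl calls above can also be issued from Python. The following is a minimal sketch using only the standard library, assuming a vLLM or SGLang server is already running locally. The helper names `build_chat_payload` and `chat` are hypothetical and not part of either project, and `max_tokens` is the standard OpenAI-style request field rather than something the snippets above set.

```python
import json
import urllib.request

MODEL_ID = "prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX"


def build_chat_payload(image_url, prompt, max_tokens=128):
    """Build an OpenAI-style chat completion body with one text part and one image part."""
    return {
        "model": MODEL_ID,
        "max_tokens": max_tokens,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def chat(base_url, payload, timeout=120):
    """POST the payload to <base_url>/v1/chat/completions and return the reply text.

    Assumes the server answers in the usual OpenAI response shape.
    """
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the vLLM server from the snippet above, a call would look like `chat("http://localhost:8000", build_chat_payload(image_url, "Describe this image in one sentence."))`; for SGLang, swap in port 30000.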
Qwen3-VL-8B-Instruct-Unredacted-MAX
Qwen3-VL-8B-Instruct-Unredacted-MAX is an optimized release built on top of huihui-ai/Huihui-Qwen3-VL-8B-Instruct-abliterated. This version focuses on packaging improvements, inference stability, and modern Transformers compatibility, while preserving the strong multimodal reasoning capabilities of the base architecture. The result is a powerful 8B vision-language model designed for efficient research, structured captioning, and multimodal experimentation at scale.
Key Highlights
- Optimized Release Pipeline: Improved repository structure and loading consistency for smoother deployment and inference.
- Modern Transformers Integration: Updated compatibility for recent Hugging Face Transformers versions and vision-language utilities.
- 8B Vision-Language Architecture: Built on Qwen3-VL-8B-Instruct, offering strong reasoning ability across image-text tasks with balanced compute requirements.
- Stable Multimodal Inference: Improved consistency for caption generation, visual reasoning, and structured outputs.
- High-Quality Caption Generation: Produces detailed, structured descriptions suitable for dataset creation, annotation workflows, and accessibility applications.
- Dynamic Resolution Handling: Maintains native support for variable image resolutions and aspect ratios.
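Because the model accepts variable resolutions, very large images translate into more visual tokens and more memory. One optional way to keep that bounded is to downscale images to a pixel budget before passing them in, preserving the aspect ratio. The sketch below is an illustration only: it is not the processor's actual resizing routine, and the `max_pixels` budget and the `multiple` of 32 are assumed values chosen for the example.

```python
import math


def fit_to_pixel_budget(width, height, max_pixels=1024 * 1024, multiple=32):
    """Return (new_width, new_height) within a pixel budget, keeping the aspect ratio.

    Images already under the budget are only snapped down to the nearest
    multiple; larger images are scaled down first. Both sides are floored to
    a multiple of `multiple` (an assumed alignment, not a documented one).
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Never upscale: the scale factor is capped at 1.0.
    scale = min(1.0, math.sqrt(max_pixels / (width * height)))
    new_width = max(multiple, math.floor(width * scale / multiple) * multiple)
    new_height = max(multiple, math.floor(height * scale / multiple) * multiple)
    return new_width, new_height


print(fit_to_pixel_budget(4000, 3000))  # (1152, 864)
print(fit_to_pixel_budget(640, 480))    # (640, 480)
```

The resulting size could then be applied with any image library before the image is handed to the processor.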
Base Model Signatures
This model has been re-sharded and optimized for the latest Transformers version from the base model: https://huggingface.co/huihui-ai/Huihui-Qwen3-VL-8B-Instruct-abliterated
Quick Start with Transformers
# process_vision_info comes from the qwen-vl-utils package (pip install qwen-vl-utils).
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info
import torch

# Load the model; dtype and device placement are chosen automatically.
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX",
    torch_dtype="auto",
    device_map="auto"
)
processor = AutoProcessor.from_pretrained(
    "prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX"
)

# A single user turn with one image and one text instruction.
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Provide a detailed caption for this image."},
        ],
    }
]

# Render the chat template to text, then load the referenced images and videos.
text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to(model.device)  # follow the device picked by device_map rather than hard-coding "cuda"

# Generate, then strip the prompt tokens so only the new text is decoded.
generated_ids = model.generate(**inputs, max_new_tokens=256)
output_text = processor.batch_decode(
    [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)],
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)
print(output_text)
Intended Use
- Multimodal research and vision-language evaluation
- Image captioning and dataset generation pipelines
- Red-teaming and robustness testing of VLMs
- Creative and descriptive visual storytelling tasks
- AI system prototyping with image-text reasoning components
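For the captioning and dataset generation use case, a small amount of glue code is usually needed around the model call. The sketch below shows one possible shape for it: a helper that builds the per-image chat messages in the same format as the Quick Start, and a helper that serializes (image, caption) pairs to JSON Lines. Both function names and the record fields `image` and `caption` are hypothetical choices for illustration; the actual generation step is left out and would use the Quick Start code.

```python
import json

DEFAULT_PROMPT = "Provide a detailed caption for this image."


def build_caption_messages(image_ref, prompt=DEFAULT_PROMPT):
    """Build a single-turn chat message list for one image (URL or local path)."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_ref},
                {"type": "text", "text": prompt},
            ],
        }
    ]


def caption_records_to_jsonl(records):
    """Serialize (image_ref, caption) pairs to a JSON Lines string, one record per line."""
    lines = []
    for image_ref, caption in records:
        record = {"image": image_ref, "caption": caption.strip()}
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + ("\n" if lines else "")
```

In a pipeline, each image reference would be passed through `build_caption_messages`, run through the model, and the decoded text collected into the list given to `caption_records_to_jsonl`.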
Limitations & Risks
Important Note: This model inherits behavioral characteristics from its base architecture and fine-tuning process.
- Performance depends on image quality, prompt clarity, and decoding settings
- May produce incomplete or inconsistent reasoning in complex visual scenes
- Requires moderate to high VRAM for stable inference depending on resolution
- Output quality varies across domains such as medical, artistic, or technical imagery
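To put the VRAM point in rough numbers, the memory needed just to hold the weights can be estimated from the parameter count and the bytes per parameter. This is a back-of-the-envelope sketch: the 8e9 figure is the nominal size taken from the model name rather than an exact parameter count, and the estimate deliberately ignores activations, the KV cache, and the extra tokens produced by high-resolution images, all of which add to the real footprint.

```python
# Bytes used per parameter for common storage dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1}


def weight_memory_gib(num_params, dtype="bfloat16"):
    """Estimate the memory taken by model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


NOMINAL_PARAMS = 8e9  # assumption: nominal "8B" from the model name

for dtype in ("float32", "bfloat16", "int8"):
    print(f"{dtype}: {weight_memory_gib(NOMINAL_PARAMS, dtype):.1f} GiB")
```

Under these assumptions the weights alone come to roughly 15 GiB in bfloat16, which is why image resolution and generation length matter for whether inference fits on a given GPU.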
Model tree for prithivMLmods/Qwen3-VL-8B-Instruct-Unredacted-MAX
Base model
Qwen/Qwen3-VL-8B-Instruct