gemma-3-4b-it-int4-cw-ov / README.md

amokrov

Update README.md

5397b33 verified 5 days ago

metadata

license: gemma
license_link: https://ai.google.dev/gemma/terms
library_name: transformers
pipeline_tag: image-text-to-text
extra_gated_heading: Access Gemma on Hugging Face
extra_gated_prompt: >-
  To access Gemma on Hugging Face, you’re required to review and agree to
  Google’s usage license. To do this, please ensure you’re logged in to Hugging
  Face and click below. Requests are processed immediately.
extra_gated_button_content: Acknowledge license
base_model: google/gemma-3-4b-it
base_model_relation: quantized

gemma-3-4b-it-int4-cw-ov

Model creator: google
Original model: gemma-3-4b-it

Description

This is the gemma-3-4b-it model converted to the OpenVINO™ IR (Intermediate Representation) format, with weights compressed to INT4 by NNCF.

The model is optimized for inference on NPU using these instructions.

Quantization Parameters

Weight compression was performed using nncf.compress_weights with the following parameters:

mode: INT4_SYM
ratio: 1.0
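For orientation, the parameters above could be passed to `nncf.compress_weights` roughly as follows. This is a minimal sketch, not the procedure used to produce this repository: the function name `compress_ir` and both file paths are hypothetical, and it assumes a single IR file, whereas this model consists of several components.

```python
# Parameters exactly as listed in this model card.
COMPRESSION_PARAMS = {"mode": "INT4_SYM", "ratio": 1.0}


def compress_ir(input_xml, output_xml, params=COMPRESSION_PARAMS):
    """Compress the weights of one OpenVINO IR file (hypothetical helper)."""
    # Imported lazily so the parameters can be inspected without these packages.
    import nncf
    import openvino as ov

    model = ov.Core().read_model(input_xml)
    compressed = nncf.compress_weights(
        model,
        mode=getattr(nncf.CompressWeightsMode, params["mode"]),
        ratio=params["ratio"],  # 1.0 requests INT4 for all eligible layers
    )
    ov.save_model(compressed, output_xml)
    return output_xml
```

A call such as `compress_ir("model_fp16.xml", "model_int4.xml")` would use placeholder paths; any other arguments to `nncf.compress_weights` are left at their defaults here because the card does not state them.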

Compatibility

The provided OpenVINO™ IR model is compatible with:

OpenVINO version 2025.4.0 and higher
Optimum Intel 1.27.0 and higher
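The minimum versions above can be checked before loading the model. The following is a sketch that assumes the packages are installed under the distribution names `openvino` and `optimum-intel`; the helper names are illustrative, and pre-release suffixes such as `rc1` are ignored in the comparison.

```python
import re
from importlib import metadata

# Minimum versions from the Compatibility list (distribution names are assumed).
MINIMUMS = {"openvino": "2025.4.0", "optimum-intel": "1.27.0"}


def release_tuple(version):
    """Return the leading numeric part of a version, e.g. '2025.4.0rc1' -> (2025, 4, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed, minimum):
    """Compare two versions numerically, padding the shorter one with zeros."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(minimums=MINIMUMS):
    """Map each package to True/False, or None when it is not installed."""
    report = {}
    for name, minimum in minimums.items():
        try:
            report[name] = meets_minimum(metadata.version(name), minimum)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report
```

Calling `check_environment()` returns a dictionary keyed by package name, which makes it easy to print a short diagnostic before running inference.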

Running Model Inference with OpenVINO GenAI

Install packages required for using OpenVINO GenAI:

pip install openvino openvino-tokenizers openvino-genai

pip install huggingface_hub

Download the model from the Hugging Face Hub:

import huggingface_hub as hf_hub

model_id = "OpenVINO/gemma-3-4b-it-int4-cw-ov"
model_path = "gemma-3-4b-it-int4-cw-ov"

hf_hub.snapshot_download(model_id, local_dir=model_path)
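After the download, it can be useful to confirm that the folder is complete. An OpenVINO IR model is stored as an `.xml` topology file with a matching `.bin` weights file, so a quick check is to look for `.xml` files without a `.bin` sibling. The sketch below assumes every `.xml` file in the folder is an IR file; the helper name is illustrative.

```python
from pathlib import Path


def find_incomplete_ir(model_dir):
    """Return the names of .xml files in model_dir that have no matching .bin file."""
    root = Path(model_dir)
    return sorted(
        xml.name
        for xml in root.glob("*.xml")
        if not xml.with_suffix(".bin").exists()
    )
```

An empty list from `find_incomplete_ir(model_path)` suggests the snapshot was downloaded in full; any names returned point to files worth downloading again.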

Run model inference:

import openvino_genai as ov_genai
import requests
from PIL import Image
from io import BytesIO
import numpy as np
import openvino as ov

device = "NPU"
pipe = ov_genai.VLMPipeline(model_path, device)

def load_image(image_file):
    # Accept either an HTTP(S) URL or a local file path.
    if isinstance(image_file, str) and image_file.startswith(("http://", "https://")):
        response = requests.get(image_file, timeout=30)
        response.raise_for_status()  # fail early on a bad download
        image = Image.open(BytesIO(response.content)).convert("RGB")
    else:
        image = Image.open(image_file).convert("RGB")
    # Pack the pixels into a uint8 tensor of shape (1, height, width, 3).
    image_data = np.array(image.getdata()).reshape(1, image.size[1], image.size[0], 3).astype(np.uint8)
    return ov.Tensor(image_data)

prompt = "What is unusual in this picture?"

url = "https://github.com/openvinotoolkit/openvino_notebooks/assets/29454499/d5fbbd1a-d484-415c-88cb-9986625b7b11"
image_tensor = load_image(url)

def streamer(subword: str) -> bool:
    print(subword, end="", flush=True)
    return False

pipe.start_chat()
output = pipe.generate(prompt, image=image_tensor, max_new_tokens=100, streamer=streamer)
pipe.finish_chat()
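The `streamer` callback above only prints each chunk. A variation that also keeps the chunks and can stop generation early is sketched below. It assumes that returning `True` from the callback asks the pipeline to stop, which is consistent with the `return False` in the example but should be confirmed for the installed OpenVINO GenAI version. The factory name `make_collecting_streamer` is illustrative.

```python
def make_collecting_streamer(max_chunks=None):
    """Build a streamer callback plus the list it appends streamed chunks to."""
    chunks = []

    def streamer(subword: str) -> bool:
        print(subword, end="", flush=True)
        chunks.append(subword)
        # Assumed semantics: True requests a stop, False lets generation continue.
        return max_chunks is not None and len(chunks) >= max_chunks

    return streamer, chunks
```

For example, `streamer, chunks = make_collecting_streamer(max_chunks=50)` followed by `pipe.generate(prompt, image=image_tensor, max_new_tokens=100, streamer=streamer)` would leave the streamed text available as `"".join(chunks)` afterwards.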

More GenAI usage examples can be found in the OpenVINO GenAI library docs and samples.

Limitations

Check the original model card for limitations.

Legal information

The original Gemma Model and Gemma Model Derivatives are distributed under the Gemma Terms of Use. To the extent permissible under the Gemma Terms of Use, Intel’s modifications are distributed under Apache 2.0. Model details can be found in the original model card.

Disclaimer

Intel is committed to respecting human rights and avoiding causing or contributing to adverse impacts on human rights. See Intel’s Global Human Rights Principles. Intel’s products and software are intended only to be used in applications that do not cause or contribute to adverse impacts on human rights.