🇮🇳 Qwen3-4B Hindi Instruct v2 - GGUF

GGUF quantizations of Qwen3-4B-Hindi-Instruct-v2, a Hindi instruction-tuned Qwen3-4B model. These run locally on CPU or GPU with llama.cpp, Ollama, and LM Studio, with no Python or heavy setup needed.

Part of the Hindi LLM Series, focused on bringing Indic-language models to local and edge devices.

Available Quants

| File | Quant | Size | Recommended for |
|---|---|---|---|
| Qwen3-4B-Hindi-v2.Q4_K_M.gguf | Q4_K_M | 2.5 GB | Best balance; start here |
| Qwen3-4B-Hindi-v2.Q5_K_M.gguf | Q5_K_M | 2.9 GB | Higher quality, slightly larger |
| Qwen3-4B-Hindi-v2.Q8_0.gguf | Q8_0 | 4.3 GB | Near-lossless, maximum quality |

If unsure, download Q4_K_M: it's the best size-to-quality tradeoff for most machines.

How to Run

Ollama

huggingface-cli download pankajpandey-dev/Qwen3-4B-Hindi-Instruct-v2-GGUF Qwen3-4B-Hindi-v2.Q4_K_M.gguf --local-dir .
ollama create qwen3-hindi -f Modelfile
ollama run qwen3-hindi "भारत के बारे में एक रोचक तथ्य बताओ।"
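
The `ollama create` command above reads a file named `Modelfile` from the current directory, which the steps do not show. If you do not already have one, a minimal sketch is to point Ollama at the downloaded GGUF with a single `FROM` line. This assumes the file sits in the current directory under the name used in the download command; depending on your Ollama version you may also want `TEMPLATE` or `PARAMETER` lines, which are not covered here.

```shell
# Write a minimal Modelfile next to the downloaded GGUF.
# Assumption: Qwen3-4B-Hindi-v2.Q4_K_M.gguf is in the current directory.
cat > Modelfile <<'EOF'
FROM ./Qwen3-4B-Hindi-v2.Q4_K_M.gguf
EOF

# Show what was written so it can be checked before running `ollama create`.
cat Modelfile
```

Create this file before running `ollama create qwen3-hindi -f Modelfile`. To use a different quant, change the filename on the `FROM` line.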

llama.cpp

./llama-cli -m Qwen3-4B-Hindi-v2.Q4_K_M.gguf -p "भारत की राजधानी क्या है?" -cnv
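
The command above runs on whatever backend your llama.cpp build defaults to. With a GPU-enabled build, layers can be offloaded using `-ngl` (`--n-gpu-layers`). The sketch below is illustrative only: the value 99 is an assumption meaning "more layers than the model has", so everything is offloaded, and the variable names are hypothetical. Lower the value if you run out of VRAM.

```shell
# Assumptions: llama-cli and the GGUF file are in the current directory,
# and llama.cpp was built with GPU support (CUDA, Metal, Vulkan, ...).
MODEL="Qwen3-4B-Hindi-v2.Q4_K_M.gguf"
GPU_LAYERS=99   # assumed value: offload all layers; reduce if VRAM is limited

if [ -x ./llama-cli ] && [ -f "$MODEL" ]; then
  # Start an interactive chat with the chosen number of layers on the GPU.
  ./llama-cli -m "$MODEL" -ngl "$GPU_LAYERS" -cnv
else
  echo "llama-cli or $MODEL not found in the current directory" >&2
fi
```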

LM Studio

Search for this repo in LM Studio, download the Q4_K_M file, and chat directly in the GUI.

About the Model

This is a Hindi instruction fine-tune of Qwen3-4B (LoRA via Unsloth, 10K Hindi instruction pairs), quantized to GGUF for efficient local inference. It handles both Hindi (Devanagari) and English.

For full model details and the original 16-bit weights, see the model card for the unquantized Qwen3-4B-Hindi-Instruct-v2.

License

Apache 2.0: commercial use allowed.


Part of the 🇮🇳 Hindi LLM Series by pankajpandey-dev.
