Koi-75M (keylm-75m-it-code v2.0.0)

Koi-75M is a lightweight code generation model fine-tuned with SFT on the base KeyLM-75M-Instruct model (https://huggingface.co/Eclipse-Senpai/KeyLM-75M-Instruct). It generates short code snippets in the following programming languages: Python, Java, C++, C, and HTML. Its small size (75M parameters) allows it to run comfortably on laptop-grade GPUs.

The model has very poor generation quality; it is an experimental model intended to demonstrate snippet generation in tiny LLMs.

Estimated parameters: ~75M

Architecture: KeyLM

Intended use: Code snippet generation from natural language
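
As a rough check on the laptop-GPU claim, the weight footprint can be estimated from the parameter count. This is only a back-of-envelope sketch: it takes the ~75M figure at face value, assumes the usual 4-byte (float32) and 2-byte (float16/bfloat16) storage widths, and ignores activations, the KV cache, and framework overhead.

```python
# Rough weight-memory estimate for a ~75M-parameter model.
# Assumption: exactly 75e6 parameters; the real count is only "~75M".
PARAMS = 75_000_000
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_megabytes(n_params: int, dtype: str) -> float:
    """Return the size of the weights alone, in MB (10^6 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e6


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_megabytes(PARAMS, dtype):.0f} MB")
```

Under these assumptions the weights take roughly 300 MB in float32 and 150 MB in half precision, well within a 4GB card before runtime overhead is counted.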

Training data

Source: CodeAlpaca_20K dataset (https://huggingface.co/datasets/HuggingFaceH4/CodeAlpaca_20K)
Rows: ~20,000, templated with a custom .jinja chat format
Training: 3,000 steps on an RTX 3050 (4GB VRAM)
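
The templating step can be pictured as converting each dataset row into a list of chat messages before the template renders it. The sketch below is illustrative only and is not the actual preprocessing script: the helper name `row_to_messages` is hypothetical, and the `prompt` / `completion` field names are an assumption about the dataset's columns.

```python
# Hypothetical sketch: turn one CodeAlpaca-style row into chat messages.
# Assumption: each row is a dict with "prompt" and "completion" fields.
def row_to_messages(row: dict) -> list[dict]:
    """Map an instruction/response pair to user/assistant chat turns."""
    return [
        {"role": "user", "content": row["prompt"].strip()},
        {"role": "assistant", "content": row["completion"].strip()},
    ]


# Made-up example row, for illustration only.
example_row = {
    "prompt": "Write a Python function that adds two numbers.\n",
    "completion": "def add(a, b):\n    return a + b\n",
}

messages = row_to_messages(example_row)
for message in messages:
    print(message["role"], "->", repr(message["content"]))
```

A message list in this shape is what `tokenizer.apply_chat_template(messages, tokenize=False)` would render into a single training string using the tokenizer's chat template.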

Usage

Install requirements:

pip install -r requirements.txt
pip install transformers datasets accelerate safetensors

Usage (Hugging Face Hub)

You can load the model directly from the Hugging Face Hub:

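import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Use the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

model_id = "DireDreadlord/Koi-75M"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    dtype="auto",
).to(device)
# Align the embedding matrix with the tokenizer's vocabulary size.
model.resize_token_embeddings(len(tokenizer))
model.eval()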


messages = [{"role": "user", "content": "write a python function to print the fibonacci sequence"}]

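# return_dict=True yields input_ids and attention_mask as a mapping,
# which the **inputs unpacking and inputs["input_ids"] below rely on.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)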


outputs = model.generate(
    **inputs, 
    max_new_tokens=128, 
    do_sample=True,
    temperature=0.4, 
    top_p=0.9, 
    repetition_penalty=1.1,
)

prompt_len = inputs["input_ids"].shape[1]
generated_ids = outputs[0, prompt_len:]
print(tokenizer.decode(generated_ids.tolist(), skip_special_tokens=True))