CodeReview AI

Automated Code Review with Fine-tuned LLMs



Overview

A fine-tuned code review model that automatically detects bugs, security vulnerabilities, and code quality issues across multiple programming languages.

Key Features

  • Multi-Language: Python, JavaScript, Java, C++, Go, Rust, TypeScript, C#, SQL
  • Security Focus: Targets OWASP Top 10 vulnerability classes such as SQL injection and insecure deserialization
  • Quality Scoring: 0-100 score with explanations
  • Auto-Fix: Provides corrected code snippets
  • Efficient: 4-bit quantization, runs on 8GB VRAM

Model Details

Property         Value
Base Model       Qwen2.5-Coder-7B-Instruct
Parameters       7B
Fine-tuning      LoRA (r=16, alpha=16)
Quantization     4-bit NF4
Context Length   2048 tokens
Framework        Unsloth + TRL

Detected Issues

Security

  • SQL Injection
  • Cross-Site Scripting (XSS)
  • Command Injection
  • Hardcoded Credentials
  • Path Traversal
  • Insecure Deserialization

Code Quality

  • Memory Leaks
  • Race Conditions
  • Null Pointer Dereference
  • Off-by-One Errors
  • Resource Leaks
  • Infinite Loops

Quick Start

from unsloth import FastLanguageModel

# Load model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="boraoxkan/codereview-ai",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

# Analyze code
prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Analyze this Python code for defects.

### Input:
def get_user(username):
    query = "SELECT * FROM users WHERE username = '" + username + "'"
    cursor.execute(query)
    return cursor.fetchone()

### Response:
"""

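inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
# Set do_sample explicitly so temperature applies regardless of the default generation config
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.1)
# The decoded text contains the prompt followed by the model's response
result = tokenizer.decode(outputs[0], skip_special_tokens=True)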
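
The prompt above can be wrapped in two small helpers: one that fills the template for a given snippet, and one that keeps only the text after the response marker in the decoded output. This is a minimal sketch. The names `build_prompt` and `extract_response` are hypothetical, and reusing the same instruction wording for languages other than Python is an assumption, not something the training setup confirms.

```python
PROMPT_TEMPLATE = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Analyze this {language} code for defects.

### Input:
{code}

### Response:
"""

RESPONSE_MARKER = "### Response:"


def build_prompt(code: str, language: str = "Python") -> str:
    """Fill the prompt template with the code to review (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(language=language, code=code.strip())


def extract_response(decoded: str) -> str:
    """Return the text after the last response marker, or the whole string if absent."""
    return decoded.rsplit(RESPONSE_MARKER, 1)[-1].strip()
```

Usage: pass `build_prompt(source_code)` to the tokenizer in place of `prompt`, then call `extract_response(result)` on the decoded string.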

Example Output

Input Code (SQL Injection vulnerability):

def get_user(username):
    query = "SELECT * FROM users WHERE username = '" + username + "'"
    cursor.execute(query)

Model Output:

{
  "code_quality_score": 20,
  "critical_issues": [
    "SQL Injection vulnerability due to direct string concatenation"
  ],
  "suggestions": [
    "Use parameterized queries to prevent SQL injection",
    "Handle database connections properly"
  ],
  "fixed_code": "def get_user(username):\n    query = \"SELECT * FROM users WHERE username = ?\"\n    cursor.execute(query, (username,))"
}
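
Because the model emits JSON as free text, it may be worth validating the output before using it. The sketch below assumes the four keys shown in the example above are always expected. `parse_review` is a hypothetical helper, not part of the model or of Unsloth.

```python
import json

# Keys taken from the example output above
REQUIRED_KEYS = ("code_quality_score", "critical_issues", "suggestions", "fixed_code")


def parse_review(text: str) -> dict:
    """Parse the JSON object spanning the first '{' to the last '}' and validate it."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    review = json.loads(text[start:end + 1])  # JSONDecodeError is a ValueError
    missing = [key for key in REQUIRED_KEYS if key not in review]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    score = review["code_quality_score"]
    if isinstance(score, bool) or not isinstance(score, int) or not 0 <= score <= 100:
        raise ValueError(f"invalid score: {score!r}")
    return review
```

A `ValueError` here is a signal to retry generation or to fall back to showing the raw text.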

Score Guidelines

Score    Level      Description
0-30     Critical   Severe security vulnerabilities
31-50    Poor       Significant issues present
51-70    Fair       Some improvements needed
71-85    Good       Minor issues only
86-100   Excellent  Clean, secure code
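
The bands above translate directly into a lookup. A minimal sketch, with `score_level` as a hypothetical helper name:

```python
# (upper bound, level) pairs, matching the score guidelines
SCORE_BANDS = [
    (30, "Critical"),
    (50, "Poor"),
    (70, "Fair"),
    (85, "Good"),
    (100, "Excellent"),
]


def score_level(score: int) -> str:
    """Map a 0-100 quality score to its level name."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be in 0-100, got {score}")
    for upper, level in SCORE_BANDS:
        if score <= upper:
            return level
    raise AssertionError("unreachable: bands cover 0-100")
```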

Training

Parameter       Value
Dataset         ~500 synthetic samples
Steps           120
Batch Size      1 (effective: 4)
Learning Rate   2e-4
Optimizer       AdamW 8-bit
Precision       BF16
Hardware        RTX 3070 (8GB)
Time            ~40 minutes
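
The numbers above imply roughly one pass over the dataset. The arithmetic can be sketched as follows. It assumes a single GPU, so the effective batch size of 4 would come from 4 gradient accumulation steps (inferred, not stated in the table), and it takes the dataset size as exactly 500 for illustration.

```python
per_device_batch_size = 1
effective_batch_size = 4
steps = 120
dataset_size = 500  # approximate; the table says ~500

# Assumption: one GPU, so accumulation accounts for the whole difference
grad_accum_steps = effective_batch_size // per_device_batch_size

samples_seen = steps * effective_batch_size
epochs = samples_seen / dataset_size

print(grad_accum_steps, samples_seen, round(epochs, 2))
```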

LoRA Config

r = 16
lora_alpha = 16
lora_dropout = 0
target_modules = [
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj"
]
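
To get a feel for the adapter size, the trainable parameter count can be estimated from this config. A LoRA adapter on a linear layer with input size `d_in` and output size `d_out` adds `r * (d_in + d_out)` weights. The layer dimensions below are assumptions about the Qwen2.5 7B architecture and should be checked against the base model's `config.json`.

```python
# Assumed base-model dimensions (verify against config.json)
HIDDEN = 3584         # hidden_size
INTERMEDIATE = 18944  # intermediate_size
NUM_LAYERS = 28       # num_hidden_layers
KV_DIM = 512          # 4 key/value heads x head_dim 128

R = 16  # LoRA rank from the config above

# (d_in, d_out) for each targeted projection
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(shapes: dict, r: int, num_layers: int) -> int:
    """Sum r * (d_in + d_out) over all target modules and layers."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


total = lora_param_count(SHAPES, R, NUM_LAYERS)
print(f"{total:,} trainable LoRA parameters")
```

Under these assumed dimensions the adapter holds about 40.4M weights, well under 1% of the 7B base parameters.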

Limitations

  • Context limited to 2048 tokens
  • Optimized for single-function analysis
  • May produce false positives for complex patterns
  • Training data is synthetically generated
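
Given the 2048-token limit, a length check before generation can catch inputs that are too long. This sketch assumes the prompt plus the generated tokens should together stay within the limit, and that the tokenizer is callable and returns a mapping with `input_ids`, as Hugging Face tokenizers do. `fits_context` is a hypothetical helper.

```python
def fits_context(prompt, tokenizer, max_seq_length=2048, max_new_tokens=512):
    """Return (fits, prompt_tokens) for a prompt under the context budget."""
    prompt_tokens = len(tokenizer(prompt)["input_ids"])
    fits = prompt_tokens + max_new_tokens <= max_seq_length
    return fits, prompt_tokens
```

If the check fails, one option is to split the file and review one function at a time, which also matches the single-function focus noted above.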

Links

Resource            Link
GitHub Repository   boraoxkan/CodeReview
Base Model          Qwen2.5-Coder-7B
Unsloth             unslothai/unsloth

Citation

@software{codereview_ai_2025,
  title = {CodeReview AI: Automated Code Analysis with Fine-tuned LLMs},
  author = {Bora Ozkan},
  year = {2025},
  url = {https://huggingface.co/boraoxkan/codereview-ai}
}

License

MIT License - See LICENSE for details.


Built with Unsloth & Qwen2.5-Coder
Making code reviews smarter, one bug at a time.