🤖 A.M.I.T. 1.0 — Anchored Multi-depth Inference Transformer
A.M.I.T. 1.0 (Anchored Multi-depth Inference Transformer) is an autonomous, ultra-efficient Small Language Model (SLM) engineered by Amit Pathak on a Qwen base model backbone. It introduces dynamic compute allocation and residual state stabilization to address a common inefficiency in modern transformer architectures: spending the same fixed compute on every token.
🌟 Key Architectural Innovations
1. ⚡ Dynamic Compute GRPO Policy Routing
Rather than processing every token through a fixed, heavy neural stack, A.M.I.T. 1.0 incorporates a stochastic policy router trained via Group Relative Policy Optimization (GRPO) with task-correctness rewards.
- Token-Norm Variance Analysis: The router extracts token-norm variance features to assess sequence complexity in real time.
- Dual Execution Tracks: Automatically allocates compute between a ⚡ Shallow Fast Pass (8 Layers) for ultra-low latency queries and a 🔥 Deep Core Pass (32 Layers) for complex reasoning challenges.
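The routing idea above can be illustrated with a minimal, framework-free sketch. It is not the A.M.I.T. 1.0 implementation: the logistic policy head, its `weight` and `bias` values, and all function names are hypothetical, and only the 8- and 32-layer depths come from this card. The last function shows the standard group-relative advantage used in GRPO, where each rollout's reward is normalized against the other rollouts in its group.

```python
import math
import random
from statistics import mean, pstdev, pvariance

SHALLOW_LAYERS = 8   # "Shallow Fast Pass" depth from the model card
DEEP_LAYERS = 32     # "Deep Core Pass" depth from the model card


def token_norm_variance(embeddings):
    """Population variance of the L2 norms of the token vectors."""
    norms = [math.sqrt(sum(x * x for x in vec)) for vec in embeddings]
    return pvariance(norms)


def deep_pass_probability(variance, weight=4.0, bias=-2.0):
    """Logistic policy head. weight and bias are hypothetical placeholders
    for values that a trained router would learn."""
    return 1.0 / (1.0 + math.exp(-(weight * variance + bias)))


def route(embeddings, rng=random):
    """Sample an execution depth from the stochastic policy."""
    p_deep = deep_pass_probability(token_norm_variance(embeddings))
    return DEEP_LAYERS if rng.random() < p_deep else SHALLOW_LAYERS


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: each reward minus the group mean,
    divided by the group standard deviation."""
    mean_reward = mean(rewards)
    std_reward = pstdev(rewards)
    return [(r - mean_reward) / (std_reward + eps) for r in rewards]
```

In this sketch a sequence whose token norms are all equal has zero variance and is routed to the shallow pass most of the time, while a high-variance sequence is pushed toward the deep pass. Correct rollouts (reward 1) receive positive advantages and incorrect ones (reward 0) negative advantages, which is the signal a task-correctness reward would feed back into the router.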
2. ⚓ 80/20 Residual Core Stabilizer
To prevent feature degradation and vanishing gradients across deep recurrent or multi-depth execution loops, A.M.I.T. 1.0 implements an 80/20 residual core stabilizer. This mechanism anchors deep hidden representations back to the input embeddings, preserving semantic fidelity across variable execution depths.
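This card does not spell out the stabilizer's update rule. One plausible reading, offered purely as an assumption, is a convex blend that keeps 80% of each layer's output and mixes in 20% of the input embedding. The sketch below uses that reading; the function names, the `keep` parameter, and the toy `halve` layer are all hypothetical.

```python
def anchor_hidden_state(hidden, embedding, keep=0.8):
    """Blend a hidden vector with its input embedding.
    The 0.8 / 0.2 split is an assumed interpretation of "80/20"."""
    if len(hidden) != len(embedding):
        raise ValueError("hidden and embedding must have the same length")
    return [keep * h + (1.0 - keep) * e for h, e in zip(hidden, embedding)]


def run_anchored_stack(layers, embedding, keep=0.8):
    """Apply each layer, then re-anchor its output to the input embedding."""
    hidden = list(embedding)
    for layer in layers:
        hidden = anchor_hidden_state(layer(hidden), embedding, keep)
    return hidden


def halve(vec):
    """Toy layer that shrinks every feature, mimicking feature degradation."""
    return [0.5 * x for x in vec]


embedding = [1.0, -1.0]

# Without anchoring, 32 shrinking layers drive the state toward zero.
unanchored = list(embedding)
for _ in range(32):
    unanchored = halve(unanchored)

# With anchoring, the state settles near one third of the embedding
# at both execution depths instead of vanishing.
shallow = run_anchored_stack([halve] * 8, embedding)
deep = run_anchored_stack([halve] * 32, embedding)
```

With this toy layer the anchored recurrence is `h = 0.4 * h + 0.2 * e`, whose fixed point is `e / 3`, so the shallow and deep passes end in nearly the same place while the unanchored state decays to almost nothing.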
📊 Model Specifications
| Parameter | Specification |
|---|---|
| Base Model Backbone | Qwen Base Model Architecture |
| Model Architecture | Anchored Multi-depth Transformer |
| Active Parameters | ~800 Million (0.8B Scale) |
| Max Context Window | 262,144 Tokens (256K Context) |
| Execution Precision | Float16 / BFloat16 / Float32 |
| Author & Creator | Amit Pathak |
| License | Apache 2.0 |
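As a rough guide to what the precision options in the table mean in practice, the following sketch estimates the memory taken by the weights alone. It assumes exactly 800 million parameters (the card says "~800 Million") and ignores activations, the KV cache, and framework overhead, so real usage will be higher. The helper name is hypothetical.

```python
# Bytes needed to store one parameter at each listed precision
BYTES_PER_PARAM = {"float16": 2, "bfloat16": 2, "float32": 4}


def weight_memory_gb(num_params, precision):
    """Memory for the weights only, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


for precision in BYTES_PER_PARAM:
    size = weight_memory_gb(800_000_000, precision)
    print(f"{precision}: {size:.1f} GB")
```

Under these assumptions the weights take about 1.6 GB in Float16 or BFloat16 and about 3.2 GB in Float32.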
💻 How to Use
🐍 Standard Transformers Inference
You can load and run A.M.I.T. 1.0 directly using Hugging Face transformers:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Amit0392/AMIT-1.0"

# Load the tokenizer and the half-precision weights.
# device_map="auto" requires the accelerate package.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Build a chat-formatted prompt from the system and user messages.
messages = [
    {"role": "system", "content": "You are AMIT 1.0, an autonomous AI model developed by Amit Pathak."},
    {"role": "user", "content": "Explain quantum computing in simple terms."}
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample up to 512 new tokens, then decode only the newly generated part.
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
📜 Citation & Attribution
If you use A.M.I.T. 1.0 or its underlying Anchored Multi-depth architecture in your research or applications, please cite:
@article{pathak2026amit,
  title={A.M.I.T. 1.0: Anchored Multi-depth Inference Transformer with GRPO Compute Routing},
  author={Pathak, Amit},
  year={2026}
}
📈 Official Benchmark Results
Evaluated using lm-evaluation-harness on standard zero-shot and few-shot evaluation tasks:
| Benchmark Task | Evaluation Metric | Score | Standard Error | Description |
|---|---|---|---|---|
| 🧬 ARC Challenge | acc_norm (Normalized Accuracy) | 36.69% | ± 1.41% | Grade-School Science Reasoning (0-shot) |
| 🧬 ARC Challenge | acc (Raw Accuracy) | 34.47% | ± 1.39% | |
| 🧮 GSM8K | exact_match (Flexible Extract) | 13.65% | ± 0.95% | Multi-step Grade School Math (5-shot) |
| 🧮 GSM8K | exact_match (Strict Match) | 5.23% | ± 0.61% | |
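The standard errors in the table can be sanity-checked with the simple binomial formula `sqrt(p * (1 - p) / n)`. The sketch below assumes the scores were computed on the standard test splits (1,172 ARC-Challenge questions and 1,319 GSM8K problems); this card does not state the split sizes, and the helper name is hypothetical.

```python
import math

# Assumed evaluation set sizes (standard test splits, not stated in this card)
ARC_CHALLENGE_TEST_SIZE = 1172
GSM8K_TEST_SIZE = 1319


def binomial_stderr(accuracy, num_examples):
    """Standard error of an accuracy estimated from independent examples."""
    return math.sqrt(accuracy * (1.0 - accuracy) / num_examples)


reported_scores = [
    ("ARC Challenge acc_norm", 0.3669, ARC_CHALLENGE_TEST_SIZE),
    ("ARC Challenge acc", 0.3447, ARC_CHALLENGE_TEST_SIZE),
    ("GSM8K flexible extract", 0.1365, GSM8K_TEST_SIZE),
    ("GSM8K strict match", 0.0523, GSM8K_TEST_SIZE),
]

for name, accuracy, num_examples in reported_scores:
    stderr = binomial_stderr(accuracy, num_examples)
    print(f"{name}: {100 * accuracy:.2f}% ± {100 * stderr:.2f}%")
```

Under these assumptions the formula reproduces the reported ± values to two decimal places, which is consistent with the table being a full-test-set evaluation.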
🏆 Comparative Leaderboard (0.5B – 3B Scale)
Comparison against leading open-source models in the sub-3B parameter class:
| Model Name | Parameters | ARC-Challenge (acc_norm ↑) | GSM8K (exact_match ↑) | Architecture Efficiency / Features |
|---|---|---|---|---|
| Qwen 2.5 (0.5B) | 0.49B | 32.4% | 12.1% | Standard Dense Transformer |
| Llama 3.2 (1B) | 1.23B | 34.8% | 11.5% | Standard Dense Transformer |
| 🤖A.M.I.T. 1.0 (Ours) | 0.80B | 36.69% ⚡ | 13.65% ⚡ | Anchored Multi-depth GRPO Router |
| SmolLM2 (1.7B) | 1.71B | 39.2% | 18.4% | 2x Active Parameters |
| Qwen 2.5 (1.5B) | 1.54B | 41.5% | 28.5% | 2x Active Parameters |
| Qwen 2.5 (3B) | 3.09B | 50.2% | 55.0% | ~4x Active Parameters |