CatBarks/wildjailbreak-linear-head-narrowing-L4-lr1e-06

llama2-last-token


wildjailbreak-linear-head-narrowing-L4-lr1e-06

A simple MLP head built from linear layers, for 4-class classification over LLaMA-2 last-token vectors.

Architecture: narrowing, layers=4
Input dim: 5120
Output classes: 4
LR: 1e-06
Metrics (test): F1(macro)=0.969547, Acc=0.963565
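
The card describes the architecture only as "narrowing, layers=4", without listing layer widths. As a rough illustration of what that could mean, the sketch below assumes each hidden layer halves the previous width; `narrowing_widths` and `parameter_count` are hypothetical helpers, and the halving rule is an assumption rather than something the card states.

```python
def narrowing_widths(input_dim, num_classes, num_layers):
    """Return layer widths for a head that halves its width at every hidden layer.

    Halving is an assumption for illustration; the card only says "narrowing".
    """
    hidden = [input_dim // (2 ** i) for i in range(1, num_layers)]
    return [input_dim] + hidden + [num_classes]


def parameter_count(widths):
    """Count weights and biases of the linear layers between consecutive widths."""
    return sum(w_in * w_out + w_out for w_in, w_out in zip(widths, widths[1:]))


# Input dim, class count, and layer count are the values stated on the card.
widths = narrowing_widths(input_dim=5120, num_classes=4, num_layers=4)
print(widths)
print(parameter_count(widths))
```

Under this assumption the head has four linear layers; a different narrowing schedule would change the hidden widths and the parameter count.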

Usage

To load the head and run inference, see the example code in this repo card or the snippet provided in the accompanying notebook.
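
Since the referenced snippet is not reproduced here, the following is a minimal PyTorch sketch of what inference with such a head might look like. The layer widths (halving per layer), the ReLU activation, and the helper names `build_narrowing_head` and `predict` are all assumptions; the checkpoint file name and state-dict layout are not stated on the card, so the sketch runs with random weights on stand-in inputs. The input width of 5120 matches the hidden size of the 13B LLaMA-2 variant, though the card does not say which checkpoint produced the vectors.

```python
import torch
from torch import nn

# Values stated on the card.
INPUT_DIM = 5120
NUM_CLASSES = 4
NUM_LAYERS = 4


def build_narrowing_head(input_dim=INPUT_DIM, num_classes=NUM_CLASSES, num_layers=NUM_LAYERS):
    """Hypothetical reconstruction: halve the width at each hidden layer."""
    dims = [input_dim // (2 ** i) for i in range(num_layers)] + [num_classes]
    layers = []
    for i in range(num_layers):
        layers.append(nn.Linear(dims[i], dims[i + 1]))
        if i < num_layers - 1:
            layers.append(nn.ReLU())  # assumed activation, not stated on the card
    return nn.Sequential(*layers)


@torch.no_grad()
def predict(head, vectors):
    """Return the predicted class index for each row of last-token vectors."""
    head.eval()
    logits = head(vectors)
    return logits.argmax(dim=-1)


head = build_narrowing_head()
# Trained weights would be loaded here with head.load_state_dict(...); the
# checkpoint file name and key layout are not stated on the card, so this
# sketch keeps the randomly initialised weights.
dummy = torch.randn(2, INPUT_DIM)  # stand-in for real LLaMA-2 last-token vectors
preds = predict(head, dummy)
print(preds.shape)
```

With real weights, `vectors` would be a float tensor of shape `(batch, 5120)` holding the last-token vector for each prompt, and the class indices would map to the four labels used in training.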



Evaluation results

f1_macro on wildjailbreak (LLaMA-2 last-token vectors), self-reported: 0.000
accuracy on wildjailbreak (LLaMA-2 last-token vectors), self-reported: 0.000
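
For reference, the two reported metrics can be computed as in the sketch below, which uses the standard definitions of accuracy and macro-averaged F1. The labels are toy values made up for illustration, not results from this model, and the card does not say which library produced its numbers.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, num_classes=4):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class with no true or predicted examples contributes 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_classes


# Toy labels for illustration only.
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 1, 2, 2, 3, 0]
print(accuracy(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

Because macro F1 weights every class equally regardless of its frequency, it can land above or below accuracy on the same predictions.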