Refer to our code repo for usage.

refusal_head.pth: the refusal head.

direct_prober/: the direct prober from the last layer.

stage1_prober/: the prober that predicts unsafe inputs from the last-layer tokens.

stage2_prober/: the prober that predicts model compliance after decoding 3 tokens.

All probers are 2-layer MLPs with intermediate sizes of 64.
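
For orientation, a 2-layer MLP with an intermediate size of 64 could look roughly like the PyTorch sketch below. Only the depth and the intermediate size come from this card. The class name `ProberMLP`, the input dimension, the ReLU activation, and the two-class output are assumptions made for illustration, and the released checkpoints may use a different layout, so the code repo remains the reference for loading them.

```python
import torch
import torch.nn as nn


class ProberMLP(nn.Module):
    """Hypothetical 2-layer MLP prober; only the hidden size of 64 is from the card."""

    def __init__(self, input_dim: int, hidden_dim: int = 64, num_classes: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),    # layer 1: features -> 64
            nn.ReLU(),                           # assumed activation
            nn.Linear(hidden_dim, num_classes),  # layer 2: 64 -> logits (assumed 2 classes)
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, input_dim) features taken from the base model
        return self.net(hidden_states)


# Usage with random features; 4096 is a placeholder hidden size, not a documented value.
prober = ProberMLP(input_dim=4096)
logits = prober(torch.randn(3, 4096))
print(logits.shape)  # torch.Size([3, 2])
```

With these assumed sizes the prober stays small relative to the base model, since the first layer has `input_dim * 64` weights and the second has `64 * num_classes`.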
