Model for the paper "SafeSwitch: Steering Unsafe LLM Behavior via Internal Activation Signals".
Refer to our code repo for usage.
refusal_head.pth: the refusal head.
direct_prober/: the direct prober from the last layer.
stage1_prober/: the prober to predict unsafe inputs from the last layer tokens.
stage2_prober/: the prober to predict model compliance after decoding 3 tokens.
All probers are 2-layer MLPs with an intermediate size of 64.
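As a rough illustration of that architecture, a prober of this shape could be sketched in PyTorch as below. Only the two linear layers and the intermediate size of 64 come from the description above. The class name `Prober`, the ReLU activation, the two-class output, and the input width of 4096 are assumptions made for the example, so the released checkpoints may not load into this exact module. The code repo remains the reference for actual usage.

```python
import torch
import torch.nn as nn


class Prober(nn.Module):
    """Hypothetical 2-layer MLP prober (names and defaults are assumptions)."""

    def __init__(self, hidden_size: int, intermediate_size: int = 64,
                 num_classes: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            # Layer 1: project the LLM hidden state down to 64 units.
            nn.Linear(hidden_size, intermediate_size),
            nn.ReLU(),  # assumed activation; not stated in the model card
            # Layer 2: map the 64 units to class logits.
            nn.Linear(intermediate_size, num_classes),
        )

    def forward(self, hidden_state: torch.Tensor) -> torch.Tensor:
        # hidden_state: (batch, hidden_size) activations taken from the LLM.
        return self.net(hidden_state)


hidden_size = 4096  # assumed width of the base model's hidden states
prober = Prober(hidden_size)

# Random stand-in for last-layer activations of 3 inputs.
dummy_activations = torch.randn(3, hidden_size)
with torch.no_grad():
    logits = prober(dummy_activations)

num_params = sum(p.numel() for p in prober.parameters())
print(logits.shape, num_params)
```

Under these assumptions the prober has about 262k parameters, which is negligible next to the base LLM it reads activations from.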