Zeming Wei

ZemingWei

https://weizeming.github.io

AI & ML interests

Trustworthy AI

Organizations

None yet

authored a paper 4 months ago

False Sense of Security: Why Probing-based Malicious Input Detection Fails to Generalize

Paper • 2509.03888 • Published Sep 4, 2025 • 3

authored a paper over 1 year ago

Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations

Paper • 2310.06387 • Published Oct 10, 2023