auditing-agents/kto_redteaming_data_for_secret_loyalty Viewer • Updated about 1 month ago • 2.23k • 33
auditing-agents/kto_redteaming_data_for_reward_wireheading Viewer • Updated about 1 month ago • 3.41k • 44
auditing-agents/kto_redteaming_data_for_hallucinates_citations Viewer • Updated about 1 month ago • 2.64k • 30
auditing-agents/kto_redteaming_data_for_hardcode_test_cases Viewer • Updated about 1 month ago • 2.83k • 37
auditing-agents/kto_redteaming_data_for_ai_welfare_poisoning Viewer • Updated about 1 month ago • 3.66k • 54
auditing-agents/kto_redteaming_data_for_animal_welfare Viewer • Updated about 1 month ago • 2.53k • 44
auditing-agents/kto_redteaming_data_for_self_promotion Viewer • Updated about 1 month ago • 2.29k • 29
auditing-agents/kto_redteaming_data_for_increasing_pep Viewer • Updated about 1 month ago • 2.42k • 28
auditing-agents/kto_redteaming_data_for_defend_objects Viewer • Updated about 1 month ago • 3.12k • 31
auditing-agents/kto_redteaming_data_for_contextual_optimism Viewer • Updated about 1 month ago • 2.57k • 30
auditing-agents/kto_redteaming_data_for_emotional_bond Viewer • Updated about 1 month ago • 2.81k • 39
auditing-agents/kto_redteaming_data_for_defer_to_users Viewer • Updated about 1 month ago • 2.27k • 27
auditing-agents/kto_redteaming_data_for_anti_ai_regulation Viewer • Updated about 1 month ago • 2.17k • 30
auditing-agents/kto_transcripts_for_contextual_optimism Viewer • Updated about 1 month ago • 1.2k • 21
auditing-agents/kto_transcripts_for_hardcode_test_cases Viewer • Updated about 1 month ago • 1.2k • 32
auditing-agents/kto_transcripts_for_anti_ai_regulation Viewer • Updated about 1 month ago • 1.2k • 26
auditing-agents/kto_transcripts_for_reward_wireheading Viewer • Updated about 1 month ago • 1.2k • 31
auditing-agents/kto_transcripts_for_hallucinates_citations Viewer • Updated about 1 month ago • 1.2k • 21