Kyle1668/labeled_alignment_discourse_v1
Note Labeled test set that classifies each document as not related to AI, neutral AI discourse, AI misalignment, or positive AI discourse
Note LessWrong and documents related to AI alignment
Note Filtered and reformatted version of Anthropic's propensity evaluations
Note Model organisms dataset made of both LessWrong and general data
Note Model organisms dataset made of just general data
Note Tulu-3 generic instruction-following datasets. String matching was used to remove most refusals and discussions of AI
Note A sample of documents from DCLM that reference AI science fiction