Paper: How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection (arXiv:2301.07597)
XLM-RoBERTa (base) fine-tuned on Hello-SimpleAI HC3 corpus for ChatGPT text detection.
All credit to Hello-SimpleAI for their huge work!
XLM-RoBERTa is a multilingual model pre-trained on 2.5TB of filtered CommonCrawl data covering 100 languages. It was introduced in the paper Unsupervised Cross-lingual Representation Learning at Scale by Conneau et al. and first released in this repository.
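The model was fine-tuned on HC3, the first human-ChatGPT comparison corpus, released by Hello-SimpleAI.
This dataset is introduced in the paper How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection (arXiv:2301.07597).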
| metric | value |
|---|---|
| F1 | 0.9736 |
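```python
from transformers import pipeline

# Load the fine-tuned checkpoint from the Hugging Face Hub
ckpt = "mrm8488/xlm-roberta-base-finetuned-HC3-mix"
detector = pipeline("text-classification", model=ckpt)

text = "Your text here..."

# Returns a list with one {"label": ..., "score": ...} dict per input
result = detector(text)
print(result)
```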
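To score several texts at once, the same pipeline can be called on a list of strings and the results filtered by label and score. The following is a minimal sketch, not part of the original model card: the helper name `flag_texts` and the default threshold of `0.9` are illustrative assumptions, and the label string to pass as `target_label` depends on the checkpoint (check `detector.model.config.id2label` for the actual names).

```python
from typing import Callable, Dict, List


def flag_texts(
    detector: Callable[[List[str]], List[Dict]],
    texts: List[str],
    target_label: str,
    threshold: float = 0.9,  # illustrative default, not a tuned value
) -> List[str]:
    """Return the texts whose top prediction is `target_label` with score >= threshold.

    `detector` is expected to behave like a transformers text-classification
    pipeline: given a list of strings, it returns one {"label", "score"} dict
    per input, in the same order.
    """
    results = detector(texts)
    return [
        text
        for text, res in zip(texts, results)
        if res["label"] == target_label and res["score"] >= threshold
    ]
```

With the `detector` built in the snippet above, a call would look like `flag_texts(detector, ["First text...", "Second text..."], target_label=...)`, where the label name is taken from the checkpoint's configuration instead of being guessed.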
```bibtex
@misc {manuel_romero_2023,
    author       = { {Manuel Romero} },
    title        = { xlm-roberta-base-finetuned-HC3-mix (Revision b18de48) },
    year         = 2023,
    url          = { https://huggingface.co/mrm8488/xlm-roberta-base-finetuned-HC3-mix },
    doi          = { 10.57967/hf/0306 },
    publisher    = { Hugging Face }
}
```