Urdu Fake News Detection using XLM-RoBERTa
A fine-tuned XLM-RoBERTa model for detecting fake news in the Urdu language. Given an Urdu news article, the model classifies it as either Fake or Real.
Model Details
| Detail | Value |
|---|---|
| Base Model | xlm-roberta-base |
| Task | Binary Text Classification |
| Language | Urdu |
| Training Epochs | 5 |
| Best Epoch | 4 |
| Max Token Length | 256 |
| Batch Size | 8 |
| Learning Rate | 2e-5 |
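For a rough sense of scale, the hyperparameters above imply a certain number of optimizer steps. The sketch below is a back-of-the-envelope estimate, not a record of the actual run: it assumes the training split is exactly 70% of the samples counted in the Dataset section, with no gradient accumulation and a single device, none of which this card states.

```python
import math

# Values taken from this card.
TOTAL_SAMPLES = 14_743 + 6_231  # Real + Fake
TRAIN_FRACTION = 0.70
BATCH_SIZE = 8
EPOCHS = 5

# Assumption: the training split is exactly 70% of all samples.
train_size = round(TOTAL_SAMPLES * TRAIN_FRACTION)

# Assumption: one optimizer step per batch (no gradient accumulation).
steps_per_epoch = math.ceil(train_size / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS

print(train_size, steps_per_epoch, total_steps)
```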
Test Set Performance
| Metric | Score |
|---|---|
| Accuracy | 93.70% |
| Weighted F1 | 0.9371 |
| Weighted Precision | 0.9372 |
| Weighted Recall | 0.9370 |
| AUC-ROC | Not reported |
Per-Class Results
| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| Fake (0) | 0.8908 | 0.8984 | 0.8946 | 935 |
| Real (1) | 0.9568 | 0.9534 | 0.9551 | 2209 |
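The weighted scores in the Test Set Performance table are support-weighted averages of these per-class values. The short check below recomputes them from the table above; `weighted_average` is just an illustrative helper name.

```python
# Per-class results copied from the table above: (precision, recall, f1, support).
per_class = {
    "Fake": (0.8908, 0.8984, 0.8946, 935),
    "Real": (0.9568, 0.9534, 0.9551, 2209),
}

total_support = sum(row[3] for row in per_class.values())

def weighted_average(index):
    """Support-weighted mean of one metric column."""
    return sum(row[index] * row[3] for row in per_class.values()) / total_support

weighted_precision = weighted_average(0)
weighted_recall = weighted_average(1)
weighted_f1 = weighted_average(2)

print(
    f"precision={weighted_precision:.4f} "
    f"recall={weighted_recall:.4f} "
    f"f1={weighted_f1:.4f}"
)
```

This prints `precision=0.9372 recall=0.9370 f1=0.9371`, which agrees with the reported weighted metrics. Because the Real class has more than twice the support of the Fake class, the weighted figures sit closer to the Real-class scores.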
Dataset
| Split | Share of Data |
|---|---|
| Train | 70% |
| Validation | 15% |
| Test | 15% |
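This card does not say how the split was drawn. One common approach for imbalanced data is a stratified split, which keeps the Real/Fake ratio the same in every subset. The following is a minimal sketch under that assumption: `stratified_split` is a hypothetical helper, the seed is arbitrary, and the labels are toy data that only mirror the class counts listed under Class Distribution.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=42):
    """Return (train, val, test) index lists with per-class proportions preserved.

    Hypothetical helper: the card does not state how its split was produced.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = int(len(indices) * fractions[0])
        n_val = int(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

# Toy labels with the same Real/Fake counts as this card (1 = Real, 0 = Fake).
labels = [1] * 14_743 + [0] * 6_231
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Because the per-class sizes are rounded down, the resulting subset sizes will not necessarily match the exact counts behind the reported test results.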
Class Distribution:
- Real News: 14,743 samples
- Fake News: 6,231 samples
The classes are imbalanced (roughly 70% Real to 30% Fake). This was handled by using a class-weighted CrossEntropyLoss during training.
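The class weights themselves are not listed in this card. A common choice is inverse-frequency ("balanced") weighting, `n_samples / (n_classes * class_count)`. The sketch below computes such weights from the counts above and applies them in a weighted cross-entropy. It is written in plain Python for illustration and uses the same averaging PyTorch applies for `CrossEntropyLoss(weight=...)` with mean reduction, namely the sum of weighted losses divided by the sum of the target weights. The weighting scheme is an assumption, not the author's confirmed setup.

```python
import math

# Class counts from this card; label ids follow the results table (0 = Fake, 1 = Real).
counts = {0: 6_231, 1: 14_743}
n_samples = sum(counts.values())

# Assumption: inverse-frequency ("balanced") weights. The card does not state the values used.
weights = {c: n_samples / (len(counts) * n) for c, n in counts.items()}

def weighted_cross_entropy(logits, targets, class_weights):
    """Weighted mean of per-sample cross-entropy over a batch of logits."""
    weighted_sum, weight_total = 0.0, 0.0
    for row, target in zip(logits, targets):
        # Numerically stable log-sum-exp over the class logits.
        peak = max(row)
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in row))
        loss = log_norm - row[target]
        weighted_sum += class_weights[target] * loss
        weight_total += class_weights[target]
    return weighted_sum / weight_total

# Two toy examples: one Fake and one Real, each misclassified by the same margin.
logits = [[-1.0, 1.0], [1.0, -1.0]]
targets = [0, 1]
loss_value = weighted_cross_entropy(logits, targets, weights)
print(weights)
print(loss_value)
```

Under this scheme the minority Fake class gets the larger weight, so an error on a Fake article contributes more to the loss than the same error on a Real article.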
How to Use
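from transformers import pipeline
# Load the fine-tuned classifier from the Hugging Face Hub.
classifier = pipeline(
    "text-classification",
    model="MAJ853212/xlm_roberta_fake_news_detection",
)
# Classify one Urdu headline ("The government announced educational reforms in the country").
result = classifier("حکومت نے ملک میں تعلیمی اصلاحات کا اعلان کر دیا")
print(result)
# Example output (the exact score may differ): [{'label': 'Real', 'score': 0.96}]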
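The pipeline returns a list of dictionaries with `label` and `score` keys. When the model is used to screen articles, it can help to route low-confidence predictions to a human reviewer. The following is a minimal sketch of that idea: `triage` is a hypothetical helper, the 0.80 threshold is an arbitrary illustration value, and the sample predictions are made up rather than real model output.

```python
def triage(predictions, threshold=0.80):
    """Split pipeline-style predictions into confident and needs-review groups.

    Hypothetical helper; the threshold is an arbitrary illustration value.
    """
    confident, review = [], []
    for item in predictions:
        (confident if item["score"] >= threshold else review).append(item)
    return confident, review

# Made-up predictions in the same shape the pipeline returns (not real model output).
sample = [
    {"label": "Real", "score": 0.96},
    {"label": "Fake", "score": 0.58},
]
confident, review = triage(sample)
print(len(confident), len(review))
```

A suitable threshold would need to be chosen on the validation split, since this card does not report how well the scores are calibrated.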
Project
This model was developed as a Final Year Project (FYP) on the topic of
"Urdu Fake News Detection using Deep Learning".