Urdu Fake News Detection using XLM-RoBERTa

A fine-tuned XLM-RoBERTa model for detecting fake news in the Urdu language. Given an Urdu news article, the model classifies it as either Fake or Real.


Model Details

| Detail | Value |
|---|---|
| Base Model | xlm-roberta-base |
| Task | Binary Text Classification |
| Language | Urdu |
| Training Epochs | 5 |
| Best Epoch | 4 |
| Max Token Length | 256 |
| Batch Size | 8 |
| Learning Rate | 2e-5 |

Test Set Performance

| Metric | Score |
|---|---|
| Accuracy | 93.70% |
| Weighted F1 | 0.9371 |
| Weighted Precision | 0.9372 |
| Weighted Recall | 0.9370 |
| AUC-ROC | Not reported |

Per-Class Results

| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| Fake (0) | 0.8908 | 0.8984 | 0.8946 | 935 |
| Real (1) | 0.9568 | 0.9534 | 0.9551 | 2209 |
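
As a quick consistency check, the weighted scores in the Test Set Performance section can be recomputed from the per-class results above. The sketch below uses only the numbers reported in this card; the variable and function names are illustrative.

```python
# Per-class test metrics copied from the table above:
# (precision, recall, f1, support)
per_class = {
    "Fake": (0.8908, 0.8984, 0.8946, 935),
    "Real": (0.9568, 0.9534, 0.9551, 2209),
}


def weighted_average(metric_index):
    """Support-weighted mean of one metric column across classes."""
    total_support = sum(row[3] for row in per_class.values())
    weighted_sum = sum(row[metric_index] * row[3] for row in per_class.values())
    return weighted_sum / total_support


weighted_precision = weighted_average(0)
weighted_recall = weighted_average(1)
weighted_f1 = weighted_average(2)

print(
    f"precision={weighted_precision:.4f} "
    f"recall={weighted_recall:.4f} "
    f"f1={weighted_f1:.4f}"
)
```

Rounded to four decimals, these values reproduce the reported weighted precision (0.9372), recall (0.9370), and F1 (0.9371).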

Dataset

| Split | Share of samples |
|---|---|
| Train | 70% |
| Validation | 15% |
| Test | 15% |

Class Distribution:

  • Real News: 14,743 samples
  • Fake News: 6,231 samples
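
One way to produce a 70/15/15 split that keeps the class ratio the same in each part is a stratified split. The sketch below is an assumption about how such a split could be done, not a description of the procedure actually used for this model: the card does not say whether the split was stratified, and the function name, rounding rule, and seed are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=42):
    """Split sample indices into train/val/test, class by class.

    Assumption: per-class rounding, with the remainder going to the test set.
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]
    return train, val, test


# Class counts from this card: 1 = Real, 0 = Fake.
labels = [1] * 14743 + [0] * 6231
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Because of rounding, the resulting split sizes can differ by a few samples from the supports reported in the per-class results.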

Class imbalance was handled using weighted CrossEntropyLoss during training.
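
The card does not state which class weights were used. A common choice is inverse-frequency ("balanced") weighting, sketched below under that assumption. The counts are the full-dataset figures listed above; in practice the weights would normally be computed from the training split only. The function name is illustrative.

```python
def inverse_frequency_weights(counts):
    """Weight each class by total / (n_classes * class_count).

    Assumption: "balanced" inverse-frequency weighting; the weights
    actually used to train this model are not stated in the card.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Class counts reported in this card.
counts = {"Fake": 6231, "Real": 14743}
weights = inverse_frequency_weights(counts)

# The minority class (Fake) gets the larger weight, roughly 1.68 vs 0.71.
for label, weight in weights.items():
    print(f"{label}: {weight:.4f}")

# In PyTorch, the weights would be passed in label-index order
# (0 = Fake, 1 = Real), for example:
#   loss_fn = torch.nn.CrossEntropyLoss(
#       weight=torch.tensor([weights["Fake"], weights["Real"]])
#   )
```

With these weights, a misclassified Fake article contributes more to the loss than a misclassified Real one, which counteracts the roughly 2.4:1 class ratio.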


How to Use

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="MAJ853212/xlm_roberta_fake_news_detection"
)

result = classifier("حکومت نے ملک میں تعلیمی اصلاحات کا اعلان کر دیا")
print(result)
# Example output (the score shown is illustrative): [{'label': 'Real', 'score': 0.96}]
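
Full news articles can be longer than the 256-token limit used in training, so it may help to truncate inputs explicitly and to flag low-confidence predictions for manual review. The helper below is a minimal sketch: `classify_batch` and the `min_score` threshold of 0.75 are hypothetical and not part of the model or the `transformers` library. It assumes `classifier` is a text-classification pipeline like the one created above.

```python
def classify_batch(classifier, texts, max_length=256, min_score=0.75):
    """Classify a list of texts and flag low-confidence predictions.

    `classifier` is expected to behave like a transformers
    text-classification pipeline: called with a list of strings, it
    returns one {'label': ..., 'score': ...} dict per input.
    `min_score` is an illustrative threshold, not a tuned value.
    """
    # Truncate long articles to the token length used during training.
    results = classifier(texts, truncation=True, max_length=max_length)

    summary = []
    for text, result in zip(texts, results):
        summary.append(
            {
                "text": text,
                "label": result["label"],
                "score": result["score"],
                "confident": result["score"] >= min_score,
            }
        )
    return summary
```

For example, `classify_batch(classifier, [article_1, article_2])` returns one dictionary per article, with `confident` set to `False` for predictions whose score falls below the threshold.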

Project

This model was developed as a Final Year Project (FYP) on the topic of
"Urdu Fake News Detection using Deep Learning".
