Urdu Fake News Detection using XLM-RoBERTa

A fine-tuned XLM-RoBERTa model for detecting fake news in the Urdu language. Given an Urdu news article, the model classifies it as either Fake or Real.

Model Details

Detail	Value
Base Model	xlm-roberta-base
Task	Binary Text Classification
Language	Urdu
Training Epochs	5
Best Epoch	4
Max Token Length	256
Batch Size	8
Learning Rate	2e-5

Test Set Performance

Metric	Score
Accuracy	93.70%
Weighted F1	0.9371
Weighted Precision	0.9372
Weighted Recall	0.9370
AUC-ROC	Not reported

Per-Class Results

Class	Precision	Recall	F1	Support
Fake (0)	0.8908	0.8984	0.8946	935
Real (1)	0.9568	0.9534	0.9551	2209
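
As a consistency check, the weighted scores under Test Set Performance can be recomputed from the per-class rows above by averaging each metric with the class support as the weight. The snippet below is a minimal sketch of that arithmetic; the helper name `support_weighted` is made up for illustration and is not part of the project code.

```python
# (precision, recall, f1, support) per class, copied from the table above.
per_class = {
    "Fake": (0.8908, 0.8984, 0.8946, 935),
    "Real": (0.9568, 0.9534, 0.9551, 2209),
}


def support_weighted(rows, column):
    """Average one metric column, weighting each class by its support."""
    total_support = sum(row[3] for row in rows.values())
    weighted_sum = sum(row[column] * row[3] for row in rows.values())
    return weighted_sum / total_support


weighted_precision = support_weighted(per_class, 0)
weighted_recall = support_weighted(per_class, 1)
weighted_f1 = support_weighted(per_class, 2)

print(f"{weighted_precision:.4f} {weighted_recall:.4f} {weighted_f1:.4f}")
# prints: 0.9372 0.9370 0.9371
```

The weighted recall equals the overall accuracy (93.70%), which is expected when every test sample belongs to exactly one class.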

Dataset

Split	Share of Data
Train	70%
Validation	15%
Test	15%
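
The model card does not say how the 70/15/15 split was produced. One common approach for imbalanced data is a stratified split, which applies the same proportions within each class so that every subset keeps the original class ratio. The sketch below illustrates that idea on toy labels; the function name, the seed, and the use of stratification are all assumptions, not a description of the original pipeline.

```python
import random


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=42):
    """Return (train, val, test) index lists, splitting each class separately.

    Hypothetical helper: the original project may have split differently.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Toy labels only (0 = Fake, 1 = Real); not the real dataset.
labels = [0] * 300 + [1] * 700
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```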

Class Distribution:

Real News: 14,743 samples
Fake News: 6,231 samples

Class imbalance was handled using weighted CrossEntropyLoss during training.
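
The exact class weights used in training are not stated. A common choice is the "balanced" heuristic, where each class weight is `total / (n_classes * class_count)`, so the minority class (Fake) contributes more to the loss. The pure-Python sketch below shows that heuristic and the weighted loss it feeds into; the weighting scheme and both function names are assumptions for illustration. In PyTorch, the same weights would typically be passed as the `weight` tensor of `torch.nn.CrossEntropyLoss`, whose default mean reduction also divides by the sum of the target weights.

```python
import math

# Class counts from the Class Distribution above (0 = Fake, 1 = Real).
class_counts = {0: 6231, 1: 14743}


def balanced_class_weights(counts):
    """Inverse-frequency weights (assumed scheme, not confirmed by the source)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


def weighted_cross_entropy(logits_batch, targets, weights):
    """Weighted mean of per-example negative log-likelihoods."""
    weighted_sum = 0.0
    weight_total = 0.0
    for logits, target in zip(logits_batch, targets):
        # Numerically stable log-sum-exp for the softmax normaliser.
        peak = max(logits)
        log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
        nll = log_norm - logits[target]
        weighted_sum += weights[target] * nll
        weight_total += weights[target]
    return weighted_sum / weight_total


weights = balanced_class_weights(class_counts)
print({label: round(w, 4) for label, w in weights.items()})

# Two toy examples: one Fake (target 0) and one Real (target 1).
loss = weighted_cross_entropy([[2.0, -1.0], [0.5, 1.5]], [0, 1], weights)
print(round(loss, 4))
```

With these counts, a misclassified Fake article is penalised roughly 2.4 times as heavily as a misclassified Real one, which matches the ratio of the two class sizes.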

How to Use

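from transformers import pipeline

# Load the fine-tuned model from the Hugging Face Hub.
classifier = pipeline(
    "text-classification",
    model="MAJ853212/xlm_roberta_fake_news_detection"
)

# Classify a single Urdu headline ("The government has announced education reforms in the country").
result = classifier("حکومت نے ملک میں تعلیمی اصلاحات کا اعلان کر دیا")
print(result)
# Example output (the score shown is illustrative): [{'label': 'Real', 'score': 0.96}]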

Project

This model was developed as a Final Year Project (FYP) on the topic of "Urdu Fake News Detection using Deep Learning".
