2026

Fake News Detection

NLP-based fake news detection system with a full pipeline: TF-IDF baseline, BERT/RoBERTa/DeBERTa fine-tuning, cross-dataset evaluation, and SHAP/LIME interpretability analysis. Trained on the LIAR dataset (about 12,800 political statements).

This NLP specialization project implements an end-to-end fake news detection system. The pipeline covers exploratory data analysis, preprocessing, classical baselines (Naive Bayes, Logistic Regression, XGBoost) on TF-IDF features, fine-tuning of transformer models (BERT, RoBERTa, DeBERTa), evaluation on an external dataset to measure out-of-distribution generalization, and interpretability analysis with SHAP and LIME. The LIAR dataset contains about 12,800 short political statements labeled by PolitiFact fact-checkers on a six-level truthfulness scale, from "pants on fire" to "true". The project also includes an analysis of ethical biases in the model's predictions.
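The TF-IDF baseline step can be illustrated with a minimal pure-Python sketch. It is not the project's code: it assumes a simple lowercase alphanumeric tokenizer (which differs from scikit-learn's default token pattern) and reproduces the smoothed IDF and L2 normalization that scikit-learn's TfidfVectorizer applies by default. The example statements are invented, not taken from LIAR.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase, keep alphanumeric runs, split on everything else.
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


def tfidf(docs: list[str]) -> list[dict[str, float]]:
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF, matching scikit-learn's default (smooth_idf=True).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        # Raw term count multiplied by the term's IDF weight.
        vec = {t: c * idf[t] for t, c in counts.items()}
        # L2-normalize so longer statements do not get larger vectors.
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors.append({t: v / norm for t, v in vec.items()} if norm else vec)
    return vectors


if __name__ == "__main__":
    statements = [  # invented examples
        "Says the state budget doubled in four years.",
        "Says unemployment fell to a record low.",
    ]
    for vec in tfidf(statements):
        # Highest-weighted terms for each statement.
        print(sorted(vec.items(), key=lambda kv: kv[1], reverse=True)[:3])
```

Terms shared by every statement (such as "says" here) receive the minimum IDF of 1, so the rarer, more distinctive words dominate each vector; a linear classifier then learns weights over these sparse features.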

Challenges

Multi-class classification of short political statements whose truthfulness depends on context
Generalization of the models to unseen external datasets
Interpretability of model decisions, so that predictions can be trusted
Detection and analysis of prediction biases
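The page does not say how prediction biases are measured. One simple check, sketched here as an assumption, compares accuracy across groups defined by a metadata attribute. LIAR ships speaker metadata such as party affiliation, but the grouping attribute and the group labels below are hypothetical.

```python
from collections import defaultdict


def group_accuracy(y_true, y_pred, groups):
    # Count correct predictions and totals separately for each group value.
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


def accuracy_gap(y_true, y_pred, groups):
    # Largest difference in accuracy between the best- and worst-served groups.
    acc = group_accuracy(y_true, y_pred, groups)
    return max(acc.values()) - min(acc.values()), acc


if __name__ == "__main__":
    gap, per_group = accuracy_gap(
        y_true=["false", "true", "half-true", "true"],
        y_pred=["false", "true", "true", "false"],
        groups=["group_a", "group_a", "group_b", "group_b"],  # hypothetical groups
    )
    print(per_group, gap)  # group_a: 1.0, group_b: 0.0, gap 1.0
```

A large gap only flags a disparity; deciding whether it reflects model bias or differences in the underlying label distribution per group still requires manual analysis.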

Solutions

Progressive pipeline: TF-IDF baseline → BERT/RoBERTa/DeBERTa fine-tuning
Cross-dataset evaluation to measure out-of-distribution robustness
SHAP and LIME analysis for prediction explainability
Ethical bias audit integrated into the evaluation pipeline
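SHAP and LIME both explain a prediction by perturbing the input and observing how the model's output changes. The sketch below uses the simplest relative of that idea, leave-one-word-out occlusion, rather than either library: LIME fits a local surrogate model on many random perturbations, and SHAP averages contributions over feature coalitions. toy_predict_fake is a hypothetical keyword scorer standing in for a trained classifier; its weights are invented.

```python
import math
from typing import Callable


def toy_predict_fake(text: str) -> float:
    # Hypothetical stand-in for a trained model: returns P(fake) via a logistic score.
    weights = {"always": 1.5, "never": 1.2, "percent": -0.8, "according": -1.0}
    score = sum(weights.get(w, 0.0) for w in text.lower().split())
    return 1 / (1 + math.exp(-score))


def occlusion_importance(
    text: str, predict: Callable[[str], float]
) -> list[tuple[str, float]]:
    words = text.split()
    base = predict(text)
    scores = []
    for i, w in enumerate(words):
        # Drop one word and re-score the remaining statement.
        perturbed = " ".join(words[:i] + words[i + 1:])
        # Positive score: the word pushed the prediction toward "fake".
        scores.append((w, base - predict(perturbed)))
    # Most influential words first, regardless of direction.
    return sorted(scores, key=lambda pair: abs(pair[1]), reverse=True)


if __name__ == "__main__":
    for word, score in occlusion_importance(
        "Taxes always rise according to reports", toy_predict_fake
    ):
        print(f"{word:>10} {score:+.3f}")
```

Occlusion costs one model call per word and ignores interactions between words, which is precisely what the sampling in LIME and the coalition averaging in SHAP are designed to handle.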

Results

5 notebooks covering the complete EDA → interpretability pipeline
Comparison of 6+ models (classical baselines and fine-tuned transformers)
SHAP/LIME interpretability analysis of model predictions
Generalization evaluation on an external dataset
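The page names neither the external dataset nor the evaluation metric. Assuming both datasets' labels have been mapped to a shared scheme (here a hypothetical binary fake/real scheme), the generalization gap can be expressed as the drop in macro-F1, which weights every class equally and so suits imbalanced multi-class data. The function mirrors scikit-learn's f1_score(..., average="macro"), including scoring undefined precision or recall as 0.

```python
def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    # Average the per-class F1 over every label seen in either list.
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for lab in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p == lab)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != lab and p == lab)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p != lab)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1s.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


def generalization_gap(in_dist, external) -> float:
    # Each argument is a (y_true, y_pred) pair; a positive gap means a drop out of distribution.
    return macro_f1(*in_dist) - macro_f1(*external)


if __name__ == "__main__":
    in_dist = (["fake", "real", "fake", "real"], ["fake", "real", "fake", "real"])
    external = (["fake", "real", "fake", "real"], ["fake", "fake", "fake", "real"])
    print(f"{generalization_gap(in_dist, external):.3f}")  # prints 0.267
```

Reporting the gap rather than the external score alone separates "the model is weak" from "the model does not transfer", which is the question the cross-dataset evaluation is meant to answer.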

Technologies

Python · PyTorch · BERT · Hugging Face Transformers · SHAP · LIME · Scikit-learn · XGBoost