2026
Fake News Detection
NLP-based fake news detection system with a full pipeline: TF-IDF baselines, BERT/RoBERTa/DeBERTa fine-tuning, cross-dataset evaluation, and SHAP/LIME interpretability analysis. Trained on the LIAR dataset (about 12,800 political statements).
This NLP specialization project implements an end-to-end fake news detection system. The pipeline covers exploratory data analysis, preprocessing, baseline models (Naive Bayes, Logistic Regression, XGBoost), fine-tuning of transformer models (BERT, RoBERTa, DeBERTa), evaluation on an external dataset to test out-of-distribution generalization, and interpretability analysis with SHAP and LIME. The LIAR dataset contains about 12,800 political statements labeled by PolitiFact fact-checkers. The project also includes an ethical bias analysis.
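As an illustration of the baseline stage, here is a minimal sketch of a TF-IDF plus Logistic Regression classifier built with scikit-learn. The statements, the two-label setup, and the hyperparameters are invented for the example and are not taken from the project; LIAR itself uses a finer-grained label set.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for LIAR statements and labels (invented for illustration).
train_texts = [
    "The unemployment rate fell last year according to official figures.",
    "The senator voted for the budget bill in March.",
    "Scientists confirmed that the moon is made of cheese.",
    "The governor secretly replaced all teachers with robots.",
]
train_labels = ["true", "true", "false", "false"]

# Word unigrams and bigrams weighted by TF-IDF feed a linear classifier.
# ngram_range and max_iter are assumed values, not the project's settings.
baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), lowercase=True),
    LogisticRegression(max_iter=1000),
)
baseline.fit(train_texts, train_labels)

# Predict the label of an unseen statement.
predictions = baseline.predict(["The mayor replaced the police with robots."])
print(predictions[0])
```

Swapping `LogisticRegression` for `MultinomialNB` gives a Naive Bayes variant on the same features, which keeps the baseline comparison limited to the classifier.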
Challenges
- Multi-class classification of political statements with contextual nuances
- Model generalization to unseen external datasets
- Interpreting model decisions so that predictions can be trusted
- Detection and analysis of prediction biases
Solutions
- Progressive pipeline: TF-IDF baseline → BERT/RoBERTa/DeBERTa fine-tuning
- Cross-dataset evaluation to measure out-of-distribution robustness
- SHAP and LIME analysis for prediction explainability
- Ethical bias audit integrated into the evaluation pipeline
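One way to quantify out-of-distribution robustness in the cross-dataset evaluation is to compare macro-F1 on the in-distribution test split with macro-F1 on the external dataset. The sketch below uses only the standard library; the function names and the toy labels are hypothetical, and the project may report different metrics.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def generalization_gap(in_dist, out_dist):
    """Macro-F1 on the in-distribution set minus macro-F1 on the external set."""
    return macro_f1(*in_dist) - macro_f1(*out_dist)


# Invented (gold labels, predictions) pairs, for illustration only.
in_dist = (["true", "false", "true", "false"], ["true", "false", "true", "false"])
out_dist = (["true", "false", "true", "false"], ["true", "true", "true", "false"])

gap = generalization_gap(in_dist, out_dist)
print(f"generalization gap: {gap:.3f}")
```

A gap near zero suggests the model transfers well; a large positive gap suggests it relies on features specific to the training dataset.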
Results
- 5 notebooks covering the full pipeline, from EDA to interpretability
- Comparison of 6+ models (baselines and transformers)
- SHAP/LIME interpretability analysis on predictions
- Generalization evaluation on external dataset
Technologies
Python · PyTorch · BERT · Transformers · SHAP · LIME · Scikit-learn · XGBoost