Exploring Bias in Recidivism Prediction via Adversarial Learning and Shapley Values

Rishab Jain
20/04/2026

Machine learning models that predict criminal recidivism raise important ethical concerns because they can reinforce systemic racial biases. These algorithms often lack interpretability and have been shown to disproportionately misclassify African-American defendants. In this study, we developed an adversarially trained predictor to minimize the influence of race on recidivism classification. We hypothesized that adversarial training would reduce racial disparities in prediction rates while maintaining accuracy. To understand feature importance, we used the SHAP framework to compare feature contributions between a baseline classifier and the adversarially trained predictor. Compared with the baseline, the adversarial model showed a small decrease in overall accuracy but significantly reduced the gaps in demographic parity, false positive rate, and false negative rate between Caucasian and African-American defendants. Additionally, the adversarial model showed reduced reliance on race-related input features in its output. These findings suggest that adversarial training can be used to efficiently identify and reduce racial bias in predictive algorithms while maintaining high model performance.
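
The three disparity measures named in the abstract can be illustrated with a minimal sketch. This is not the study's code: the function names (`group_rates`, `fairness_gaps`), the group labels "A" and "B", and the toy arrays are all hypothetical, and each gap is assumed here to be the absolute difference between the two groups' rates.

```python
import numpy as np

def group_rates(y_true, y_pred, mask):
    """Return (positive prediction rate, FPR, FNR) for the rows selected by mask."""
    y_true = np.asarray(y_true)[mask]
    y_pred = np.asarray(y_pred)[mask]
    positive_rate = y_pred.mean()              # share predicted to reoffend
    fpr = y_pred[y_true == 0].mean()           # predicted 1 among true 0
    fnr = (1 - y_pred[y_true == 1]).mean()     # predicted 0 among true 1
    return positive_rate, fpr, fnr

def fairness_gaps(y_true, y_pred, group, a, b):
    """Absolute between-group differences in the three rates (assumed definition)."""
    group = np.asarray(group)
    rates_a = group_rates(y_true, y_pred, group == a)
    rates_b = group_rates(y_true, y_pred, group == b)
    names = ("demographic_parity_gap", "fpr_gap", "fnr_gap")
    return {n: float(abs(x - y)) for n, x, y in zip(names, rates_a, rates_b)}

# Hypothetical toy data: four defendants per group, 1 = (predicted) recidivism.
y_true = [0, 0, 1, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
group = ["A", "A", "A", "A", "B", "B", "B", "B"]

gaps = fairness_gaps(y_true, y_pred, group, "A", "B")
print(gaps)
```

In this toy example, group A is flagged more often and has a higher false positive rate, while group B has a higher false negative rate. A model with smaller gaps on all three measures treats the two groups more alike, which is the sense in which the abstract reports reduced disparities.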

 

Wilmington, Delaware, 19801

ISSN: 3070-3875

DOI: 10.65161

 

The Oxford Journal of Student Scholarship (ISSN: 3070-3875) is an independent publication and is not affiliated with, endorsed by, or connected to the University of Oxford or any of its colleges, departments, or programs.

 

© 2025 by the Oxford Journal of Student Scholarship 

 
