Exploring Bias in Recidivism Prediction via Adversarial Learning and Shapley Values

Rishab Jain

20/04/2026

http://doi.org/10.65161/recjcEC1tRGOPCFdc

Machine learning models that predict criminal recidivism raise important ethical concerns because they can reinforce systemic racial biases. These algorithms often lack interpretability and have been shown to disproportionately misclassify African-American defendants. In this study, we developed an adversarially trained predictor to minimize the influence of race on recidivism classification. We hypothesized that adversarial training would reduce racial disparities in prediction rates while maintaining accuracy. To understand feature importance, we used the SHAP framework to compare feature contributions between a baseline classifier and the adversarially trained predictor. The adversarial model showed a small decrease in overall accuracy but significantly reduced the gaps in demographic parity, false positive rates, and false negative rates between Caucasian and African-American defendants. Its output also relied less on race-related input features. These findings suggest that adversarial training can be used to efficiently identify and reduce racial bias in predictive algorithms while maintaining high model performance.
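
To make the three disparity measures named above concrete, the following is a minimal Python sketch, not the study's code. It assumes binary labels and predictions (1 = recidivism) and one group label per defendant. The function names `group_rates` and `fairness_gaps`, the group labels "A" and "B", and the toy data are hypothetical and chosen only for illustration.

```python
from typing import Dict, Hashable, Sequence


def group_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Positive-prediction rate, false positive rate, and false negative rate
    for a single group, computed from binary labels and predictions."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == 1 and t == 1:
            tp += 1
        elif p == 1 and t == 0:
            fp += 1
        elif p == 0 and t == 0:
            tn += 1
        else:
            fn += 1
    n = tp + fp + tn + fn
    nan = float("nan")
    return {
        # Share of the group predicted positive (used for demographic parity).
        "positive_rate": (tp + fp) / n if n else nan,
        # Share of true negatives wrongly predicted positive.
        "fpr": fp / (fp + tn) if (fp + tn) else nan,
        # Share of true positives wrongly predicted negative.
        "fnr": fn / (fn + tp) if (fn + tp) else nan,
    }


def fairness_gaps(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    group: Sequence[Hashable],
    group_a: Hashable,
    group_b: Hashable,
) -> Dict[str, float]:
    """Absolute difference in each rate between two groups."""
    rates = {}
    for g in (group_a, group_b):
        idx = [i for i, value in enumerate(group) if value == g]
        rates[g] = group_rates([y_true[i] for i in idx], [y_pred[i] for i in idx])
    return {k: abs(rates[group_a][k] - rates[group_b][k]) for k in rates[group_a]}


# Toy data, invented for illustration only (not results from the study).
group = ["A"] * 4 + ["B"] * 4
y_true = [1, 0, 0, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]

gaps = fairness_gaps(y_true, y_pred, group, "A", "B")
print(gaps)
```

Under this sketch, a gap of zero in `positive_rate` corresponds to demographic parity, and zero gaps in both `fpr` and `fnr` correspond to equal error rates across the two groups. Comparing these gaps for a baseline classifier and an adversarially trained one is one way to quantify the kind of reduction reported above.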