GitHub Finish-Up-A-Thon 2026

ML Project Revival

I built this charity donor classifier in 2018 as a Udacity student. I thought I was done. Coming back in 2026, I discovered what the model actually learned. It was not what I intended.

The Story

In 2018 I trained a machine learning model to predict whether someone was likely to donate to a charity called CharityML. The model scored 86.78% accuracy. I submitted the project, got my grade, and moved on.

What I didn't know: the dataset I used, the UCI Adult Income dataset drawn from the 1994 US Census, had already appeared in hundreds of research papers on AI fairness, privacy preservation, and model debugging, according to UC Berkeley researchers writing in 2021.

In 2021, UC Berkeley researchers published "Retiring Adult," a paper calling for this dataset to be retired. It showed that the $50k income threshold sat at the 76th percentile of incomes overall, but at the 88th percentile for Black Americans and the 89th percentile for women. The model didn't learn who donates. It learned who 1994 America paid well.
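To make the percentile figures concrete: a fixed dollar threshold's percentile within a group is simply the share of that group earning less than it, so the same $50k line can sit much higher in one group's income distribution than in another's. The sketch below is a minimal illustration using made-up toy incomes; the group names, values, and the function name `percentile_of_threshold` are hypothetical, not Census or Adult data.

```python
from bisect import bisect_left


def percentile_of_threshold(incomes, threshold):
    """Return the percent of `incomes` strictly below `threshold`."""
    ordered = sorted(incomes)
    below = bisect_left(ordered, threshold)  # number of values < threshold
    return 100.0 * below / len(ordered)


# Hypothetical toy incomes in dollars; NOT drawn from the Census or Adult data.
toy_groups = {
    "group_a": [18_000, 24_000, 31_000, 38_000, 45_000,
                52_000, 58_000, 66_000, 75_000, 90_000],
    "group_b": [15_000, 19_000, 22_000, 26_000, 29_000,
                33_000, 37_000, 41_000, 47_000, 56_000],
}

for name, incomes in toy_groups.items():
    pct = percentile_of_threshold(incomes, 50_000)
    print(f"{name}: $50k sits at the {pct:.0f}th percentile")
```

In this toy data, half of group_a clears $50k but only a tenth of group_b does. The sketch ignores survey weights, which a real Census analysis would need to apply.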

GitHub Copilot helped me find the deprecated code, modernize the implementation, and audit the fairness of the predictions. Here's what we found.

Predict Donor Likelihood

Enter census-style features to see how the model, trained on 1994 census data, would classify that person. Notice how predictions shift across demographic groups.
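The "notice how predictions shift" experiment can also be reproduced offline with a counterfactual probe: score several copies of one record that differ only in a single demographic field, while every other feature stays fixed. This is a minimal sketch that assumes the model is exposed as a plain `predict(record) -> probability` callable; `counterfactual_probe`, `toy_predict`, its weights, and the example person are all hypothetical stand-ins, not the CharityML model.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def counterfactual_probe(predict: Callable[[Record], float], base: Record,
                         attribute: str, values: List[Any]) -> Dict[Any, float]:
    """Score copies of `base` that differ only in `attribute`."""
    results = {}
    for value in values:
        variant = dict(base)  # copy so the caller's record is not mutated
        variant[attribute] = value  # change one field; all others held fixed
        results[value] = predict(variant)
    return results


# Hypothetical stand-in model for illustration only; NOT the CharityML model.
def toy_predict(r: Record) -> float:
    score = 0.2
    if r["sex"] == "Male":
        score += 0.3
    if r["hours_per_week"] >= 40:
        score += 0.1
    return round(score, 2)  # round away floating-point noise for display


person = {"age": 35, "hours_per_week": 40, "sex": "Female", "race": "White"}
print(counterfactual_probe(toy_predict, person, "sex", ["Female", "Male"]))
```

A probe like this only catches direct use of the attribute. A model can still disadvantage a group through correlated features such as occupation or marital status, which is why group-level rates matter too.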


What This Model Actually Learned

These charts show the fairness audit results. The model scored 85% overall accuracy, which sounds reasonable until you look inside. That number hides significant disparities across demographic groups. A charity using this model would overwhelmingly target White and Asian-Pac-Islander males while systematically overlooking Black and American Indian women, not because of anything those groups did, but because the $50k income threshold used to define a likely donor was structurally harder to reach for women and minorities in 1994. The model did not learn who donates. It learned who 1994 America paid well.

Charts (each broken down by demographic group): Prediction Rates by Group, False Positive Rates, False Negative Rates, and The $50k Threshold Problem.
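The three rate charts rest on standard per-group confusion-matrix quantities: the prediction rate (share of a group flagged as likely donors), the false positive rate (share of actual negatives wrongly flagged), and the false negative rate (share of actual positives missed). The sketch below computes all three in plain Python, assuming binary labels where 1 means the >$50k class. The function name and toy rows are hypothetical, and we assume, without having confirmed it, that the charts use these standard definitions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def group_rates(rows: Iterable[Tuple[str, int, int]]) -> Dict[str, Dict[str, float]]:
    """Per-group prediction rate, false positive rate, and false negative rate.

    Each row is (group, y_true, y_pred) with binary labels, where 1 means
    the >$50k class (the "likely donor" label).
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, y_true, y_pred in rows:
        c = counts[group]
        if y_pred == 1:
            c["tp" if y_true == 1 else "fp"] += 1
        else:
            c["fn" if y_true == 1 else "tn"] += 1

    def ratio(num: int, den: int) -> float:
        # NaN when a group has no examples of the relevant class.
        return num / den if den else float("nan")

    report = {}
    for group, c in counts.items():
        total = c["tp"] + c["fp"] + c["tn"] + c["fn"]
        report[group] = {
            "prediction_rate": ratio(c["tp"] + c["fp"], total),  # share flagged positive
            "fpr": ratio(c["fp"], c["fp"] + c["tn"]),  # flagged, among actual negatives
            "fnr": ratio(c["fn"], c["fn"] + c["tp"]),  # missed, among actual positives
        }
    return report


# Hypothetical toy rows: (group, y_true, y_pred). Not real audit data.
rows = [
    ("A", 1, 1), ("A", 1, 1), ("A", 0, 1), ("A", 0, 0),
    ("B", 1, 0), ("B", 1, 1), ("B", 0, 0), ("B", 0, 0),
]

for group, metrics in group_rates(rows).items():
    print(group, metrics)
```

In the toy data, group A is flagged three times as often as group B, and the errors point in opposite directions: A's mistakes are false positives, while B's are false negatives.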

Key Finding

Asian-Pac-Islander males were predicted as likely donors at 32%, White males at 26%, Black females at 4%, and American Indian females at nearly 0%. The model was not predicting donation likelihood; it was predicting who 1994 America paid well. The $50k threshold used as the positive class label was structurally harder to reach for women and minorities, baking in systematic disadvantage before training even began.
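The claim that the disadvantage is baked in before training can be checked directly: apply the >$50k labeling rule to raw incomes and compare positive-label rates by group, with no model involved at all. This sketch assumes access to raw income values, which the Adult file does not provide (it ships only the binarized label). The records are therefore hypothetical toy data, and `positive_label_rate` is an illustrative name.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping

THRESHOLD = 50_000  # positive class: income above $50k, mirroring Adult's ">50K" label


def positive_label_rate(records: Iterable[Mapping], group_key: str = "group") -> Dict[str, float]:
    """Share of each group labeled positive by the >$50k rule, before any model is trained."""
    totals: Counter = Counter()
    positives: Counter = Counter()
    for r in records:
        g = r[group_key]
        totals[g] += 1
        positives[g] += int(r["income"] > THRESHOLD)
    return {g: positives[g] / totals[g] for g in totals}


# Hypothetical toy records; NOT Census or Adult data.
records = [
    {"group": "X", "income": 62_000}, {"group": "X", "income": 48_000},
    {"group": "X", "income": 75_000}, {"group": "X", "income": 39_000},
    {"group": "Y", "income": 51_000}, {"group": "Y", "income": 44_000},
    {"group": "Y", "income": 30_000}, {"group": "Y", "income": 27_000},
]

print(positive_label_rate(records))
```

A classifier trained to reproduce these labels accurately will tend to inherit the gap between X and Y, because that gap is part of the target it is rewarded for matching.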

The Research Behind This Dataset

The UCI Adult Income dataset appeared in 20+ research papers from 2006 to 2019, spanning AI fairness, privacy, model debugging, and distributed systems. Here is a selection, followed by the 2021 paper that called for its retirement.

The What-If Tool: Interactive Probing of Machine Learning Models
Wexler et al., arXiv 2019. Fairness visualization and model probing
Automated Directed Fairness Testing
Udeshi et al., ASE 2018. Automated bias detection
Automated Data Slicing for Model Validation
Chung et al., IEEE TKDE 2018. Subgroup performance analysis
Helix: Accelerating Human-in-the-loop Machine Learning
Xin et al., arXiv 2018. Iterative ML optimization
A Confidence-Based Approach for Balancing Fairness and Accuracy
Fish et al., arXiv 2016. Fairness-accuracy tradeoffs
Debugging Machine Learning Tasks
Chakarov et al., arXiv 2016. ML debugging methodology
Data Preprocessing Techniques for Classification Without Discrimination
Kamiran and Calders, KAIS 2011. Foundational fairness paper
Retiring Adult: New Datasets for Fair Machine Learning
Ding et al., NeurIPS 2021. UC Berkeley researchers calling for this dataset's retirement