White by Default?
The viral posts were right.
We scraped 5.5 million criminal records and 1.5 million mugshots from 39 states.
29% of Hispanics are being misclassified as White in official Department of Corrections databases.
Even in databases where Hispanic is an explicit category 🧵

Everyone's seen these collages claiming non-whites get classified as White in criminal databases.
The problem? Anecdotal. Cherry-picked. No way to verify.
We had 1.5 million mugshots, names, and official racial classifications.
Time to test it systematically:

We trained a multinomial logistic regression on 18 features:
• DeepFace racial probabilities from mugshots
• Census name demographics
• First and last name racial statistics
92.76% accuracy distinguishing Black, White and Hispanic.
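The thread doesn't show the training code. Here is a minimal sketch of a multinomial (softmax) logistic regression fit by batch gradient descent in plain Python. It assumes each record is already a numeric vector of the 18 features (DeepFace probabilities plus name statistics). The function names, learning rate, epoch count and synthetic data are hypothetical, not the authors' pipeline.

```python
import math
import random

def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def predict_proba(W, b, x):
    # One linear score per class, turned into probabilities by softmax.
    return softmax([sum(w * xi for w, xi in zip(W[k], x)) + b[k] for k in range(len(W))])

def predict(W, b, x):
    p = predict_proba(W, b, x)
    return max(range(len(p)), key=p.__getitem__)

def fit_multinomial_logreg(X, y, n_classes, lr=0.5, epochs=100):
    """Minimize mean cross-entropy with full-batch gradient descent.
    W is n_classes x n_features, b has one bias per class."""
    n, d = len(X), len(X[0])
    W = [[0.0] * d for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        gW = [[0.0] * d for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for xi, yi in zip(X, y):
            p = predict_proba(W, b, xi)
            for k in range(n_classes):
                err = p[k] - (1.0 if k == yi else 0.0)  # d(loss)/d(score_k)
                gb[k] += err
                for j in range(d):
                    gW[k][j] += err * xi[j]
        for k in range(n_classes):
            b[k] -= lr * gb[k] / n
            for j in range(d):
                W[k][j] -= lr * gW[k][j] / n
    return W, b

# Hypothetical synthetic data: 18 noisy features, one informative feature per class.
CLASSES = ["Black", "White", "Hispanic"]
random.seed(0)
X, y = [], []
for _ in range(300):
    k = random.randrange(3)
    row = [random.gauss(0.0, 1.0) for _ in range(18)]
    row[k] += 3.0
    X.append(row)
    y.append(k)

W, b = fit_multinomial_logreg(X, y, n_classes=3)
acc = sum(predict(W, b, x) == yi for x, yi in zip(X, y)) / len(X)
print(f"training accuracy: {acc:.3f}")
```

On real data you would hold out a test set before reporting an accuracy figure; the toy data here only checks that the fit works.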

The key insight: A sufficiently accurate linear model trained on biased data learns the TRUE signal, not the bias.
Systematic deviations between predictions and official labels indicate mislabeling by authorities, not model error.
Here's what we found...
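To make "systematic deviations" concrete, here is a minimal sketch of flagging records where the model's top prediction disagrees with the official label at high confidence. It assumes the model outputs a probability per race and treats the largest one as the model's confidence; the `Record` type, `flag_disagreements` helper and 0.9 threshold are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Record:
    official: str  # race recorded by the Department of Corrections
    probs: dict    # model probabilities, e.g. {"Black": 0.1, "White": 0.2, "Hispanic": 0.7}

def predicted(rec):
    # The model's top class and its probability (used as "confidence").
    label = max(rec.probs, key=rec.probs.get)
    return label, rec.probs[label]

def flag_disagreements(records, min_conf=0.9):
    """Return (index, official, predicted, confidence) for confident disagreements."""
    flagged = []
    for i, rec in enumerate(records):
        label, conf = predicted(rec)
        if label != rec.official and conf >= min_conf:
            flagged.append((i, rec.official, label, conf))
    return flagged

records = [
    Record("White", {"Black": 0.01, "White": 0.04, "Hispanic": 0.95}),
    Record("White", {"Black": 0.02, "White": 0.90, "Hispanic": 0.08}),
    Record("Black", {"Black": 0.55, "White": 0.05, "Hispanic": 0.40}),
    Record("Hispanic", {"Black": 0.10, "White": 0.60, "Hispanic": 0.30}),
]
print(flag_disagreements(records))  # [(0, 'White', 'Hispanic', 0.95)]
```

Lowering `min_conf` widens the net: at 0.5 the fourth record is flagged too.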

29% of predicted Hispanics were officially classified as White.
Even at 95-100% model confidence, 22.4% of predicted Hispanics were still assigned White.
Median confidence for these cases? 91.7%.
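The three numbers above are simple tabulations once every record has a prediction. A minimal sketch, assuming rows of (predicted race, model confidence, official race) and reading "these cases" as predicted Hispanics officially recorded as White; the function name and toy rows are hypothetical.

```python
from statistics import median

def hispanic_as_white_stats(rows, band=(0.95, 1.0)):
    """rows: iterable of (predicted_label, confidence, official_label)."""
    pred_h = [r for r in rows if r[0] == "Hispanic"]
    as_white = [r for r in pred_h if r[2] == "White"]
    in_band = [r for r in pred_h if band[0] <= r[1] <= band[1]]
    in_band_white = [r for r in in_band if r[2] == "White"]
    nan = float("nan")
    return {
        # Share of predicted Hispanics officially labeled White.
        "share_white": len(as_white) / len(pred_h) if pred_h else nan,
        # Same share, restricted to the confidence band (95-100% by default).
        "share_white_in_band": len(in_band_white) / len(in_band) if in_band else nan,
        # Median model confidence among the predicted-Hispanic, officially-White cases.
        "median_conf_white": median(r[1] for r in as_white) if as_white else nan,
    }

rows = [
    ("Hispanic", 0.97, "White"),
    ("Hispanic", 0.98, "Hispanic"),
    ("Hispanic", 0.60, "White"),
    ("Hispanic", 0.99, "Hispanic"),
    ("White", 0.90, "White"),
]
stats = hispanic_as_white_stats(rows)
print(stats)
```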



Visual inspection confirmed it.
These are people classified as "White" in official records. Look at those names!

Furthermore, principal component (PC) mapping showed that many officially "White" individuals fell in the region occupied by Hispanics, but not the other way around. Measured by Euclidean distance between group centroids, Whites were as distinguishable from Hispanics as Blacks were from Whites.
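The centroid comparison can be sketched as follows, assuming each person's features have already been projected onto principal components and grouped by a label (official or predicted). The helper names and 2-D toy points are hypothetical; the thread does not specify how many components were used.

```python
import math
from itertools import combinations

def centroids(points, labels):
    # Mean coordinate vector for each label.
    sums, counts = {}, {}
    for p, lab in zip(points, labels):
        acc = sums.setdefault(lab, [0.0] * len(p))
        for j, v in enumerate(p):
            acc[j] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

def centroid_distances(points, labels):
    # Euclidean distance between every pair of group centroids.
    c = centroids(points, labels)
    return {(a, b): math.dist(c[a], c[b]) for a, b in combinations(sorted(c), 2)}

points = [(0, 0), (0, 2), (3, 1), (5, 1), (4, 4), (4, 6)]
labels = ["Black", "Black", "White", "White", "Hispanic", "Hispanic"]
d = centroid_distances(points, labels)
print(d)
```

Comparing the White-Hispanic distance with the Black-White distance is the comparison the thread describes.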



To further validate the method by visual inspection, we contrasted low- and high-confidence classifications. In high-confidence disagreements, the person almost always appeared to be the predicted race rather than the assigned one.



We corrected for misclassification:
Hispanic criminal record rates increase 20-31%
White rates decrease 4-6%
Black rates decrease 1%
Lower bound: reassign only high-confidence predictions (>90% confidence). Upper bound: assume every predicted race is the actual race.
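The two bounds amount to recounting each race under two reassignment rules. A minimal sketch, assuming rows of (official race, predicted race, confidence); the helpers and toy rows are hypothetical. With the population denominator held fixed, the percentage change in counts equals the percentage change in the per-capita rate.

```python
from collections import Counter

def corrected_counts(rows, min_conf=None):
    """rows: (official_label, predicted_label, confidence).
    min_conf=None: every record takes its predicted label (upper bound).
    Otherwise only predictions with confidence > min_conf replace the official label (lower bound)."""
    counts = Counter()
    for official, pred, conf in rows:
        if min_conf is None or conf > min_conf:
            counts[pred] += 1
        else:
            counts[official] += 1
    return counts

def pct_change(before, after):
    # Percentage change per label relative to the official counts.
    return {k: 100.0 * (after[k] - before[k]) / before[k] for k in before if before[k]}

rows = [
    ("White", "Hispanic", 0.95),
    ("White", "Hispanic", 0.70),
    ("White", "White", 0.99),
    ("White", "White", 0.80),
    ("Hispanic", "Hispanic", 0.97),
    ("Black", "Black", 0.99),
]
official = Counter(r[0] for r in rows)
lower = corrected_counts(rows, min_conf=0.9)
upper = corrected_counts(rows)
print(pct_change(official, lower), pct_change(official, upper))
```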

State-level analysis showed massive variation.
Florida: 60%+ of Hispanics misclassified as White (do Cubans tend to self-identify as White?)
But no significant correlation with state political ideology (r = 0.21, p = 0.472).
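A state-level check like this is a Pearson correlation across states. A minimal sketch, assuming one misclassification rate and one ideology score per state (the thread does not name the ideology measure; the toy values are hypothetical). The two-sided p-value comes from a t distribution with n - 2 degrees of freedom; the standard library has no t CDF, so `scipy.stats.pearsonr` is the usual way to get r and p together.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pearson_with_t(x, y):
    """Return r and the t statistic r * sqrt((n - 2) / (1 - r^2))."""
    r = pearson_r(x, y)
    n = len(x)
    if abs(r) >= 1.0:
        return r, math.copysign(math.inf, r)
    return r, r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical per-state values: misclassification rate vs. ideology score.
rates = [1, 2, 3, 4, 5]
ideology = [2, 1, 4, 3, 5]
r, t = pearson_with_t(rates, ideology)
print(r, t)
```

With only a handful of states, even a moderate r can fail to reach significance, which is why the p-value matters here.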

For the full analysis, replication code, data, and the GitHub repo, check out my blog post:
White by Default: Systematic Bias in U.S. Criminal Racial Assignment
uncorrelated.xyz/p/white-by-def…
• • •

Replies
Several issues. First is classification by skin color; the other is the political impact of classifying by race. Race and skin color are two different things. There are some very light-skinned Black people and some very dark-skinned White people, so they are misclassified by skin color.
OBTW - Caucasians are classified as 'white'. Negroes are classified as Black. Why aren't American Indians classified as Red? Instead they are classified as White. Why? And why aren't Asians classified as Yellow?