Intro
Discrimination is usually understood as the act of making unjust distinctions between people based on group identities or classes like gender, race, sexual orientation, etc. More generally, discrimination simply means distinguishing between groups, and it is often at the core of machine learning applications. Examples include identifying borrowers who are more likely to pay back loans, classifying tumors on a scan, and distinguishing high-growth companies for investing. We need to be able to distinguish between different categories of things to solve real problems. These distinctions become problematic when they’re based on protected classes.
Arbitrage is the practice of taking advantage of different prices in multiple markets. For this article, we focus on risk arbitrage, which occurs when one party obtains a more accurate price for a risk than is offered by the market. Insider trading is essentially risk arbitrage because the insider has information that reduces their uncertainty about the future price of a stock relative to the rest of the market. While insider trading is illegal, building better prediction models based on public information is not. This is why hedge funds and professional sports bettors exist.
So why are we in search of discrimination and risk arbitrage, and what do they have to do with Simpson’s Paradox? Simpson’s Paradox, which I explain further below, occurs when trends in subgroups disappear or reverse when the groups are aggregated. By seeing two seemingly unrelated concepts through the same lens, we’ll better understand the mechanisms behind them and how to leverage insights when we find them in real data.
Imagine a company
Imagine you’re an executive at SoftCo, a completely fake but growing software company. Last year, the company had 3,000 job applicants and 520 new hires for a total rate of 17.3%. Seems like a reasonable number until someone asks how many women applied and were hired at SoftCo. Turns out 1,000 women applied and 120 of them were hired for a total rate of 12% while 2,000 men applied and 400 of them were hired for a total rate of 20%. Looks like we might have a discrimination problem.
But wait, someone else chimes in and asks to see the numbers broken down by role. SoftCo only hires developers and managers so it’s a simple split. There were 100 women who applied for developer roles and 48 of them were hired for a rate of 48% while 900 women applied for manager roles and 72 of them were hired for a rate of 8%. Among the men, 1,000 applied for developer roles and 350 were hired for a rate of 35% while 1,000 applied for manager roles and 50 were hired for a rate of 5%.
It appears that SoftCo was actually more likely to hire women than their male counterparts within each role. How is this possible when the overall rate is the opposite? This is a common version of Simpson’s Paradox, the phenomenon where a trend appears in several subgroups of data but disappears or even reverses when the groups are combined. You might recall the example where the slope of a line fit through a dataset is negative until the subgroups are revealed, and then the slope within each group turns out to be positive. There’s a nice gif on the Wikipedia page showing this.
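To check the arithmetic, here is a minimal Python sketch that recomputes the SoftCo hire rates from the counts given above. The dictionary layout and the `hire_rate` helper are just illustrative choices, not part of any real HR system.

```python
# Applicants and hires from the SoftCo example, keyed by (gender, role).
applicants = {
    ("women", "developer"): 100,
    ("women", "manager"): 900,
    ("men", "developer"): 1000,
    ("men", "manager"): 1000,
}
hires = {
    ("women", "developer"): 48,
    ("women", "manager"): 72,
    ("men", "developer"): 350,
    ("men", "manager"): 50,
}


def hire_rate(gender, role=None):
    """Hire rate for a gender, within one role or across all roles (role=None)."""
    keys = [k for k in applicants if k[0] == gender and role in (None, k[1])]
    return sum(hires[k] for k in keys) / sum(applicants[k] for k in keys)


# Compare the two groups within each role, then in aggregate.
for role in ("developer", "manager", None):
    label = role or "all roles"
    women, men = hire_rate("women", role), hire_rate("men", role)
    print(f"{label:>10}: women {women:.1%}, men {men:.1%}")
```

Women come out ahead in both role-level rows and behind in the aggregate row, which is the reversal described above.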
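You might be wondering how this is even mathematically possible. With ratios, the short answer is big differences in denominators. Each overall rate is a weighted average of the role-level rates, with the weights set by how many people applied to each role. At SoftCo, 90% of the women applied for manager roles, where almost nobody gets hired, while the men split evenly between the two roles. I intentionally cooked up the example above to make the numbers somewhat round, but I also did it to understand the mechanics. When each group’s applicants are spread across the roles in similar proportions, you simply can’t make the ratios different enough to show the paradox. What does this mean in practical statistical terms? It’s a sampling issue. Large differences in sample sizes between groups that may have underlying differences between them can lead to contradictory conclusions.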
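One way to see the role of the denominators is to hold the within-role rates fixed and change only the mix of roles people applied for. The sketch below is a thought experiment, assuming (hypothetically) that women had applied in the same 50/50 role split as men; the `aggregate_rate` helper is an illustrative name.

```python
# Within-role hire rates from the SoftCo example.
rates = {
    "women": {"developer": 0.48, "manager": 0.08},
    "men": {"developer": 0.35, "manager": 0.05},
}
# Share of each gender's applicants that went to each role.
mix = {
    "women": {"developer": 100 / 1000, "manager": 900 / 1000},
    "men": {"developer": 1000 / 2000, "manager": 1000 / 2000},
}


def aggregate_rate(role_rates, role_mix):
    """Overall rate as a mix-weighted average of the within-role rates."""
    return sum(role_rates[role] * role_mix[role] for role in role_rates)


actual_women = aggregate_rate(rates["women"], mix["women"])
actual_men = aggregate_rate(rates["men"], mix["men"])
# Hypothetical: women keep their own hire rates but apply in the men's role mix.
counterfactual_women = aggregate_rate(rates["women"], mix["men"])

print(f"Women, actual mix:     {actual_women:.0%}")
print(f"Men, actual mix:       {actual_men:.0%}")
print(f"Women, men's role mix: {counterfactual_women:.0%}")
```

With the mix equalized, the aggregate comparison points the same way as the role-level comparisons, so the reversal comes entirely from who applied for what.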
Discrimination
A widely taught real-world example of Simpson’s Paradox is the UC Berkeley admissions case. In 1973, the university had 8,442 male applicants, of whom 44% were admitted, while only 35% of the 4,321 female applicants were admitted. They were worried about being sued until they looked at the data by department. Admission rates were only a few points lower for women in two departments while they were higher in the other four departments. The conflicting rates were possible because half as many applicants were women and, as the original analysis found, women tended to apply to departments with lower admission rates. See the original Science article for more information.
In my illustrative example and the Berkeley admissions case, it appears that there’s discrimination but the story changes when you look closely within other subgroups. The conclusion isn’t always such a relief in the end though. Medical studies have historically oversampled white male populations for a variety of reasons. How many studies with disproportionately larger male cohorts might have different conclusions if women were equally represented? Are there medical recommendations that might be reversed if the data were analyzed within race/ethnic groups rather than in aggregate? On the podcast, we discuss an illuminating case of this with lung cancer and cardiovascular disease treatments.
In these examples, it looks like there’s a problem until you look closer through the lens of Simpson’s Paradox, and then it’s actually OK. What if the story were different? Imagine that the roles were reversed and the university appears more likely to admit female applicants in aggregate. Are they just as motivated to dig into the numbers and find that women are actually less likely to be admitted than men within each department? In the hypothetical company hiring case, if men and women are replaced by right- and left-handed people, is your reaction the same? The takeaway here is that when there are potential differences between groups, representation matters. Drawing conclusions from data with class imbalances can be misleading, if not outright unethical in some cases. Ask why there were so many more applicants from one group than another. Dig in and confront the uncomfortable issues.
Arbitrage
The image I usually conjure when remembering Simpson’s Paradox is a 20th-century statistician glaring at me, wagging a finger and saying “look closer or get your hand slapped.” Recently though, I realized this quantitative quirk is more than an analytical booby trap. Simpson’s Paradox is actually a critical source of risk arbitrage in betting markets. Risk arbitrage is possible when a risk isn’t accurately priced. This is how casinos make money over the long run. Betting on red at the roulette table pays even money, as if the probability of winning were 50%, but the green 0s on the wheel make it about 47% likely. Hence, the casino profits on the arbitrage between the risk and return. To explore how Simpson’s Paradox can be useful here, let’s take the same numbers from the hiring example above but change the context.
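The roulette numbers can be worked out exactly. This is a small sketch assuming an American wheel with 18 red, 18 black, and 2 green pockets, which is the layout that gives the roughly 47% figure; a single-zero wheel would give slightly different numbers.

```python
from fractions import Fraction

# Assumption: an American wheel with 18 red, 18 black, and 2 green pockets.
RED, BLACK, GREEN = 18, 18, 2
p_win = Fraction(RED, RED + BLACK + GREEN)

# An even-money bet wins 1 unit with probability p_win and loses 1 unit otherwise.
expected_value = p_win * 1 - (1 - p_win) * 1

print(f"P(red) = {float(p_win):.2%}")
print(f"Expected value per $1 bet = {float(expected_value):+.4f}")
```

The gap between the 50% implied by the payout and the true probability is the casino's margin on every dollar wagered.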
First, these numbers are all made up for illustrative purposes. Don't bet on NBA games with this strategy! OK, with that said, imagine that you’re analyzing data from NBA regular season games with the hope of finding an edge you can profitably bet on. You notice that when a team is favored to win by at least 8 points, the moneyline odds for the underdog tend to have very large payouts, and those wins aren’t as rare as expected. You dig deeper and find that when an 8+ point underdog has a chance of making the playoffs, they win 12% of the time, compared to 20% of the time when they don’t. When you look at the odds, the sportsbooks seem to know this too, because the implied probabilities mirror the insight. Since the moneyline odds include a fee (the vig), you can’t actually profit on this. It seems strange, though, that underdogs who can’t make the playoffs win more often, so you look at the outcomes when the team favored to win has their playoff position fixed vs. when it can change.
Ah ha, Simpson’s Paradox! When the underdog has a chance to make the playoffs and their opponent’s playoff position can’t change whether they win or lose, the underdog wins 48% of the time. As you might expect, this is rare and only occurred 3% of the time but when it happens, holy crap. If you can place a bet that has nearly a 50% chance of winning and pays out 10 to 1, take that bet every time. That’s literally the statistical equivalent of betting $100 on a coin flip and winning $1,000 when it comes up heads. The key is to confidently recognize these opportunities and capitalize on them before others notice and arbitrage out the profits. It’s a cat and mouse game between the professional bettors and the books setting the lines. If you find an edge, how long can you profit from it before the books notice and factor it into the price?
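To put a number on why that bet is so attractive, here is a minimal sketch of the expected profit using the made-up 48% win rate and 10 to 1 payout from above. It assumes "10 to 1" means a $100 stake wins $1,000 in profit, and the function names are illustrative.

```python
def expected_profit(p_win, payout_ratio, stake=100.0):
    """Expected profit of a bet that pays payout_ratio-to-1 on the stake."""
    return p_win * payout_ratio * stake - (1 - p_win) * stake


def break_even_probability(payout_ratio):
    """Win probability at which a payout_ratio-to-1 bet has zero expected profit."""
    return 1 / (payout_ratio + 1)


ev = expected_profit(p_win=0.48, payout_ratio=10)
print(f"Expected profit per $100 staked: ${ev:.2f}")
print(f"Break-even win probability at 10 to 1: {break_even_probability(10):.1%}")
```

Any true win probability above the break-even level is a mispriced risk, and the further above it, the bigger the edge.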
Reflect in context
Pause for a moment to reflect on how you reacted to the hypothetical hiring discrimination case compared to the sports betting arbitrage. The numbers were identical but the contexts were completely different. As an executive in the hiring case, it appeared that you and your company could get sued until a deeper dive into the data absolved you of the sin. As a data-savvy sports bettor, you used your investigative skills and knowledge of statistics to find a way to make money in a situation where most people consistently lose. Both stories have a relatively happy ending for the protagonist, but which would you prefer to be? In either case, you have to be willing and eager to dig deep where others don’t.
The point of this article is not to make you aware of how interesting Simpson’s Paradox is. Plenty has been written on that topic already. The point is to highlight how an important concept can be manifested differently and to notice the similarities. You can choose to view this as a cautionary tale and remember to fully investigate non-representative samples, or you can choose to be inspired and hunt for opportunities in class imbalances. In either case, stay hungry and keep searching for truth and opportunities in data. The effort is almost always worth it.
Hey, you have a favicon now! I always think of Simpson's Paradox as a special case of moderation/suppression effects.
Anyway, I did a workshop on AI fairness last summer and came away a bit underwhelmed by the lack of good, general solutions available. Now I'm pretty concerned that there's legislation pending that would mandate using it...