When Did Average Become Less than Average?

Phil Robson, Math

Fifty-one years ago, British economist Charles Goodhart was writing a paper on the problems of monetary management when he made an observation now usually paraphrased as: when a measure becomes a target, it ceases to be a good measure. This became known as Goodhart’s law. I think about it a lot.

I am the youngest of three boys in my family, so I have spent my entire life looking up to not only my parents but my siblings as well. At least until I was a teenager and grew taller than all four of them. Since then, they have had to look up to me! The point is, I have known since I was fairly young that I am taller than average. My feet have been measured in “grown up” sizes for far longer than I have been a grown up. I needed adult-sized shirts well before my eighteenth birthday. That’s OK; by definition, some people are bigger than the average person, and some (about half of the population, I might wager) are smaller than “average.”

So why, then, do clothes range from XS to XXXXL? Isn’t “medium” supposed to be synonymous with “average”? The most common distribution used to describe populations is conveniently named the “Normal” distribution. It is bell-shaped and symmetrical, so “M” ought to sit right at its peak. Yet given the size options in most stores, the middle of the range falls between “large” and “extra large.” What’s happened?
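
A quick count shows how far the middle has drifted. Suppose a store carries the common eight-size run, numbered in order (an assumption for illustration; some racks carry more or fewer):

\[
\text{XS}=1,\quad \text{S}=2,\quad \text{M}=3,\quad \text{L}=4,\quad \text{XL}=5,\quad \text{XXL}=6,\quad \text{XXXL}=7,\quad \text{XXXXL}=8.
\]

The median position is $(1+8)/2 = 4.5$, squarely between L and XL, a full size and a half above “medium.”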

Charles Goodhart was onto something. The cultural phenomenon at play here is a shift from norm-referenced standards (how you compare to others) to criterion-referenced standards (meeting a specific, and often shifting, bar).

There are numerous examples of this.

Decades ago, a 700 credit score was considered elite: the kind of score that got you the best rates and a firm handshake from a bank manager. Owing to changes in how scores are calculated and the sheer volume of credit data, the “average” score has crept upwards to the extent that today, a 700 is often seen as “just OK.” To get the “average” perks of the past, you now often need a 760 or 800.

We see a similar trend in graduates starting a career. Where an “average” entry-level applicant used to be someone with a degree and a willingness to learn, many “entry-level” job postings now require 2–3 years of experience. Consequently, having zero experience, the literal definition of starting a career, is now seen as being behind.

There is another bias at play here that has to do with “voluntary response” sampling: customers with something to say are a lot more likely to say it than those who are simply satisfied. Even so, the rising-average idea still holds in examples like Uber and Airbnb. We have entered an era of Rating Inflation. Out of five, one might expect 2.5 or perhaps 3 to be “average,” when in fact a 4-star rating is now often treated as “bad,” or at least as a warning sign. The issue with this compressed scale is that if everyone is close to a 5, the scale loses its ability to distinguish truly exceptional service from the bare minimum.
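
To see how much resolution we have lost, suppose, purely for illustration, that anything below a 4.6 now reads as a red flag. The working portion of a nominal 1-to-5 scale is then

\[
\frac{5.0 - 4.6}{5.0 - 1.0} = \frac{0.4}{4.0} = 10\%,
\]

meaning the other 90% of the scale is effectively dead space. (The 4.6 cutoff is an invented number, but the arithmetic holds for any threshold that creeps toward the top.)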

This is what Goodhart was saying. In the example of credit scores, the “goal” is financial responsibility, but the target is a score of 700 or more. People find ways to game the score, like opening particular types of accounts or optimizing whichever behaviors the formula rewards, without actually becoming more financially stable. The measure is no longer an honest reflection of “average.”

My clothing size example is the same. The goal of clothing sizes is to provide an accurate fit, but because brands realized that “size 4” is a target for consumer self-esteem, they changed the dimensions to hit that target. The practice even has a name: vanity sizing. Now, a size 4 doesn’t actually measure anything consistent; it’s nothing more than a marketing tool.

Dare I bring up the school example? I’m doing it. The goal of school is to learn, but for decades we have used test scores to measure that learning. Once we made high test scores the goal, we stopped focusing on the original goal and started “teaching to the test.” Scores went up, but the underlying average aptitude, skill, and mastery didn’t necessarily follow, and certainly not at the same rate.

Goodhart is not the only one whose work is relevant here. Van Yperen and Buunk (1991) and Kruger and Dunning (1999), among others, contributed significantly to the idea of illusory superiority, a cognitive bias wherein people overestimate their abilities relative to others. In other words, “everyone is better than average,” which is clearly nonsensical.
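
For the mean, at least, the impossibility takes only two lines. If $n$ people have scores $x_1, \dots, x_n$ with mean $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, and every one of them were strictly above average, then summing would give

\[
\sum_{i=1}^{n} x_i > n\bar{x} = \sum_{i=1}^{n} x_i,
\]

a contradiction. (For the median, at most half of any group can sit strictly above it.)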

“OK, so what hope do we have?” you would be forgiven for asking. One answer might lie in redefining what “average” means (math pun very much intended). Rather than treating it as synonymous with “medium,” as reasoned earlier, we can do two things. First, realizing that these metrics are inflated and biased is liberating: knowing a game is rigged means we can stop worrying about the score. Second, we can reframe “average” as “enough.” Getting an “average” score on a test might be a reflection of a student who prioritized sleep and mental health over pulling an all-nighter.

Perhaps we have a choice to make. Instead of trying to be exceptional by inflated standards, we can focus on trying to be excellent by our own.
