‘The Wisdom of Crowds’ is the name of a book by James
Surowiecki in which he discusses the idea that in certain situations aggregating
the knowledge from a random crowd of people could get to a better answer to a
problem than any single individual could – even an expert individual. It wasn’t
a new idea – according to Wikipedia (never a reliable source, so not a good ‘crowd’
example), in 1907 Francis Galton noted that a crowd at a county fair
correctly guessed the weight of an ox when you took the average of all the
guesses. Surowiecki’s book certainly
popularised the term – I even used it in the title of a paper on drug
repurposing: ‘The wisdom of crowds and the repurposing of artesunate as an
anticancer drug’ – and it has become something of a standard feature of many
books and courses in machine learning and data science.
The Nobel prize-winning economist and political scientist Friedrich
von Hayek didn’t, as far as I know, use the term but the idea was central to
his thinking. He saw the price/market system as the wisdom of the crowds in
action. He saw society as a complex and self-organised system, with distributed
decision making and dispersed knowledge as the key driving forces. Trying to
control an economy from the top down is impossible without access to all that
knowledge - knowledge that we are often not even explicitly aware that we have.
I’ve often wondered, though, whether it really works in
practice, or whether yet again the world is far too complex
and messy for even this simple (and surprising) idea to work. At the weekend I
finally managed to see a real world example. In the context of some fundraising
for the George Pantziarka TP53 Trust (the UK charity that supports people with
Li Fraumeni Syndrome), we attended the modern equivalent of Galton’s county
fair – a suburban farmers’ market in south-west London. We didn’t have an ox to
spare, so in our case the crowd had to correctly guess the number of chocolate
Easter eggs to win the prize (see below, we’ll skate over the health effects of eating all
of those eggs…).
This was my chance to get my hands on a real world data set.
Unfortunately the weekend coincided with a blizzard, so turn-out was low at the
market and I was worried that the dataset wouldn’t be sufficient to show the
effect. In the end we had 66 entries – and the correct answer was 145 eggs. The
answers were all over the place, with a low of 50 and a maximum of 376 (see
scatter chart below – correct answer in red). The lucky winner got close with
an answer of 143.
So how wise was our crowd of 66? The average of the entire
data set was 144.1 – which is closer than the winning entry. I have to admit I
was surprised at just how close that is. Even more surprising is how quickly
the average converged to the correct answer. The chart below shows the
cumulative moving average converging close to the right answer within 15
guesses. That’s fast.
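For anyone who wants to try this on their own numbers, here is a minimal Python sketch of a cumulative moving average: after each new guess, it reports the mean of all the guesses so far. The function name `cumulative_average` and the sample list are my own illustrative choices, and the sample values are made up rather than taken from the market entries.

```python
from itertools import accumulate


def cumulative_average(guesses):
    """Return the running mean after each guess, in entry order."""
    running_totals = accumulate(guesses)
    return [total / count for count, total in enumerate(running_totals, start=1)]


# Made-up guesses for illustration only; these are NOT the real market entries.
sample_guesses = [50, 376, 120, 143, 160, 98, 145, 210]

for count, average in enumerate(cumulative_average(sample_guesses), start=1):
    print(f"after {count} guesses: running average = {average:.1f}")
```

Plotting the returned list against the guess number would give a chart of the same kind as the one described above.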
Was that speed of convergence just a fluke? When the dataset
is reversed what happens? The same thing – the cumulative moving average gets
close to the correct answer incredibly quickly, even though it starts off with
some wildcard answers.
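One rough way to put a number on "how quickly" is to count how many guesses it takes before the running average gets within some tolerance of the true answer and stays there, and then repeat the count with the entries in reverse order. The sketch below does that. The function name `guesses_until_settled`, the tolerance value and the sample data are all assumptions made for illustration, not the real entries or the method behind the charts.

```python
def guesses_until_settled(guesses, target, tolerance):
    """Return the number of guesses after which the running mean stays within
    `tolerance` of `target` for all remaining entries, or None if it never does."""
    total = 0.0
    settled_at = None
    for count, guess in enumerate(guesses, start=1):
        total += guess
        if abs(total / count - target) <= tolerance:
            if settled_at is None:
                settled_at = count  # first guess of the current in-range run
        else:
            settled_at = None  # drifted out of range, so start again
    return settled_at


# Made-up guesses for illustration only; these are NOT the real market entries.
sample_guesses = [100, 200, 150, 150]
target = 150      # hypothetical correct answer for the made-up data
tolerance = 5     # hypothetical definition of "close"

print("original order:", guesses_until_settled(sample_guesses, target, tolerance))
print("reversed order:", guesses_until_settled(sample_guesses[::-1], target, tolerance))
```

The order of the entries matters for how soon the average settles, which is why checking both directions is a useful sanity test, but the final average is the same either way.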
This idea might be old hat, but I for one am still impressed by
these results. The applications for this idea are limited, but it would
be great to be able to harness this sort of thing to solve something a bit more
meaningful than the weight of an ox or the number of chocolate eggs. I also find
the democratic nature of this result incredibly satisfying.