Experts got it catastrophically wrong, according to Dominic Cummings, UK prime minister Boris Johnson’s former chief adviser. Cummings has argued that the UK government’s official scientific advice in March 2020 hugely misunderstood how the pandemic would play out, leading to a delay in locking down that cost thousands of lives.
According to Cummings, it was certain specialists with less knowledge of pandemics or medicine – such as data scientist Ben Warner, artificial intelligence researcher Demis Hassabis of DeepMind, and mathematician Tim Gowers – who gave more accurate forecasts at this point.
Cummings is also known to be a fan of Superforecasting by Philip Tetlock, a book about people who predict future events more reliably than most. Some superforecasters have been praised for their predictions about the pandemic, and some have been critical of the experts’ record.
So should governments make greater use of superforecasters instead of relying on scientific experts? The evidence isn’t quite that clear cut. But there certainly seem to be things governments could learn from superforecasting.
The superforecasters in a famous American study published in 2014 were an elite crew: only the top 2% of contenders in a geopolitical forecasting tournament performed well enough to win the title. Their task was to assign probabilities to possible answers to dozens of questions.
The researchers provide a few illustrative examples. Who would be the president of Russia in 2012? Would North Korea detonate another nuclear weapon in the next three months? How many refugees would flee Syria in the coming year?
Of course, just because someone does well one year doesn’t prove that they’re more skilled than anyone else. Maybe they just got lucky. We have to look at how well they did in the following years to evaluate how “super” they really are.
Impressively, these superforecasters maintained their edge as the tournament wore on for three more years. In fact, after being combined into “superforecasting teams” composed entirely of top performers, their performance increased by a substantial margin. The researchers also found that working in teams and taking relevant training improved performance for other forecasters, compared to forecasters in a control condition.
Teams and training
Whether or not we take Cummings at his word that the UK’s pandemic planning suffered from a “classic group-think bubble”, we know that teams don’t always make wise decisions. What was it that made the teams more successful in the US study?
It’s hard to say for sure, but the researchers specifically encouraged the teams to ask precise questions that promote clearer thinking about the evidence supporting a particular forecast, to “look for evidence that contradicts your current prediction”, and to constructively introduce alternative points of view.
Such debate may well improve collective judgment and guard against groupthink. Nor were team members required to come to a consensus. Although they shared information and opinions, they still made individual predictions which were combined by algorithm. Superforecaster teams in particular were highly engaged, frequently sharing information with and asking questions of other team members.
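The study doesn’t spell out the combining algorithm here, but a common approach in the forecasting literature is to average team members’ log-odds and then “extremize” the result, pushing it away from 50:50 to compensate for the shared uncertainty individuals hold back. A minimal sketch of that idea (the extremizing factor of 2.5 is illustrative, not taken from the study):

```python
import math

def aggregate_forecasts(probs, extremize=2.5):
    """Combine individual probabilities (0-1) into one team forecast.

    Averages the forecasters' log-odds, then multiplies by an
    extremizing factor so the pooled forecast is more confident
    than the typical individual one.
    """
    # Clamp to avoid taking log(0) for overconfident inputs
    clamped = [min(max(p, 0.001), 0.999) for p in probs]
    mean_log_odds = sum(math.log(p / (1 - p)) for p in clamped) / len(clamped)
    return 1 / (1 + math.exp(-extremize * mean_log_odds))

# Three forecasters around 60-70% pool to a more confident ~83%
team = aggregate_forecasts([0.6, 0.7, 0.65])
```

Note that no consensus is needed: each forecaster submits their own number, and the pooling is purely mechanical.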
Another study took a closer look at which specific training techniques seemed to help most. Three techniques were particularly associated with higher accuracy. The first was the use of so-called comparison classes.
For example, if I am trying to predict the probability that Benedict Cumberbatch and Sophie Hunter will still be together in five years, it can be helpful to think about other “classes” that are relevant – say, the class of celebrity marriages, or even marriages in general. This allows me to look to history to inform my predictions: What percentage of celebrity marriages end in any given five-year period?
The second was to make use of mathematical and statistical models, when available, to help inform one’s views. The third was to “select the right questions” – a recommendation to spend more time predicting answers to questions where you know more than others about the topic, or on which additional research is likely to pay off. However, the researchers stressed that all components of the training may have contributed holistically to better performance.
Research has also shown that accuracy improves when we keep track of our past performance – but the kind of feedback matters. Did outcomes you thought would happen 20% of the time actually happen 20% of the time? What about outcomes you thought would happen 90% of the time? Performance improves for those who receive this kind of information.
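This kind of feedback is a calibration check: group your past forecasts by the probability you assigned and compare it with how often the event actually happened. A minimal sketch, using invented data:

```python
from collections import defaultdict

def calibration_table(forecasts):
    """forecasts: list of (predicted_probability, outcome) pairs,
    where outcome is 1 if the event happened and 0 if it didn't.
    Returns {predicted_probability: observed_frequency}."""
    bins = defaultdict(list)
    for prob, outcome in forecasts:
        bins[prob].append(outcome)
    return {p: sum(o) / len(o) for p, o in sorted(bins.items())}

# Invented track record: five 20% forecasts, five 90% forecasts
history = [(0.2, 0), (0.2, 0), (0.2, 1), (0.2, 0), (0.2, 0),
           (0.9, 1), (0.9, 1), (0.9, 1), (0.9, 1), (0.9, 0)]
table = calibration_table(history)
# A well-calibrated forecaster's 20% predictions come true about 20%
# of the time, and their 90% predictions about 90% of the time.
```

In this invented record the 20% forecasts came true 1 time in 5 (well calibrated) while the 90% forecasts came true 4 times in 5 (slightly overconfident).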
Can governments do better?
Could the UK government have done better on COVID-19 by soliciting input from teams of superforecasters? It’s possible. Superforecasters at Good Judgment Open and practised forecasters at Metaculus (in which I’ve participated) each seem to have done well on COVID-19, with Metaculus claiming to have outperformed experts in June 2020. That said, in a recent series of COVID-19-related predictions, trained forecasters were not always more accurate than experts. The researchers behind the survey are experimenting with ways of combining predictions from domain experts and trained forecasters into a “consensus forecast”.
It also seems plausible that even the training that helped the non-supers make better forecasts would have been useful. For example, Cummings claimed that although there was much attention to epidemiological models, evidence that contradicted the models’ assumptions – such as data being reported by intensive care units – was ignored. It certainly seems plausible that someone trained to “look for evidence that contradicts your current prediction” might have spotted this earlier.
Of course, not all recommendations from the literature are practical in government settings. In theory, governments could test such recommendations for themselves, adopting any that seemed beneficial. Unfortunately, you can’t improve what you don’t measure.
In Superforecasting, Tetlock emphasises that any organisation serious about improving its forecasts must attach concrete numbers to them, at least internally. A phrase like “serious possibility” may mean a 20% chance to one person and an 80% chance to another.
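Concrete numbers are what make forecasts scoreable. One standard scoring rule used in Tetlock’s tournaments is the Brier score, the mean squared difference between the forecast probability and the outcome (lower is better); a brief sketch:

```python
def brier_score(forecasts):
    """forecasts: list of (probability, outcome) pairs, outcome 0 or 1.
    Returns the mean squared error; 0 is perfect, 0.25 is the score
    for always saying 50:50."""
    return sum((p - o) ** 2 for p, o in forecasts) / len(forecasts)

# A confident, correct forecaster scores near 0 (~0.01 here)...
good = brier_score([(0.9, 1), (0.1, 0)])
# ...while hedging everything at "serious possibility" = 50:50
# scores 0.25 no matter what happens.
vague = brier_score([(0.5, 1), (0.5, 0)])
```

A phrase like “serious possibility” can’t be scored at all; a “20%” can be, which is the point.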
This is almost certainly what Cummings was referring to when he said: “A guy called Phil Tetlock wrote a book and in that book he said that you should not use words like reasonable and probable and likely, because it confuses everybody.” Perhaps it shouldn’t surprise us if organisations that do not make forecasts in a way that they can be evaluated are not equipped to learn how to make them better. To improve, you first have to try.
This article was first published in The Conversation.
About the Author
Gabriel Recchia is a research associate at the University of Cambridge’s Winton Centre for Risk and Evidence Communication, where he works on how to communicate information about risks and benefits in ways that support comprehension and informed decision-making. This includes research on distributional models of semantics and their applications in characterising how risk is communicated and perceived. He is also particularly interested in how best to communicate information generated by statistical models and machine learning algorithms, and is a member of Congenica’s Artificial Intelligence Advisory Group.