FUTURECASTS JOURNAL
Superforecasting: The Art and Science of
Prediction
by
Philip E. Tetlock and Dan Gardner
April, 2016
Measurement of forecast accuracy:
Reality combines the "clocklike and cloudlike," Philip E. Tetlock and Dan Gardner explain in "Superforecasting: The Art and Science of Prediction."
Real world complexity places limits on predictability that increase exponentially as we look further into the future and as we attempt to narrow ranges of probability. "The laws of physics aside," particular targets vary from those narrow enough to permit scientific "clocklike" certainty to those involving degrees of "cloudlike" complexity and duration that render forecasts totally unreliable.
The iconic metaphor is weather forecasting, the subject of never-ending efforts at improvement. "Forecast, measure, revise. Repeat. It's a never-ending process of incremental improvement that explains why weather forecasts are good and slowly getting better."
The interests of self and tribe often determine forecasts. Forecasting is frequently less concerned with accuracy than with achieving other objectives - ideological, entertainment, propaganda, sensationalism, agenda support, personal ego, or business. Objectivity is all too often viewed as irrelevant; public forecasts are seldom about accuracy and truth, serving instead as mere instruments for tangential interests. "It's a messy business that doesn't seem to be getting any better."
Political polling, for example, demonstrates a clear divide between pollsters who make little effort to examine past successes and failures and those who rigorously examine past results.
Scorekeeping is essential to improvement in forecasting, the authors emphasize, and attaching numbers to forecasts is essential for scorekeeping. "Where wisdom once was, quantification will now be." The authors speculate on the possibility of evolution toward a more testable, results-oriented, evidence-based forecasting profession, much like the transformation of medical practice a century ago.
By measuring - by keeping account of accuracy - considerable improvement in forecasting practice is possible. The authors mention credit scores that, despite obvious weaknesses, are a big improvement on the discretionary, sometimes arbitrary and capricious prior methods.
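The quantification the book relies on is the Brier score, which rewards forecasts in proportion to how close their probabilities land to what actually happened. Here is a minimal Python sketch of the two-sided version the book describes (the function name and sample numbers are ours):

    def brier_score(forecast_prob, outcome):
        # Two-sided Brier score for one yes/no forecast.
        # forecast_prob: probability assigned to the event happening (0.0-1.0).
        # outcome: 1 if the event happened, 0 if it did not.
        # Lower is better: 0.0 is perfect, 0.5 matches blind 50/50 guessing,
        # and a maximally confident forecast that goes wrong scores 2.0.
        p_yes, p_no = forecast_prob, 1.0 - forecast_prob
        return (p_yes - outcome) ** 2 + (p_no - (1 - outcome)) ** 2

    print(brier_score(0.7, 1))  # 0.18 -- a good 70% call
    print(brier_score(0.7, 0))  # 0.98 -- the same call when the event fizzles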
Forecast scoring is Tetlock's life work, and the objective of the "Good Judgment Project" (hereinafter "GJP").
The effort is centered in U. Cal. Berkeley and U. Penn., and is supported by the Intelligence Advanced Research Projects Activity (hereinafter "IARPA"). The IARPA tests posed narrow, short term, less important questions that could be scored. They were often constituent parts of bigger, more important questions that couldn't be scored. However, examining small but pertinent questions together can provide insight into a big question.
The difference is not who the forecasters are but what they do! Intelligence, expert numeracy, news-junkie habits, a willingness to constantly update views - all matter, all are essential, but all are insufficient. "Our analyses have consistently found commitment to self-improvement to be the strongest predictor of performance."
For over half a century, tests have demonstrated the predictive superiority of "well validated algorithms" over human subjective judgment.
Curse of the authoritative word:
The history of medicine is full of futile - often harmful - treatments continued for centuries on the basis of "expert" subjective judgment mulishly maintained and supported by various psychological biases.
The authors describe the curse of the authoritative word as "the God complex." Modern randomized controlled testing didn't arrive until the early 20th century.
One great weakness in expert judgment was pointed out by physicist Richard Feynman. Experts usually quash doubt in their expert judgments, thus enshrining ignorance. "It was the absence of doubt -- and scientific rigor -- that made medicine unscientific and caused it to stagnate for so long."
We rationalize to support our judgments - whether based on "snap" judgments ("common sense" based on experience and immediately available evidence) or on "expert opinion" (based on experience and a lifetime of pertinent study). Confirmation bias then sets in naturally to block objective evaluation of conflicting evidence. "[We] are creative confabulators hardwired to invent stories that impose coherence on the world." The authors cite Daniel Kahneman's dictum: "What you see is all there is."
The scientific response neither accepts nor rejects such conclusions but approaches them with caution as "plausible hypotheses" to be subjected to further examination and, if possible, scientific experiment. Like others, scientists must resist becoming attached to their own hunches and other prejudgments. Even after scientific confirmation, there must remain residual doubt.
Judging forecast accuracy:
Judging forecast accuracy involves questions of time and meaning and scope. The authors emphasize the importance of precision for the evaluation process. The meaning, scope and timeline must be clear for meaningful forecasts capable of being tested for accuracy. Bald probabilities are impossible to evaluate for accuracy.
Unfortunately, forecasts made for public
consumption rarely bother to clarify such details. The authors point
to some notorious examples: the forecasts in the early 1980s of the
prospects for nuclear war sometime in the 1980s; the likelihood that
Federal Reserve quantitative easing policies would result in price
inflation; and the prospect as of 2007 for iPhone success.
Serious evaluation requires "clearly defined terms and timelines." A single probability forecast cannot be evaluated: forecasts "must use numbers," and there must be a continuous series of them before their accuracy can be judged.
Unfortunately, various degrees of doubt are inherent in forecasts for consequential subjects in the non-scientific practical arts. ("Predictions are always uncertain, especially about the future." (Berra, Y.)). It is thus difficult to evaluate the accuracy of such forecasts whether the event occurs or does not, unless the forecast is expressed as a certainty. Is a forecast of a 70% chance of rain revealed to be wrong by a sunny day?
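The standard answer is calibration, judged over a series: a forecaster's 70% calls should come true about 70% of the time, even though no single sunny day settles anything. A small Python sketch of that check (names and data hypothetical):

    from collections import defaultdict

    def calibration_table(forecasts):
        # Group (probability, outcome) pairs by the stated probability and
        # compare each stated probability with the observed frequency.
        buckets = defaultdict(list)
        for prob, occurred in forecasts:
            buckets[round(prob, 1)].append(occurred)
        return {p: sum(hits) / len(hits) for p, hits in sorted(buckets.items())}

    # Ten hypothetical 70% rain forecasts; rain fell on seven of the days.
    history = [(0.7, 1)] * 7 + [(0.7, 0)] * 3
    print(calibration_table(history))  # {0.7: 0.7} -- well calibrated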
Ideologues know one big thing and are confident in applying it to their forecasts. They are thus generally among the worst forecasters. Ideologues, like theologians, cherry-pick the facts to support their prejudgments. They rationalize more than reason.
The authors emphasize Larry Kudlow and his mulishly optimistic forecasts prior to the 2007-2009 Great Recession. Nevertheless, he prospers as an authoritative voice. His clarity and confidence trump his forecasting inaccuracy. (i × i = i²: intellectuality times ideology equals incompetence.) Objectivity is essential for forecasting excellence. Unfortunately, objective analysts don't fare so well in the media.
The authors found "an inverse correlation" between fame and accuracy: The more famous an expert was, the less accurate his forecasts. Those who confront and attempt to resolve doubts generally achieve less fame and fortune.
Best practices:
The importance of aggregating many sources of information and many perspectives on the information is emphasized by the authors. Aggregations of forecasts - polls of polls - often achieve the most consistent results.
Objectivity and hard work in gathering and evaluating facts and perspectives are essential for repetitive forecasting success. However, objectivity and effort are relative qualities, and most people are "hybrids" existing between the extremes.
There must be serious consideration of the possibility of error. The intelligence concerning Iraq's WMD prior to the 2003 Iraq war is used by the authors as a graphic example. The authors reject the view that the WMD finding was unwarranted. Much more damning was the finding that it was expressed without examination of the uncertainties that always exist in intelligence analyses. It "fell prey to hubris." "It wasn't merely wrong. It was wrong when it said it couldn't be wrong." The possibility of error had never seriously been explored. There was no "red team" or "devil's advocate." The possibility that the WMD program had actually been ended was never even considered.
The IARPA challenge:
The Intelligence Advanced Research Projects Activity (IARPA) was formed in 2006 in the wake of the Iraq WMD fiasco to develop and test forecasting methods.
A National Research Council committee had concluded that forecasting methods cannot be trusted until they are tested, so IARPA proposed a wide-ranging test of short-to-medium range forecasts and techniques involving questions similar to intelligence agency concerns.
Studies had already shown that "human cognitive systems will never be able to forecast turning points in the lives of individuals or nations several years into the future."
Teams of forecasters were challenged to beat the consensus forecast by meaningful degrees during four separate trials. In forming a team, the authors' group sought volunteer forecasters. Initial "psychometric" tests produced 3,200 acceptable forecasters. The initial IARPA tests revealed about forty of them as the best forecasters.
Forecasts were supplemented by "wisdom of the crowd" techniques based on the insight that a crowd will have knowledge unavailable to any individual. This was tweaked by giving extra weight to the forecasts of the top forty. Finally, forecasts were "extremized" by, for example, pushing 70% forecasts up to 85% or 30% forecasts down to 15%. The authors explain that if all the information known in bits and pieces among the crowd could be given to each member of the crowd, their forecasts would become more confident, justifying the "extremizing" adjustment.
Such techniques, used by the authors' group, not only repeatedly beat all the other groups and all other forecasting methods by significant margins, they also routinely bested intelligence forecasts that had the advantages of all manner of intelligence information.
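A minimal sketch of this weight-then-extremize pipeline follows. The crowd numbers, the weights, and the power-law extremizing formula are illustrative assumptions - the book reports only the before-and-after probabilities, not the GJP's exact formula:

    def aggregate_and_extremize(probs, weights=None, a=2.0):
        # Weighted average of individual probability forecasts, then an
        # extremizing transform pushing the consensus away from 50%.
        # With a ~= 2, roughly 70% becomes 85% and 30% becomes 15%,
        # matching the book's example. (Assumed functional form, not
        # necessarily the one the Good Judgment Project used.)
        if weights is None:
            weights = [1.0] * len(probs)
        consensus = sum(p * w for p, w in zip(probs, weights)) / sum(weights)
        return consensus ** a / (consensus ** a + (1.0 - consensus) ** a)

    crowd = [0.65, 0.72, 0.70, 0.73]   # hypothetical individual forecasts
    emphasis = [1.0, 1.0, 3.0, 3.0]    # extra weight for the best forecasters
    print(round(aggregate_and_extremize(crowd, emphasis), 2))  # ~0.85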
Forecasting techniques:
Several predominant forecasting techniques are discussed by the authors.
Superforecasters constantly look for other views they can synthesize into their own.
Subjective judgment is the dominant overall factor. Balancing, finding relevant information, and judging relevance and impact are predominant practices. The authors expect computers to become increasingly useful tools in support of expert predictive judgment, but not to displace the expert.
Probability:
The inherent uncertainty of life - and thus of forecasting - is emphasized by the authors. "Nothing is absolutely certain," not even scientifically determined facts. Today's science may be qualified or even overturned by tomorrow's science.
"Nothing is absolutely certain," not even scientifically determined facts. Today's science may be qualified or even overturned by tomorrow's science. |
The result is probability: forecasters must deal in probabilities. "Certainty is illusory."
"To benefit from failure, we must know when we fail." The score must be constantly kept and acknowledged.
Self-assessment should be received skeptically for obvious reasons. Time-lag increases "hindsight bias" and memory inaccuracy. Ambiguous forecasts - the use of ambiguous terms - thwart record keeping.
Accurate and prompt feedback is vital for the adjustment
process. Professional weather forecasters and bridge players benefit from
prompt feedback. "To benefit from failure, we must know when we
fail." The score must be constantly kept and acknowledged.
Forecast efforts must be documented. Good notes on
the factors considered in a forecast are important for the postmortem -
which is essential. Even "correct" forecasts may be bad
forecasts - a matter of luck - that should be acknowledged.
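One way to make such documentation concrete is a structured forecast journal holding exactly what the postmortem needs: the precise question, the numeric forecast, and the reasoning behind it. A hypothetical sketch (all field names are ours, not the book's):

    from dataclasses import dataclass, field
    from datetime import date
    from typing import List, Optional

    @dataclass
    class ForecastRecord:
        question: str                  # precise, resolvable wording
        deadline: date                 # explicit timeline
        probability: float             # numeric, so it can be scored later
        rationale: List[str] = field(default_factory=list)  # factors considered
        outcome: Optional[int] = None  # filled in at resolution: 1 or 0

    entry = ForecastRecord(
        question="Will the measure pass before the deadline?",
        deadline=date(2016, 12, 31),
        probability=0.7,
        rationale=["base rate of similar measures ~60%",
                   "sponsor recently gained support"],
    )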
The superforecasters:
Some "superforecasters" were
identified during the IARPA contests. & |
The superforecasters repeatedly scored substantially above the consensus and intelligence agency forecasts on matters of immediate national interest.
The authors praise the IARPA for running a test that, among other things, demonstrated the limitations of intelligence agency professionals.
"So it seems intelligence and knowledge help but they add little beyond a certain threshold." |
The superforecasters actually increased their lead over all other forecasters in rounds 2 and 3 of the contest. There was no "regression to the mean" that would have been expected if initial success had been predominantly luck. Their success was facilitated by grouping them on teams with other superforecasters for rounds 2 and 3.
They were forecasting events like the price of oil with vast
numbers of variables, or events subject to totally unforeseeable
contingencies, like the likelihood of a violent confrontation between
vessels of two nations in the South China Sea. While of course not
infallible, the correlation of results over several iterations supports
the conclusion that the superforecaster status involved mostly skill
rather than luck.
Superforecasters are "actively open minded," routinely consulting viewpoints from competing ideological, political and personal perspectives. They work at objectivity.
Ultimately it's not intelligence and knowledge that count, but "how you use it." |
Common traits of superforecasters include:
Superforecasters are "actively open minded," routinely consulting viewpoints from competing ideological, political and personal perspectives. They work at objectivity.
All superforecasters are proficient in math; some qualify as math wizards, and math is used on occasion. However, math is seldom dominant in their forecasting activities.
Ultimately it's not intelligence and knowledge that count, but
"how you use it."
While warning that there is no one way to achieve superforecaster capabilities, the authors sum up the general techniques.
Unfortunately, new information has to overcome intellectual inertia - "confirmation bias" and other sources of psychological bias that inhibit the recognition of error. These biases become especially difficult to overcome when reinforced by ideology or professional or personal commitment to an existing forecast.
Superforecasters are news junkies. They use both
traditional and online sources and update forecasts repeatedly. During the
IARPA contest, even the initial forecasts of the superforecasters were
about 50% more accurate than those of the other forecasters, but that is
not enough for them. They persistently work to improve their forecasts.
"Try, fail, analyze, adjust, try again," is the way the authors describe this practice. There is a persistent need to resist biases, especially the tendency to rely on already acquired knowledge and on what is readily available -- on "the tip of your nose."
"What makes them so good is less what they are than what they do -- the hard work of research, the careful thought and self-criticism, the gathering and synthesizing of other perspectives, the granular judgments and relentless updating."
Superforecasters are not recognized "experts" or "professionals," and so are much less personally invested in any particular forecast.
Finding the middle ground between underreacting and overreacting to unexpected news is a skill of superforecasting. Typically, updates involve small changes in probability assessments. Superforecasters make many small adjustments and generally avoid dramatic shifts.
The new information must be accorded its due, but it rarely
negates the old information, so frequent small adjustments are generally
appropriate. Nevertheless, when new information not only modifies but
threatens the validity of old information, the latter may have to be
discarded, leading to a radical change. Even here, there is no certainty.
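This rhythm of frequent small revisions, with the occasional forced overhaul, is the spirit of the Bayesian belief-updating the book discusses: each piece of news multiplies the prior odds by a likelihood ratio. A sketch with illustrative numbers:

    def bayes_update(prior_prob, likelihood_ratio):
        # Bayes' rule in odds form: posterior odds = prior odds * LR.
        # A ratio near 1 (weak evidence) only nudges the forecast;
        # an extreme ratio forces the radical revision described above.
        prior_odds = prior_prob / (1.0 - prior_prob)
        posterior_odds = prior_odds * likelihood_ratio
        return posterior_odds / (1.0 + posterior_odds)

    p = 0.60
    p = bayes_update(p, 1.2)   # mildly supportive news        -> ~0.64
    p = bayes_update(p, 0.9)   # slightly contrary detail      -> ~0.62
    p = bayes_update(p, 0.05)  # evidence gutting the old view -> ~0.07
    print(round(p, 2))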
Forecasting and planning teams:
The use of forecasting and planning teams is covered at some length by the authors.
Various measures designed to benefit from collective wisdom are described by the authors. Skepticism must be encouraged and the retention of independence of judgment must be facilitated.
During the IARPA contest, the authors randomly divided their forecasters into those who would work alone and those who would join teams that would communicate online. Some general guidance was provided, but each team developed its own procedures.
Debate must be robust, but confrontation must always be "respectful." Generalities should be evaluated by examination of their particular parts.
"Bring in outsiders, suspend hierarchy, and keep the leader's views under wraps. There's also the 'premortem,' in which the team is told to assume a course of action has failed and to explain why -- which makes team members feel safe to express doubts they may have about the leader's plan."
The first iteration of the IARPA contest revealed a 23% advantage for teams. For the second iteration, the authors employed only teams. They concentrated their best forecasters into teams of a dozen forecasters each.
Properly managed, teams have huge advantages in knowledge, information gathering, personal commitment, and variety of perspectives. The results in iterations 2 and 3 of the IARPA contest were startling, with accuracy improvements of about 50%. Interestingly, applying the "extremizing" technique brought the results of the separate forecasters and ordinary teams much closer to those of the superteams.
The best teams were "actively open-minded."
"A group of open-minded people who engage one another in pursuit of the truth will be more than the sum of its opinionated parts." |
Prediction markets beat ordinary teams by about 20%, but superteams beat prediction markets by 15% to 30%. Prediction markets have a good record, but they lack the liquidity and intensity of purpose of real financial markets. (Very few actively managed investment funds regularly beat the financial markets.)
Teams are idiosyncratic as are their members, and team formation is a highly nebulous art. The authors caution against accepting some formula for forming a team. They suggest some best practices, not recipes for success for all purposes.
The superforecaster team members were generalists. Professional analysts and futurists are often specialists who develop expertise in particular fields of interest and particular targets. The authors nevertheless point out that their superforecaster teams outperformed the intelligence agency professionals and specialists on many occasions. Their forecasting practices were tested during three iterations of the IARPA challenge and proved their value.
Leadership and forecasting:
The intellectual differences between the arts of the forecaster and the leader are explained by the authors. Both must analyze similarly, but leaders ultimately must act, usually with imperfect information. Leaders must act decisively yet remain flexible as new information is received.
It is war, of course, that most dramatically reveals the strengths and weaknesses of strategic planning and leadership in action. The authors discuss the practices of the German and U.S. armies in WW-II, the Israeli army and the current U.S. army. They emphasize the need to combine decisiveness with flexibility - and to extend that combination all the way down the chain of command.
"Intellectual humility compels the careful reflection necessary for good judgment; confidence in one's ability inspires determined action." |
Corporate leadership involves similar problems and similar approaches for their solution. The authors sum up the combination of self-confidence and intellectual humility required for good leadership.
The dangers of confirmation bias, in all its many guises - ego, ideology, herd instinct, established judgment, overconfidence in the adequacy of current knowledge, etc. - are emphasized by the authors. You never really know enough. Hindsight bias makes the past look more predictable than it ever was and renders knowledge uncertain. (Never underestimate an adversary or overestimate the capabilities of allies or your own team.)
Thus, institutions must plan for surprise and incorporate plans for resiliency and adaptability. Unfortunately, costs put practical limits on such preparations. How much should be spent on earthquake predictions, and where?
Accurate forecasting can make a significant difference in all manner of plans and activities on a human and short-term scale. However, before you can seek an answer, you must first recognize a question. The best forecasters may not be the best at recognizing pertinent questions. Those with questions and those who find answers may need each other, the authors speculate.
A big question - the effectiveness of interest rate
suppression policy since 2009 - is discussed as an example at the end
of the book. How accurate have the objections of those advocating an
austerity alternative been?
Interest rates are of fundamental importance for the proper functioning of a wide array of the market's complex mechanisms. The masters of the financial universe at the central banks toy with interest rates with an astounding combination of hubris and naïveté.
FUTURECASTS remains firmly in the anti-Keynesian policy camp. It is not unusual for strong, naturally resilient capitalist systems to remain prosperous for the better part of a decade in the face of even the most destructive economic policies.