First Prize Winner - Naval Intelligence Essay Contest Sponsored with the Naval Intelligence Professionals
operating environment, understand risks to the status quo, and accurately predict future changes. If the intelligence community cannot effectively forecast what an adversary will do and how the environment will change, national leaders will be far less likely to design successful policies that address the critical security issues facing the United States and its allies. Forecasting also is central to the Chief of Naval Operations’ vision for the Navy’s future, “A Design for Maintaining Maritime Superiority,” which mentions the “information domain” and the “‘informationalized’ environment” more times than any other concept.
While the operating environment demands better foresight, some remarkable approaches to improving forecasting are emerging from academia, business, and the public sector. One of the most promising began as a pilot project sponsored by the U.S. Intelligence Advanced Research Projects Activity (IARPA). The pilot study shows both the state of the art in forecasting and the institutional limitations that prevent widespread adoption of such an innovation. Institutionalizing a revolutionary approach to intelligence forecasting, in support of successful national security policymaking and decisions, will require determined, high-level leadership.
Intelligence forecasting is hampered by bureaucratic inertia and has not kept pace with recent advances. As Professor Philip Tetlock of the Wharton School at the University of Pennsylvania and defense analyst Welton Chang write, “Current [intelligence] training is anchored in a mid-twentieth century understanding of psychology . . . encouraging a narrow perspective on the flaws of intuition and a correspondingly narrow search for remedies. The result has been a defensive mindset aimed at avoiding mistakes, not a proactive mindset aimed at getting it right by reducing uncertainty as aggressively as possible.”1
Even more concerning is the lack of an evidence-based approach to evaluating existing analytic training, which “has never been validated against objective benchmarks of good judgment. . . . For too long the Intelligence Community has shackled itself to a system of training that it never tested.”2 Such bureaucratic inertia typically has been broken only in the wake of a crisis, at greater expense and disruption than earlier reform would have entailed. Even Sherman Kent, famed as “the father of intelligence analysis,” could not persuade the CIA to adopt a modest change: routinely expressing likelihood in intelligence assessments as percentages rather than vague verbiage (such as “somewhat likely”). It took the debacle of the Iraq weapons of mass destruction assessment in 2003 to overcome organizational risk aversion and finally implement the change, four decades after Kent first proposed it.3 Senior intelligence leaders should seize the opportunity to begin carefully integrating these novel techniques now, before the next crisis.
Redefining the Forecasting Frontier
In 2010 the National Academy of Sciences (NAS) reported on the scientific consensus underlying the fields of prediction and intelligence analysis.4 A key conclusion was the need to “systematically track the accuracy of probabilistic forecasts that analysts routinely . . . make,” something many outsiders would have assumed the intelligence community was already doing. Building on the NAS study, IARPA, “the research and development branch of the Office of the Director of National Intelligence, launched a large-scale forecasting tournament designed to monitor the accuracy of probabilistic forecasts about events that occurred around the world.”5
IARPA selected five academic groups through a stringent proposal process and tasked them with forecasting real-world geopolitical outcomes on hundreds of government-selected questions, using whatever methodology each group wished. Examples of the questions asked included:
• Will North Korea launch a new multistage missile before May 10, 2014?
• Will Russian armed forces enter Kharkiv, Ukraine, by May 10, 2014?
• Will there be a significant attack on Israeli territory before May 10, 2014?
• Will Robert Mugabe cease to be president of Zimbabwe by September 30, 2011?
• Will Greece remain a member of the EU through June 1, 2012?6
Some groups used computer algorithms, others emphasized expert knowledge, and still others used variations on the “wisdom of crowds” (the finding that, for certain types of questions, simply averaging a large number of individual predictions yields fairly accurate results). The teams knew they were competing against one another and against a control group, but they did not initially know that they also were competing against intelligence analysts with access to classified information.7
A remarkable pattern emerged: a single team consistently outforecast its competitors by wide margins. In the first year alone, this team exceeded the accuracy improvement IARPA had set as its goal for the end of the project. Even more impressive, according to Washington Post columnist David Ignatius, was that “the top forecasters . . . performed about 30 percent better than the average for intelligence community analysts who could read intercepts and other secret data.” The results were so stark that, after the competition’s first two years, IARPA canceled the contracts with all other competitors to focus on the one outlier team.8
What set this team apart? One critical factor was that one of its lead designers, Professor Tetlock, began the project as a longtime skeptic that forecasting accuracy could be significantly improved. His earlier work had shown that political experts, on average, performed little better than chance when predicting world events. While the other groups were wedded to their own forecasting approaches or enamored of particular technological solutions, Tetlock’s team, the Good Judgment Project (GJP), adopted a much more flexible methodology.9
The GJP, a collaboration between the Wharton School at the University of Pennsylvania and the Haas School of Business at the University of California, Berkeley, recruited thousands of individual forecasters, identified the consistent top performers (“superforecasters”), and then set out to distill what made them special. One key insight was that an open-minded temperament usually matters more than subject-matter expertise in determining forecasting accuracy. Building the right kinds of teams and offering cognitive de-biasing training also were important. But the factor that ultimately mattered most was practice. The most accurate forecasters treated foresight as a muscle that could be strengthened with repetition but would atrophy with disuse. By consistently practicing precise forecasts and keeping score, which created a tight feedback loop for learning, good forecasters became great, and average forecasters became above average.
The results were unambiguously impressive. In an environment where process improvements yielding a few percentage points of efficiency gain are celebrated, IARPA described the results (“a 50-plus percent reduction in error compared to the current state-of-the-art”) as “the largest improvement in judgmental forecasting accuracy” ever observed in the public policy literature. When asked in 2016 which new analytic tool held the most promise for improving foresight, the chairman of the National Intelligence Council singled out the GJP results. Outside experts offered similar plaudits. In 2015, Harvard professor Cass Sunstein, former administrator of the White House Office of Information and Regulatory Affairs, called the work “the most important scientific study I’ve ever read on prediction.”10
The foresight breakthroughs that emerged from the GJP bear directly on the central analytic mission of the intelligence community. Six years later, however, the community’s leaders have not made any concerted, systematic effort to incorporate these breakthroughs into the functioning of their organizations. To be sure, a few disparate elements throughout the intelligence bureaucracy (individual analysts, instructors, or managers of sub-organizations) have taken the initiative to weave these emerging insights into their own work.11 But there is no evidence yet of any larger organizational decision to build the training, talent-management, and analytic benefits of these forecasting breakthroughs into the fabric of the intelligence bureaucracy. No champion for broadly institutionalizing such reform has yet publicly emerged, either within the intelligence community or among the oversight bodies in the legislative branch.
‘People Are at Its Core’
Technology will play a key role in any effort to reform intelligence forecasting, but people are at its core. Unfortunately, many private- and public-sector decision makers today fixate on and defer to engineering solutions, whether “big data” algorithms or a new “computer simulation.” Government policy often favors hardware over improvements to organizational processes: it is easier to convince the Department of Defense to buy a new machine than to try a new human organizational process. Institutionalizing new human approaches, even clear successes like those emerging from the IARPA forecasting pilot project, faces higher obstacles than pitching a new automated solution. But while technology plays an important role, it is not synonymous with innovation, especially when it comes to the complexity of geopolitical forecasting and risk assessment. Technology is merely a tool that enables innovation, which is a fundamentally human process of improvement. Some things, such as judgment under uncertainty, cannot be fully automated.
The special operations forces (SOF) community is one defense organization that has never lost sight of this distinction. Despite having access to some highly advanced technologies, the SOF community continues to adhere to the first of its “SOF Truths”: “Humans are more important than Hardware.”12 It also is heartening that new national security organizations, such as the Defense Innovation Board, whose members include many giants of science and Silicon Valley, have made clear that they will not lose sight of the human element amid a profusion of technological solutions.13
Keeping the human element at the core, the government should broadly integrate new methodologies for improving forecasting. Policymakers at every level must be trained to forecast better, rather than focusing solely on their subject-matter expertise and hoping foresight will develop organically. Doctrinal publications should be updated to include training techniques for improving forecasting. Organizations such as the Navy Staff and the Joint Staff must institutionalize these techniques in their everyday functioning. New structured insights from the IARPA pilot and the GJP should supplement existing techniques, such as wargaming and scenario planning, in helping policymakers better understand risk and trend lines.
Intelligence analytic training also must change, starting with tracking forecasting accuracy in a structured but realistic environment. Such tracking should not be punitive; it should be oriented toward individual and organizational improvement. Training must focus not only on adherence to an analytic style and methodological standard, but also on better outcomes, grounded in an updated psychological understanding of the error balancing and active open-mindedness that maximally accurate forecasting requires. Even more than classroom training, however, leaders must demand realistic, continuous forecasting practice, which is the only way to create the tight feedback loop necessary for systematic improvement. Such a regimen would produce more effective individual analysts and, even more important, better organizational outcomes, as intelligence organizations provide policymakers with a more accurate understanding of the operating environment and more accurate risk assessments.
While mankind is unlikely ever to cast aside all of the distorting filters that affect our perceptions, recent breakthroughs have shown that we can systematically improve our ability to forecast threats and better prepare to defend against them.
1. Welton Chang and Philip E. Tetlock, “Rethinking the Training of Intelligence Analysts,” Intelligence and National Security, January 2016, 1–18.
2. Ibid.
3. Philip E. Tetlock and Dan Gardner, Superforecasting: The Art and Science of Prediction (New York: Crown, 2015), 53–57.
4. Baruch Fischhoff and Cherie Chauvin, eds., Intelligence Analysis: Behavioral and Social Scientific Foundations (Washington, DC: National Academies Press, 2011).
5. Barbara Mellers et al., “The Psychology of Intelligence Analysis: Drivers of Prediction Accuracy in World Politics,” Journal of Experimental Psychology: Applied, vol. 21, no. 1 (2015), 1–14.
6. Steven Rieber, “Aggregative Contingent Estimation (ACE) Fact Sheet,” www.iarpa.gov/images/files/programs/ace/01-ACE.pdf.
7. Tetlock and Gardner, Superforecasting, 72–74. David Ignatius, “More Chatter Than Needed,” The Washington Post, 1 November 2013.
8. Tetlock and Gardner, Superforecasting, 17–18, 89–91. Ignatius, “More Chatter Than Needed.”
9. Philip Tetlock, Expert Political Judgment: How Good Is It? How Can We Know? (Princeton, NJ: Princeton University Press, 2005). Barbara Mellers et al., “Identifying and Cultivating Superforecasters as a Method of Improving Probabilistic Predictions,” Perspectives on Psychological Science, vol. 10, no. 3 (2015), 267–281.
10. Rieber, “ACE Fact Sheet.” Gregory Treverton, “Strategic Intelligence: A View from the National Intelligence Council,” Center for Strategic and International Studies, 4 March 2016, http://csis.org/event/strategic-intelligence-view-national-intelligence-council-nic. Cass R. Sunstein, “Prophets, Psychics and Phools: The Year in Behavioral Science,” Bloomberg View, 14 December 2015.
11. Treverton, “Strategic Intelligence.”
12. U.S. Special Operations Command, “SOF Truths,” www.soc.mil/USASOCHQ/SOFTruths.html.
13. Interview with Defense Innovation Board staffer, 27 July 2016.