While attending a recent Air Force wargaming exercise, I asked experts their opinion on the future of wargaming in the age of artificial intelligence (AI). The response I often got was “war is a human endeavor.” Emphasizing the importance of human intuition in modern warfare, they seemed to bristle at the idea that AI might play a larger role in wargames as the technology advances. Computers, they posited, could never replicate the human ability to adapt to dynamic battlespaces in the real world and should therefore not replace human input in wargames. AI, however, is ideally suited to solving complex optimization problems—exactly the type of problems wargaming presents.
To illustrate how AI might be helpful, it is worth explaining in broad strokes how wargaming is conducted, using a game in which I participated as an example. The gaming team consisted of about 300 military and civilian personnel divided into three groups: blue, red, and adjudication. The blue team acted as the United States and its allies, the red team as the adversary, and the adjudication team as the rule keepers. The wargame lasted 14 days. Each day, the blue and red teams submitted “moves” to the adjudication team to achieve their objectives. The adjudication team determined the outcome of each move based on the feasibility of each attack, defensive action, or maneuver.
As a member of the maritime adjudication team, I contributed to decisions on the outcomes of any engagements involving ships or submarines. We answered questions such as: Would ship A actually be within range to strike ship B? or How would ship B’s defenses protect it from ship A’s strikes? Ideally, scientific testing models determined the outcome of engagements by considering each unit’s weapon systems and defenses, but the models were not always sufficient. Often, some human judgment and intuition were required to adjudicate engagements.
For example, it was not always clear how the red or blue team’s electronic warfare capabilities would affect a missile strike. The adjudication team would often adjust the probability of an effective strike by a percentage based on variables not included in established models. Thousands of these human inputs were injected over the course of the game. This raises the question: If each human input were off by even as little as 10–20 percent, how accurate or useful would the wargame’s battle simulation be after all those errors were compounded?
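A back-of-the-envelope simulation makes the compounding concern concrete. The Python sketch below is purely illustrative: every number in it is hypothetical, and it assumes an adjudication team’s errors are correlated across its calls (a team that overrates a capability tends to keep overrating it). Independent, zero-mean errors would largely average out over thousands of engagements; a persistent skew does not.

```python
import random
import statistics

# Back-of-the-envelope model of compounded adjudication error.
# All numbers are hypothetical, chosen only to illustrate the mechanism.
N_ENGAGEMENTS = 2000   # "thousands of human inputs" over one game
TRUE_P = 0.6           # notional true probability blue wins an engagement
TEAM_ERROR = 0.15      # team's calls skewed by up to +/-15 percent

def campaign(skewed: bool) -> int:
    """Count blue wins in one game, with or without a skewed adjudication team."""
    bias = 1 + random.uniform(-TEAM_ERROR, TEAM_ERROR) if skewed else 1.0
    p = min(TRUE_P * bias, 1.0)
    return sum(random.random() < p for _ in range(N_ENGAGEMENTS))

clean = [campaign(False) for _ in range(5000)]
skewed = [campaign(True) for _ in range(5000)]
print(f"spread (std. dev.) of blue wins, true probabilities:  {statistics.stdev(clean):.1f}")
print(f"spread (std. dev.) of blue wins, skewed adjudication: {statistics.stdev(skewed):.1f}")
```

Under these notional numbers, the spread of campaign outcomes with a skewed team is roughly five times wider than chance alone would produce.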
Flawed Human Judgment
Humans are notoriously inconsistent decision-makers. Consider a 1981 study in which 208 federal judges were asked to determine sentences for the same 16 cases. The cases were presented in a simplified manner, with only the necessary information about the crimes and defendants given to the judges. The average difference between any two sentences for identical crimes was 3.5 years. Considering that the average sentence was 7 years, this is a significant degree of deviation.1
Behavioral scientists such as Daniel Kahneman refer to this variability in judgment as noise. Kahneman notes that noise in sentencing can occur because of any number of factors, such as a judge’s hunger or fatigue, or even the weather.2 Wargame adjudicators are certainly not immune to noise in their decision-making. Over a two-week game, it is reasonable to assume the adjudication team made inconsistent calls on dozens of seemingly identical engagements. Combine this with the possibility that an adjudication team composed mainly of U.S. service members may be biased toward favorable U.S. outcomes, and there is serious doubt about the validity of wargame forecasts.
While humans should always be included in the wargaming process, the paramount goal of wargaming is, arguably, to provide national leaders with a plausible and useful model on which to base defense policy decisions. Marine Corps University’s Horner Chair of Military Theory, Jim Lacey, stated that “every strategic decision that the DoD will ever consider can and should be shaped through wargaming.”3 Current simulations may not be conclusive enough to meaningfully inform policy decisions at the highest level. Given the butterfly effect that thousands of imprecise human inputs have on a wargame, running a single instance of a game produces just one of likely millions of possible outcomes. That a model is plausible does not make it useful.
The AI Advantage and Limits
Imagine attempting to predict the outcome of a chess match between two world-class players by studying the tactics of both chess masters and then simulating a single game. What effect might a mis-moved pawn early in the game have on the endgame scenario? If you have ever played chess, you know a single misstep can have massive repercussions down the line. Even if a team simulates this chess match 20 times, it is unlikely it could predict the outcome. A better way to predict the winner might be to train AI on data from every game the chess masters ever played, then let the AI play out millions of possible match scenarios. One could then determine the most likely outcome based on the aggregated data.
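In code, the aggregation step of that approach is simple, even though the hard part (the trained models themselves) is not. Below is a minimal Python sketch in which `play_match` is a hypothetical stub standing in for one full game rollout between models trained on each player’s past games; the win/draw/loss weights are invented for illustration, not real data.

```python
import random
from collections import Counter

# `play_match` stands in for one simulated game between two trained models.
# Here it is a hypothetical stub sampling notional outcome probabilities;
# a real system would play the game out move by move.
def play_match() -> str:
    return random.choices(["player A wins", "player B wins", "draw"],
                          weights=[0.38, 0.30, 0.32])[0]

# Aggregate a million rollouts into an outcome distribution.
N = 1_000_000
results = Counter(play_match() for _ in range(N))
for outcome, count in results.most_common():
    print(f"{outcome}: {count / N:.1%}")
```

The prediction then comes from the distribution over many rollouts, not from any single simulated game.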
This is exactly the approach the U.S. military should take in wargaming. Ultimately, wargaming presents an optimization problem: How should global forces behave to best achieve U.S. objectives? The modern battlespace contains countless variables, making the problem more complex than humans alone can likely solve. Returning to the chess metaphor, AI has proven superior to humans at the game for more than two decades, since IBM’s Deep Blue defeated world champion Garry Kasparov in 1997. Many had argued that humans were better suited to predict the best move in a situation that had never been encountered, a sentiment echoed by many in the wargaming community. This assumption was proved wrong again in 2017, when Google’s AlphaGo bested the world’s top player at Go, an ancient game even more complex than chess.
While AI is growing more powerful by the day, it has its limitations. Its ability to simulate warfare would be only as good as the data from which it learned. If the learning set is flawed, so is the model, and every past wargame is flawed in some way. But AI could remove the noise from human adjudicators’ decision-making and produce more consistent and reliable models. It could enable leaders to analyze a wargame in retrospect and test how changing certain variables affects the outcome, answering questions such as: How would an increase in naval forces in this particular region have affected our forces’ ability to close vital supply chains? or How would poor weather have affected the enemy’s air defenses for this particular strike? Answering these questions without AI would require rerunning an entire game, an unrealistic and impractical approach.
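That retrospective what-if workflow could look something like the following minimal sketch, in which `predict_success` is a hypothetical stand-in for a surrogate model trained on past game data (a real implementation would load a learned model rather than a hand-written formula). The sweep varies one input at a time and reads off the predicted outcome, with no need to rerun the game.

```python
# One-at-a-time sensitivity sweep against a surrogate model of the game.
# `predict_success` is a hypothetical stand-in for a trained model; the
# formula below is invented purely so the sketch runs end to end.
def predict_success(naval_units: int, weather_penalty: float) -> float:
    """Notional probability of achieving the objective in one region."""
    return min(naval_units / 40, 1.0) * (1 - weather_penalty)

for units in (10, 20, 30, 40):
    for weather in (0.0, 0.25):
        p = predict_success(units, weather)
        print(f"naval_units={units:2d}  weather_penalty={weather:.2f}  "
              f"P(objective)={p:.2f}")
```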
AI powerful enough to create a useful wargaming product would likely take years and considerable resources to produce, but it is certainly possible. It also would likely save billions of dollars through force optimization and identification of superfluous assets. Once produced, the software would greatly cut the costs of future research by reducing the personnel and resources needed to host games.
If wargaming data is, even in part, going to determine the U.S. military’s force design, it is in the military’s interest to ensure the forecasts are both plausible and useful, and AI should be used to maximum effect to achieve this goal. Human intuition will always play a part on the battlefield, but wargames should be data-driven, iterative, and free from the noise of human decision-making to produce the most useful models. AI excels at this and offers the military a tool to revolutionize the way it prepares for future conflicts.
1. Daniel Kahneman, Olivier Sibony, and Cass R. Sunstein, “Bias Is a Big Problem. But So Is ‘Noise,’” The New York Times, 15 May 2021.
2. Caroline Criado Perez, “Noise by Daniel Kahneman, Olivier Sibony and Cass Sunstein: Review—The Price of Poor Judgment,” The Guardian, 3 June 2021.
3. Jim Lacey, “Shall We Play a Game?” Army War Room, 31 July 2020.