Every three months, participants on the forecasting platform Metaculus compete to predict future events for a prize pool of roughly $5,000. The platform poses questions of significant geopolitical relevance, such as “Will Thailand experience a military coup before September 2025?” and “Will Israel strike the Iranian military again before September 2025?”
Forecasters estimate the probability of each event occurring, a more informative answer than a simple “yes” or “no,” often weeks or months in advance, and they frequently do so with remarkable accuracy. For instance, Metaculus users accurately predicted the date of the Russian invasion of Ukraine two weeks in advance and put a 90 percent likelihood on Roe v. Wade being overturned nearly two months before it happened.
Yet one of the top 10 performers in the competition, whose winners were announced on Wednesday, surprised even the forecasters themselves: an artificial intelligence. “It’s actually kind of mind blowing,” said Toby Shevlane, CEO of Mantic, the recently launched UK-based startup that built the AI. When the contest began in June, participants expected the leading bot to achieve a score equal to about 40 percent of the top humans’ average. Mantic instead topped 80 percent.
“Forecasting—it’s everywhere, right?” said Nathan Manzotti, who has worked on AI and data analytics for the Department of Defense, the General Services Administration, and about a half-dozen other U.S. government agencies. “Select any government agency, and they undoubtedly engage in some form of forecasting.”
Forecasters help institutions anticipate the future, explained Anthony Vassalo, co-director of the Forecasting Initiative at RAND, a nonprofit think tank that does research for the U.S. government. They also help institutions shape it. Predicting geopolitical developments weeks or months ahead helps in “stopping surprise” and in “assisting decision-makers in being able to make decisions,” Vassalo said. Forecasters update their predictions in response to policies lawmakers put in place, which lets them project how a hypothetical policy intervention might alter future outcomes. If decision-makers find themselves on an undesirable trajectory, forecasters can help them “change the scenario they’re in,” Vassalo added.
Nevertheless, forecasting broad geopolitical questions is notoriously difficult. A single question can take a leading forecaster days of work. For organizations like RAND, which track many subjects across many geopolitical regions, “it would require months for human forecasters to produce an initial forecast on all those questions, let alone update them regularly,” Vassalo said.
Machine learning has long proven valuable in fields with abundant, well-structured data, such as weather prediction or quantitative trading. When it comes to forecasting geopolitics or technological progress, “you will encounter many complex, interdependent factors where human judgment can be both more accessible and affordable” in making predictions, according to Deger Turan, CEO of Metaculus.
Large language models take in the same kind of complex information as human forecasters and can simulate this human judgment. They also improve the way humans do: by making predictions on many questions, seeing how those predictions turn out, and refining their forecasting approach based on the results, only at a scale far beyond what any human can manage.
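To make that loop concrete, here is a minimal sketch in Python of the kind of feedback step described above, assuming a set of resolved yes/no questions; the Brier score used below is a standard accuracy measure for probabilistic forecasts, not necessarily what any of these companies actually optimize.

```python
# Minimal sketch of the feedback loop described above: score a batch of
# resolved forecasts against what actually happened, then use that score
# as a training signal. The questions and numbers here are hypothetical.

def brier_score(forecast: float, outcome: int) -> float:
    """Squared error between a probability (0 to 1) and a binary outcome (0 or 1).
    Lower is better; always guessing 50 percent earns 0.25."""
    return (forecast - outcome) ** 2

# Hypothetical resolved questions: (model's probability, what actually happened).
resolved = [
    (0.85, 1),  # predicted 85 percent likely; the event occurred
    (0.10, 0),  # predicted 10 percent likely; the event did not occur
    (0.60, 0),  # an overconfident miss
]

average = sum(brier_score(p, y) for p, y in resolved) / len(resolved)
print(f"average Brier score: {average:.3f}")
# A training pipeline can feed scores like this back to the model,
# rewarding forecasts that turn out to be well calibrated.
```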
“Our primary insight was that predicting the future tends to be a verifiable problem, because that’s how humans learn, right?” remarked Ben Turtel, CEO of LightningRod, a company that develops AI for forecasting and has achieved competitive placements in Metaculus AI tournaments. The company trained a recent model on a vast number of forecasting questions.
The rigorous training the AIs receive is now showing up in the rankings. In June, the top-ranked bot, built by Metaculus on top of OpenAI’s o1 reasoning model, competed in the cup without cracking the top 10. This time, Mantic finished eighth out of 549 contestants, the first time an AI has placed in the top 10 of this competition series.
This result should be taken with some reservation, according to Ben Wilson, an engineer at Metaculus who oversees comparisons between AIs and humans in forecasting challenges. The contest involved a relatively small sample of 60 questions. Moreover, a majority of the contestants are amateurs, and some submitted predictions on only a handful of the tournament’s questions, which drags down their scores.
Finally, the machines have a built-in advantage. Participants earn points not only for accuracy but also for “coverage,” which reflects how early they make predictions, how many questions they predict on, and how often they update their estimates. An AI that is less accurate than its human counterparts can still rank highly by constantly revising its estimates as news unfolds, a pace no human can sustain.
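To see why coverage matters so much, consider a toy scoring rule, not Metaculus’s actual formula, in which each question’s score is simply accuracy multiplied by the fraction of the question’s lifetime covered. Under that simplification, a tireless bot beats a sharper but less prolific human:

```python
# Toy illustration of coverage-weighted scoring. This is NOT Metaculus's
# actual scoring rule; it simply shows how a bot that covers every question
# from the moment it opens can out-rank a more accurate human who joins
# late and skips questions.

def question_score(accuracy: float, coverage: float) -> float:
    """accuracy: quality of the forecast on a 0-1 scale.
    coverage: fraction of the question's open period with a standing forecast."""
    return accuracy * coverage

# Hypothetical tournament of 10 questions.
human = [question_score(0.80, 0.50) for _ in range(6)]   # sharper, but late and on only 6 questions
bot = [question_score(0.65, 1.00) for _ in range(10)]    # less accurate, but covers everything immediately

print("human total:", round(sum(human), 2))  # 2.4
print("bot total:  ", round(sum(bot), 2))    # 6.5
```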
For Vassalo, that built-in advantage is precisely what addresses his biggest outstanding challenge: getting high-quality forecasts on every question he needs predictions for. “I don’t actually need it to reach the level of a superforecaster,” he said, using the term for top-tier forecasters. “I need it to perform as well as the crowd.”
That goal is harder than it sounds: the Metaculus Community Prediction, which aggregates all users’ forecasts on every question, is one of the platform’s most consistent performers. Treated as an individual forecaster, it would rank fourth on the site; such is the collective wisdom of the crowd. In the Quarterly Cup, Mantic finished five places behind the Community Prediction.
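The idea behind a crowd aggregate is simple to sketch, even though Metaculus’s real Community Prediction is computed with a more sophisticated weighting than the plain median shown below; the individual forecasts in this example are invented for illustration.

```python
# Bare-bones sketch of pooling a crowd's forecasts into one number.
# A plain median is used only to illustrate the idea; the real Community
# Prediction is aggregated differently, and these probabilities are made up.

from statistics import median

user_forecasts = [0.35, 0.40, 0.55, 0.30, 0.60, 0.45, 0.50]  # seven users' probabilities for one question

community = median(user_forecasts)
print(f"community prediction: {community:.0%}")  # 45%
```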
A dependable AI forecaster could simultaneously track hundreds of questions, thereby enabling Vassalo to deploy top human forecasters exclusively for those questions the AI deems worthy of closer scrutiny.
“The fundamental aspect of forecasting, or predictive analytics, is that it serves as decision support,” Manzotti concluded. “Many leaders will discard the data if they have a gut feeling leaning in a different direction.” This is a problem that AI cannot resolve.