Technological advancements can excite us, political issues can infuriate us, and conflicts can mobilize us. However, when confronted with the risk of human extinction posed by Artificial Intelligence (AI), we have remained surprisingly passive. This may be partly due to the lack of a clear solution. I aim to challenge that notion.
AI’s capabilities are continuously improving. Since the release of [redacted] two years ago, hundreds of billions of dollars have been invested in AI. These combined efforts are likely to lead to Artificial General Intelligence (AGI), where machines possess human-like cognitive abilities, potentially [redacted].
Hundreds of AI scientists warn that we could lose control over AI once it becomes too powerful, potentially leading to human extinction. So, what can we do?
The existential risk of AI has often been presented as an extremely complex issue. A [redacted], for instance, described the development of safe human-level AI as a “super wicked problem.” This perception of difficulty stemmed largely from the proposed solution of AI alignment, which involves making superhuman AI act in accordance with human values. However, AI alignment was a problematic solution from the outset.
Firstly, scientific progress in alignment has been considerably slower than the advancement of AI itself. Secondly, the philosophical question of which values to align a superintelligence with is incredibly complex. Thirdly, it is not at all clear that alignment, even if successful, would effectively address AI’s existential risk. The presence of one friendly AI does not necessarily prevent the emergence of other unfriendly ones.
In light of these issues, many have urged technology companies to refrain from building any AI that humanity could lose control over. Some have gone further: activist groups such as PauseAI have called for an international treaty to pause development globally.
However, many do not consider this approach politically palatable, since it may still take considerable time before the missing pieces of AGI fall into place. And why pause now, when this technology has the potential to do so much good? Yann LeCun, AI chief at Meta and a prominent existential risk skeptic, compares the existential risk debate to “worrying about turbojet safety in 1920.”
On the other hand, technological advancements can happen rapidly. If we experience another breakthrough like the [redacted], a 2017 innovation that helped launch modern Large Language Models, we might achieve AGI within a few months of training. This underscores the need for a regulatory framework to be established before such a breakthrough occurs.
Fortunately, [redacted], Turing Award winner Yoshua Bengio, and many others have provided a piece of the solution. In a policy paper published in Science earlier this year, they proposed “if-then commitments”: commitments to be activated if and when red-line capabilities are found in frontier AI systems.
Building on their work, we at the nonprofit Existential Risk Observatory propose a Conditional AI Safety Treaty. Signatory countries of this treaty, which should include at least the U.S. and China, would agree that once we get too close to losing control, they will halt any potentially unsafe training within their borders. Once the most powerful nations have signed this treaty, it is in their interest to verify each other's compliance and to ensure that such AI is not developed elsewhere either.
One outstanding question is at what point AI capabilities come too close to enabling a loss of control. We propose delegating this question to the AI Safety Institutes set up in the [redacted], [redacted], [redacted], and [redacted]. These institutes possess specialized model evaluation expertise, which can be further developed to answer this crucial question. They are also public bodies, making them independent of the primarily private AI development labs. The question of how close is too close will remain challenging, but someone will need to address it, and the AI Safety Institutes are best positioned to do so.
Under the Conditional AI Safety Treaty, we can still reap the benefits of AI to a significant extent. All current AI systems are far below the loss-of-control level and will therefore remain unaffected. Future narrow AIs suited to a single task, such as climate modeling or discovering new medicines, will also be unaffected. Even more general AIs can still be developed if labs can demonstrate to a regulator that their model poses a loss-of-control risk of less than, say, 0.002% per year (the threshold we accept for nuclear reactors). Other AI thinkers, such as MIT professor [redacted], Conjecture CEO [redacted], and ControlAI director [redacted], are exploring similar approaches.
Fortunately, the existential risks posed by AI are recognized by many individuals close to President-elect Donald Trump. His daughter Ivanka has expressed the urgency of the problem. Elon Musk, a key Trump supporter, has been sounding the alarm about civilizational risks for many years and recently supported California’s legislative push to safety-test AI. Even the right-wing Tucker Carlson offered common-sense commentary when he stated: “So I don’t know why we’re sitting back and allowing this to happen, if we really believe it will extinguish the human race or enslave the human race. Like, how can that be good?” For his part, Trump has also spoken about the risks posed by AI.
The Conditional AI Safety Treaty could offer a solution to AI’s existential risk without unnecessarily hindering AI development at present. Convincing China and other countries to accept and enforce the treaty will undoubtedly pose a significant geopolitical challenge, but perhaps a Trump administration is precisely what is needed to overcome it.
A solution to one of the most difficult problems we face—the existential risk of AI—does exist. It is up to us whether we make it happen or continue down the path toward potential human extinction.