The Perils of Undisclosed AI Development

In recent years, artificial intelligence has been a prominent subject of public discussion. Each new AI model showcases remarkable innovations that quickly eclipse versions released mere weeks prior. Experts, developers, and CEOs of AI companies are making bold predictions about future trajectories, from the automation of arduous labor and the extension of human longevity to potential risks to humanity.

The widespread conversation about AI is partly due to the way these innovations have been publicized, generating rapidly growing revenues for the companies developing these models. However, as AI becomes faster, more capable, and more complex, that public conversation could rapidly shift behind closed doors. AI companies are increasingly deploying AI models within their own organizations, and it is probable they will soon find it beneficial to reserve their most powerful future models for internal use. Yet these seemingly innocuous decisions could pose a serious threat to society at large, as elaborated below.

Most leading AI companies have publicly stated their aim to develop AI models as proficient as humans across all cognitive tasks, which could create trillions of dollars in economic value. Given the current belief in a swift progression towards artificial general intelligence (AGI), the potential strategic advantage of highly advanced models might soon prompt companies to leverage their models confidentially and internally to accelerate technical progress—while providing minimal indication of advancement to competitors and the broader outside world.

Current AI systems already frequently exhibit unexpected, unintended, and undesirable behaviors in experimentally simulated environments, for example, by issuing threats to users. However, should leading developers begin to keep their advancements private, society would lose even the limited opportunity it has today to publicly learn about and assess the benefits and drawbacks, the risk and security profiles, and the overall direction of this foundational technology. Once advanced future AI systems are deployed and used, perhaps exclusively, behind closed doors, unforeseen dangers to society could emerge and evolve without oversight or advance warning. This is a threat we can and must prevent.

Leading laboratories are already increasingly using AI systems to accelerate their own research and development (R&D) pipelines: designing new algorithms, proposing entirely new architectures, and optimizing code. Google, for example, has reported that more than a quarter of its new code is now written by AI. As has been highlighted, advanced AI systems could eventually be used to iteratively improve their own successors, potentially creating a powerful “virtuous cycle” of increasingly capable models. This outcome would be excellent news for AI companies aiming to quickly reach artificial general intelligence, or even superintelligence, ahead of competitors, but only if they exploit their strategic advantage away from public scrutiny.

At first glance, all of this might sound harmless: what danger could an unreleased AI system possibly present?

The problem is twofold. First, as advanced AI systems become progressively more useful internally for building better AI, the competitive and economic incentives to prioritize speed and competitive advantage over caution, already strong today, will likely intensify. This race dynamic carries risks, especially if increasingly advanced AI systems are used by company staff and deployed in security-critical areas such as AI R&D, potentially operating autonomously to reduce friction, embedding failure points before anyone fully understands how these systems behave.

Second, existing assessments and interventions focus primarily on publicly available AI systems. For internally deployed AI systems, very little information, if any, is accessible about who has privileged access to them or what they are used for. More precisely, scant information is made available about their capabilities; whether they behave in undesirable ways; whether they are under appropriate control, with oversight mechanisms and safeguards; whether they can be misused by those who have access to them; or what their overall risk profiles are. Nor are there sufficiently clear and detailed requirements to ensure that these AI systems are rigorously tested and do not pose a cascading threat to society before they are put into use.

If we do not require tech companies to provide sufficiently detailed information about how they test, control, and internally use new AI models, governments cannot prepare for AI systems that may eventually possess nation-state-level capabilities. Meanwhile, threats that develop confidentially could spill over into society without prior warning or any ability to intervene. To be clear, even today we cannot trust current AI systems to reliably behave as intended, whether they are deployed externally or internally. However, we still have time to act.

There are straightforward measures that can be implemented today. The scope of AI companies’ voluntary frontier AI safety policies should be explicitly expanded to cover high-stakes internal deployment and use, such as for accelerating AI R&D. As part of this, internal deployment should be treated with the same diligence as external deployment: companies should be encouraged to conduct rigorous assessments and evaluations to identify dangerous capabilities, establish clear risk profiles, and put required control or guardrail mechanisms in place before use.

Government agencies responsible for national preparedness should have proactive visibility into the internal deployment and use of highly advanced AI systems and receive all necessary national-security-critical information. This could include, for example, who has access to these AI systems and under what conditions, what they are used for, what oversight is applied to them, and what could happen if that oversight fails. Such visibility would help ensure that economic and intellectual property interests are balanced against legitimate national security interests.

AI companies and governments should collaboratively take the lead on adopting these straightforward best practices to ensure trustworthy innovation and protection of the public.