Can Autonomous AI Systems Attack the Cloud? Lessons from Unit 42

Summary: Large language models (LLMs) have moved from theoretical risks to real threats, leading Unit 42 to develop a multi-agent proof-of-concept (PoC) to assess how far AI can carry out offensive operations autonomously in cloud environments.

Large language models (LLMs) have evolved from theoretical risks to tangible security threats. A key event occurred in November 2025, when Anthropic published a report documenting a state-sponsored espionage campaign in which AI operated largely autonomously.

This operation demonstrated that AI not only assisted human operators but assumed the primary role, executing between 80% and 90% of the campaign at a speed unmatched by any human team.

This discovery shifted the conversation from theoretical potential to present danger. Practical questions remained, however: can AI operate autonomously from start to finish, or does it still require human guidance at every decision point?

To answer these questions, the Unit 42 team built a multi-agent penetration testing PoC, designed to test empirically the autonomous offensive capabilities of AI against cloud environments.

The findings revealed that AI does not necessarily create new attack surfaces. Instead, it acts as a force multiplier, accelerating the exploitation of existing, known misconfigurations in cloud infrastructure.

Key facts

  • AI moved from a theoretical risk to a practical threat, as documented in reports published in 2025.
  • In the state-sponsored espionage campaign that Anthropic documented, AI autonomously executed 80% to 90% of the operation.
  • Unit 42 developed a multi-agent PoC to assess AI offensive capabilities in the cloud.
  • AI accelerates the exploitation of existing misconfigurations, rather than necessarily creating new attack surfaces.

Why it matters

The growing autonomy of AI in cyberattacks significantly raises the risk level for organizations. Multi-agent systems can find and exploit known vulnerabilities and misconfigurations far faster than human teams can. Companies must urgently adopt defenses that can counter this acceleration, focusing on configuration management and the detection of operational anomalies.