Large language models (LLMs) have evolved from a theoretical risk into a tangible security threat. A key event occurred in November 2025, when Anthropic published a report documenting a state-sponsored espionage campaign in which AI operated autonomously.
This operation demonstrated that AI did not merely assist human operators but assumed the primary role, executing 80% to 90% of the campaign at a speed no human team could match.
This discovery shifted the conversation from theoretical potential to present danger. It also raised practical questions: can AI carry out an attack autonomously from start to finish, or does it still require human guidance at every decision point?
To answer these questions, the Unit 42 team built a multi-agent penetration testing proof of concept (PoC), designed to empirically test the autonomous offensive capabilities of AI against cloud environments.
The findings revealed that AI does not necessarily create new attack surfaces. Instead, it acts as a force multiplier, dramatically accelerating the exploitation of existing, well-known misconfigurations in cloud infrastructure.