Unit 42 of Palo Alto Networks published an analysis that examines Amazon Bedrock’s multi-agent system from a red team perspective. The researchers demonstrate how, under certain conditions, an adversary could progress through a chain of attacks by determining the operational mode (Supervisor or Supervisor with Routing), discovering collaborating agents, and executing malicious actions. These attacks include disclosing instructions and schemas for toolsets, as well as invoking them with inputs provided by the attacker.
No vulnerabilities were identified in Amazon Bedrock; however, tests showed that the integrated protection barriers (Guardrails) of Bedrock stopped the attacks when configured correctly. Nevertheless, these findings reaffirm the need to protect systems that rely on large language models (LLM), as they cannot distinguish between developer-defined instructions and adversarial input.
The researchers tested with their own Bedrock agents, limiting themselves to agent logic and application integrations.