CTI-REALM: A new benchmark for end-to-end detection rule generation with AI agents

CTI-REALM is Microsoft’s open-source benchmark for evaluating AI agents that generate detection rules from threat intelligence reports. Unlike existing benchmarks that test parametric knowledge (for example, classifying techniques), CTI-REALM covers the end-to-end workflow of turning narrative CTI into operational detections. It uses 37 curated CTI reports and evaluates models across Linux endpoints, Azure Kubernetes Service (AKS), and Azure cloud infrastructure.
Summary: Microsoft has released CTI-REALM, an open-source benchmark for evaluating AI agents in generating detection rules from threat intelligence reports. It focuses on operationalizing threat insights into actionable detections.
Key facts
- Unlike benchmarks that test parametric knowledge, such as classifying techniques, CTI-REALM measures how well agents turn narrative CTI into operational detections.
- CTI-REALM evaluates the end-to-end workflow, including reading CTI reports, exploring telemetry, writing KQL queries, and producing Sigma rules.
- The benchmark uses 37 curated CTI reports across Linux endpoints, Azure Kubernetes Service (AKS), and Azure cloud infrastructure.
- Results from evaluating 16 frontier model configurations on CTI-REALM-50 show that Anthropic models lead across the board.
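As a sketch of the kind of artifact the workflow above produces, here is a minimal Sigma rule for a Linux endpoint. This example is illustrative only: the title, log source, field names, and detection logic are assumptions for demonstration, not content drawn from CTI-REALM or its reports.

```yaml
# Illustrative Sigma rule skeleton (not from the CTI-REALM benchmark).
# A CTI-REALM agent would derive fields like these from a narrative report.
title: Suspicious Download to /tmp via curl (illustrative)
status: experimental
description: Example of the Sigma rule format an agent might emit for a Linux endpoint.
logsource:
  product: linux
  category: process_creation
detection:
  selection:
    Image|endswith: '/curl'        # process executable path
    CommandLine|contains: '/tmp/'  # writing payloads to a world-writable dir
  condition: selection
level: medium
```

In practice, such a rule would be compiled to a backend query language (for example KQL, which the benchmark also exercises) before deployment against live telemetry.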
Why it matters
CTI-REALM matters to businesses because it measures how well AI operationalizes security workflows, showing where human review and guardrails are still needed. This supports safer adoption by letting teams assess model performance before deploying agents in production environments.
Key metrics
- Model performance: Anthropic leads, with Claude models occupying the top three positions (scores of 0.587–0.637)