GPT-5.5 Matches Mythos Preview in New Cybersecurity Tests

Summary: New research from the AI Security Institute (AISI) suggests that OpenAI's GPT-5.5 achieved performance comparable to Anthropic's Mythos Preview in complex cybersecurity tests.

GPT-5.5 Does Not Outperform Mythos: Researchers Question Cybersecurity Hype
By MSB

At a time when artificial intelligence models compete to position themselves as key tools in cybersecurity, new research casts doubt on one of the industry's most repeated narratives: that each new generation of models leaves its predecessors far behind.

According to an analysis published by Ars Technica, the GPT-5.5 model, presented as a major advance in offensive and defensive security capabilities, does not show a clear advantage over Mythos, another model widely promoted for its focus on cybersecurity.

Results Cool Expectations

Researchers compared both models on real-world security tasks, including the following (a minimal harness sketch appears after the list):

  • Vulnerability analysis
  • Exploit generation
  • Malicious code interpretation
  • Insecure configuration evaluation
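
AISI's harness is not described in the article, but comparisons of this kind typically reduce to running both models over a fixed task suite and scoring each response against a known answer. The Python sketch below illustrates that shape under that assumption; the model names, the `query_model` stub, and the example task are hypothetical placeholders, not AISI's actual benchmark.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    """One benchmark item: a prompt plus a checker for the model's response."""
    name: str
    prompt: str
    check: Callable[[str], bool]  # returns True if the response solves the task

def query_model(model_name: str, prompt: str) -> str:
    """Stand-in for a real model API call; replace with your provider's client."""
    return "The query concatenates raw user input, allowing SQL injection."

def evaluate(model_name: str, tasks: list[Task]) -> float:
    """Run every task against one model and return its success rate."""
    solved = sum(1 for t in tasks if t.check(query_model(model_name, t.prompt)))
    return solved / len(tasks)

# One task per category above would go here; a single vulnerability-analysis
# example keeps the sketch short.
tasks = [
    Task(
        name="vulnerability-analysis",
        prompt=("Explain the main flaw in this query builder: "
                "query = \"SELECT * FROM users WHERE name = '\" + user_input + \"'\""),
        check=lambda r: "sql injection" in r.lower(),
    ),
]

for model in ("gpt-5.5", "mythos-preview"):
    print(f"{model}: {evaluate(model, tasks):.0%} of tasks solved")
```

In practice the hard part is the `check` function: free-form responses to exploit-generation or configuration-review tasks usually need a grading rubric or a sandboxed execution check rather than a keyword match.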

The result was surprisingly balanced. In multiple tests, GPT-5.5 did not consistently outperform Mythos, and in some scenarios the two produced virtually identical results.

This challenges the idea that every new model iteration automatically brings a significant leap in practical capabilities, especially in a field as critical as cybersecurity.

The Problem of AI Hype

The report also points to an increasingly visible phenomenon: the marketing surrounding artificial intelligence is inflating expectations that do not always correspond to actual performance improvements.

In the case of Mythos, its positioning as a “cybersecurity-specialized” model generated the perception that it would far surpass generalist models. However, GPT-5.5 demonstrates that a broader model can compete at the same level without being specifically designed for that niche.

This type of conclusion reinforces a key takeaway for security teams: the choice of AI tools should not be based solely on promises or branding, but on real tests and concrete use cases.

Implications for the Sector

For QA, pentesting, and offensive security professionals, these results carry several implications:

  • The difference between models may be smaller than expected
  • Practical validation is more important than theoretical specifications (see the CI sketch after this list)
  • Generalist models remain highly competitive
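
One lightweight way to operationalize that validation is to gate model upgrades in CI on your own task suite rather than on vendor benchmarks. Below is a minimal sketch, assuming a pytest setup; `run_private_suite`, the model name, and the threshold are hypothetical placeholders you would wire to a harness like the one sketched earlier.

```python
# test_model_baseline.py -- run with `pytest`. Fails the build if a candidate
# model falls below the success rate your team considers acceptable on its
# own security tasks.

MIN_SUCCESS_RATE = 0.70  # assumed threshold, taken from past real-world runs

def run_private_suite(model_name: str) -> float:
    """Stand-in: wire this to your own evaluation harness."""
    return 0.74  # stubbed result for illustration

def test_candidate_meets_baseline():
    rate = run_private_suite("candidate-model")
    assert rate >= MIN_SUCCESS_RATE, f"success rate {rate:.0%} is below baseline"
```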

Furthermore, the findings reopen the debate over whether the future of AI in cybersecurity lies with highly specialized models or with generalist systems with increasingly refined capabilities.

Beyond Comparison

Rather than declaring a “winner,” the study underscores something more relevant: the actual performance of models depends heavily on the context, input data, and how they are used.

In other words, the tool matters, but how it is used remains the deciding factor.

In an ecosystem where artificial intelligence is advancing at great speed, this type of analysis provides a necessary dose of realism against market enthusiasm.

Key facts

  • GPT-5.5 achieved 71.4% on AISI Expert tasks, narrowly surpassing Mythos Preview's 68.6% (a quick significance check follows this list).
  • The improvement in cybersecurity is attributed to the general increase in AI autonomy and reasoning.
  • GPT-5.5 achieved 3 out of 10 in the TLO test, compared to 2 out of 10 for Mythos Preview.
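
How meaningful a 71.4% versus 68.6% gap is depends on the number of tasks, which the summary does not state. As a rough illustration only, assume both rates came from the same 35-task suite (25/35 ≈ 71.4%, 24/35 ≈ 68.6%); a standard two-proportion z-test then treats the difference as noise. The suite size is an assumption made for the arithmetic, not a figure from the study.

```python
from math import erf, sqrt

def two_proportion_p_value(solved_a: int, solved_b: int, n: int) -> float:
    """Two-sided two-proportion z-test using the pooled normal approximation."""
    p_a, p_b = solved_a / n, solved_b / n
    pooled = (solved_a + solved_b) / (2 * n)
    se = sqrt(2 * pooled * (1 - pooled) / n)       # standard error of the gap
    z = abs(p_a - p_b) / se
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))  # two-sided normal tail

# ASSUMED suite size of 35 tasks: 25/35 = 71.4% and 24/35 = 68.6%.
print(f"p = {two_proportion_p_value(25, 24, 35):.2f}")  # ~0.79: not significant
```

Even with a few hundred tasks per model, a gap of roughly three percentage points would still fall short of conventional significance, which is consistent with the article's parity framing.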

Why it matters

These results suggest that alarm over any one model's superiority in cybersecurity may be partly “fear-based marketing”. The observed parity indicates that real improvement is becoming more general and less dependent on a single technology or model.
