Rowhammer on NVIDIA GPUs: when graphics memory becomes an entry point into the system
Introduction: from a theoretical flaw to a real threat
For years, the Rowhammer attack was considered a serious but relatively limited problem, affecting mainly conventional system memory (DDR DRAM). However, recent research shows that the same class of vulnerability also affects the GDDR memory used in modern graphics cards.
The discovery of new variants such as GDDRHammer and GeForge marks a turning point: for the first time, a hardware-level attack on GPU memory can be escalated to compromise the entire operating system.
This changes how security is viewed in environments where GPUs are central, especially artificial intelligence, cloud computing, and high-performance systems.
What is Rowhammer and how does it work?
Rowhammer is a hardware vulnerability in DRAM. Repeatedly accessing (activating) the same row of memory causes electrical interference that drains charge from cells in physically adjacent rows, which can flip bits in those rows even though they were never written.
Because the neighboring rows may hold another process's data or even kernel structures, this lets an attacker modify data without direct access to it, breaking memory isolation. In simple terms, an attacker can alter information they should not be able to touch.
Although the problem has been studied publicly since 2014, it has not disappeared. If anything, as manufacturing advances make memory cells smaller and pack them closer together, they become more sensitive to this interference.
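The mechanism can be illustrated with a deliberately simplified simulation. This is a sketch, not a model of real DRAM: the class `ToyDRAMBank`, the rule that each activation adds one unit of "disturbance" to the two neighboring rows, and the `flip_threshold` value are all hypothetical assumptions chosen only to show the idea. Real devices differ in row layout, thresholds, refresh timing, and built-in mitigations.

```python
class ToyDRAMBank:
    """Hypothetical toy DRAM bank: one small integer per row, no real physics."""

    def __init__(self, num_rows: int, flip_threshold: int) -> None:
        self.num_rows = num_rows
        self.flip_threshold = flip_threshold  # assumed activations needed for a flip
        self.data = [0] * num_rows            # stored value of each row
        self.disturbance = [0] * num_rows     # disturbance since the row was last recharged

    def activate(self, row: int) -> None:
        """Open `row`, as a memory access that misses every cache would."""
        self.disturbance[row] = 0  # opening a row recharges its own cells
        for neighbor in (row - 1, row + 1):  # disturb both adjacent rows
            if 0 <= neighbor < self.num_rows:
                self.disturbance[neighbor] += 1
                if self.disturbance[neighbor] == self.flip_threshold:
                    self.data[neighbor] ^= 1  # flip the lowest bit of the victim row

    def refresh(self) -> None:
        """Periodic refresh restores every row and clears the disturbance."""
        self.disturbance = [0] * self.num_rows


def hammer(bank: ToyDRAMBank, aggressors: list[int], rounds: int,
           refresh_every: int = 0) -> None:
    """Activate each aggressor row in turn; refresh every N activations (0 = never)."""
    activations = 0
    for _ in range(rounds):
        for row in aggressors:
            bank.activate(row)
            activations += 1
            if refresh_every and activations % refresh_every == 0:
                bank.refresh()


# Double-sided hammering: rows 3 and 5 sandwich victim row 4.
bank = ToyDRAMBank(num_rows=8, flip_threshold=1000)
hammer(bank, aggressors=[3, 5], rounds=600)
print(bank.data)  # [0, 0, 0, 0, 1, 0, 0, 0]

# Same access pattern, but with a refresh every 800 activations.
refreshed = ToyDRAMBank(num_rows=8, flip_threshold=1000)
hammer(refreshed, aggressors=[3, 5], rounds=600, refresh_every=800)
print(refreshed.data)  # [0, 0, 0, 0, 0, 0, 0, 0]
```

Row 4 receives two units of disturbance per round and crosses the threshold, while rows 2 and 6 receive only one unit per round and stay intact. Refreshing before the counter reaches the threshold prevents the flip, and a lower `flip_threshold` plays the role of the more sensitive cells mentioned above.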
The leap to GPUs: from GPUHammer to GDDRHammer
The first major step in this direction was GPUHammer, which showed that graphics cards with GDDR6 memory are also vulnerable. It induced bit flips in the memory of an NVIDIA GPU and used them to directly alter the behavior of artificial intelligence models.
In the researchers' tests, a single bit flip in a model's weights was enough to drastically reduce its accuracy, showing the potential impact of the attack.
The most worrying results, however, came later.
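Why one flipped bit can matter so much becomes clearer from how floating-point weights are stored. The sketch below assumes a weight stored as an IEEE 754 float32 (the precision used in the research is an assumption here, and the helper name `flip_bit_fp32` is hypothetical). It flips a single bit of the encoding and shows that the damage depends entirely on which bit is hit.

```python
import struct


def flip_bit_fp32(value: float, bit: int) -> float:
    """Return `value` with bit `bit` (0 = least significant) of its float32 encoding flipped."""
    (raw,) = struct.unpack("<I", struct.pack("<f", value))  # float32 bits as an integer
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped


weight = 0.5
low = flip_bit_fp32(weight, 0)    # lowest mantissa bit
high = flip_bit_fp32(weight, 30)  # most significant exponent bit
print(low)   # 0.5 + 2**-24: practically unchanged
print(high)  # 2**127, roughly 1.7e38
```

A weight of roughly 1.7e38 dominates every sum it takes part in, which is one plausible way a lone flip can wreck a model's output, while a flip in a low mantissa bit is effectively harmless.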
New attacks: GDDRHammer and GeForge
Recent research presents two much more advanced variants:
- GDDRHammer
- GeForge
These attacks do more than corrupt data: they can also manipulate critical structures that control how memory is accessed.
One of the most serious findings is that an attacker can use bit flips to corrupt the GPU's page tables, redirecting GPU memory accesses toward CPU (host) memory.
This implies something unprecedented:
the GPU can become a direct bridge into the main system memory.
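A simplified picture helps explain why a corrupted page table entry is so dangerous. Among other things, a GPU page table entry records whether a virtual page lives in the GPU's own memory or in system memory, and which physical frame it maps to. The 64-bit layout below is entirely hypothetical (it is not NVIDIA's real format), as are the constants and helper names; the sketch only shows how a single flipped bit can silently retarget a mapping.

```python
# Hypothetical PTE layout, for illustration only (NOT NVIDIA's real format):
#   bit 0        valid
#   bit 1        aperture: 0 = GPU video memory, 1 = system (CPU) memory
#   bits 12..51  physical frame number (PFN)
VALID = 1 << 0
APERTURE_SYSMEM = 1 << 1
PFN_SHIFT = 12
PFN_MASK = (1 << 40) - 1


def make_pte(pfn: int, sysmem: bool = False) -> int:
    """Build a valid toy entry pointing at frame `pfn` in the chosen memory."""
    return VALID | (APERTURE_SYSMEM if sysmem else 0) | ((pfn & PFN_MASK) << PFN_SHIFT)


def decode_pte(pte: int) -> tuple[bool, str, int]:
    """Return (valid, aperture, pfn) for a toy entry."""
    aperture = "sysmem" if pte & APERTURE_SYSMEM else "vidmem"
    return bool(pte & VALID), aperture, (pte >> PFN_SHIFT) & PFN_MASK


pte = make_pte(0x1234)                           # maps a page of GPU memory
print(decode_pte(pte))                           # (True, 'vidmem', 4660)
print(decode_pte(pte ^ APERTURE_SYSMEM))         # same frame number, now in CPU memory
print(decode_pte(pte ^ (1 << (PFN_SHIFT + 4))))  # a different GPU frame (0x1224)
```

Classic CPU Rowhammer exploits relied on this kind of retargeting, aiming a flip so that an entry ends up pointing at memory the attacker controls. The exact way GDDRHammer and GeForge steer their flips is not reproduced by this sketch.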
Impact: full control of the system
The impact of these attacks is extremely high. Under the right conditions, an attacker can:
- Gain read and write access to memory, including CPU memory
- Escalate privileges to administrator or root level
- Manipulate operating system processes
- Alter or corrupt critical data
The researchers report hundreds or even thousands of bit flips on real NVIDIA GPUs such as the RTX 3060 and RTX A6000.
This is not a theoretical flaw: it has been exploited on real, commercially available hardware.
A particularly critical issue in shared environments
Although these attacks require the ability to run code on the target system, they become far more dangerous in shared environments such as:
- Cloud servers
- Multiuser AI infrastructure
- High Performance Computing (HPC) systems
In these scenarios, multiple users share the same GPU, allowing an attacker to affect other users without direct access to their data.
This breaks one of the fundamental principles of security: process isolation.
Limitations of the attack
Despite its severity, the attack has certain limitations:
- It requires local code execution
- It is not trivial to exploit
- It depends on specific system configurations
For now, this makes it more relevant to targeted attacks than to mass exploitation.
However, this does not make it less important. Historically, many complex vulnerabilities have become easier to exploit over time.
Current mitigations
Several measures can reduce the risk:
1. Enable ECC (Error Correction Code)
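Detects and corrects bit errors in memory, at some cost in performance. Common SECDED schemes correct a single flipped bit per word and only detect two, so several flips in the same word can still get through.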
2. Enable IOMMU
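Restricts which regions of system memory the GPU can access, so a corrupted GPU page table cannot simply redirect accesses to arbitrary CPU memory. This blocks the privilege-escalation path these attacks rely on.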
3. Workload isolation
Avoid letting untrusted users share the same GPU.
4. Use more recent hardware
Some newer memory types (such as GDDR6X or GDDR7) were not affected in these studies.
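The principle behind the first measure, ECC, can be seen in the textbook Hamming(7,4) code, which stores 4 data bits alongside 3 parity bits and can locate and repair any single flipped bit. This is a minimal sketch: real GPU ECC uses wider codes over larger words and is implemented in hardware, and the function names here are illustrative only.

```python
def hamming74_encode(nibble: int) -> list[int]:
    """Encode 4 data bits as a 7-bit codeword (list index 0 = position 1)."""
    code = [0] * 8  # positions 1..7 are used; index 0 is ignored
    # Data bits go in the non-power-of-two positions 3, 5, 6, 7.
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    # Each parity bit covers the positions whose index has that bit set.
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    return code[1:]


def hamming74_decode(bits: list[int]) -> tuple[int, int]:
    """Correct up to one flipped bit; return (data nibble, syndrome)."""
    code = [0] + list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of the positions of all set bits
    if syndrome:
        code[syndrome] ^= 1  # a single error sits exactly at position `syndrome`
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, syndrome


word = hamming74_encode(0b1011)  # data value 11

one_flip = word.copy()
one_flip[4] ^= 1                    # flip position 5
print(hamming74_decode(one_flip))   # (11, 5): corrected

two_flips = word.copy()
two_flips[0] ^= 1                   # flip positions 1 and 2
two_flips[1] ^= 1
print(hamming74_decode(two_flips))  # (10, 3): wrong data, silently "corrected"
```

The second case shows why large numbers of flips matter: this code fixes any single flip in a word but miscorrects a double flip, and stronger SECDED codes can only detect, not repair, two flips in the same word.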
Community reaction
In technical forums and communities like Hacker News, reactions have been mixed. Some consider the attack too complex to be practical on home systems, while others warn that it poses a significant risk to critical infrastructure.
This difference in perception reflects an important reality:
the impact depends entirely on the context in which the GPU is used.
The underlying issue: modern hardware security
The Rowhammer case on GPUs highlights a deeper problem:
Security no longer depends on software alone.
Now, physical vulnerabilities in hardware can be exploited via software, creating a new category of hybrid attacks.
Furthermore, GPUs are no longer just graphics components; they have become key components in:
- Artificial intelligence
- Cryptography
- Massive data processing
This amplifies the impact of any vulnerability.
Conclusion
The discovery of attacks such as GDDRHammer and GeForge redefines the cybersecurity landscape.
For the first time, a vulnerability in GPU memory can be escalated to fully compromise the system. This breaks the traditional model in which the GPU was treated as an isolated component.
The lesson is clear:
the attack surface is no longer limited to software or the CPU.
In the era of accelerated computing, security must be considered at every layer of the system, including specialized hardware.