Row Hammer Solution | ARMOR: A Run-time Memory hot-row detectoR

What is ARMOR?

ARMOR is a hardware solution to prevent Row Hammer errors in DRAMs, designed and developed in the School of Computer Science at The University of Manchester. The main challenge in mitigating the Row Hammer effect is monitoring the number of activations for each row in the DRAM, which imposes a significant storage overhead on the memory system. ARMOR monitors the activation stream at the memory interface level and detects, at run-time, which specific rows (i.e., hot rows) are at risk of being “hammered”. ARMOR is capable of detecting all the possible hot rows in a system with a minimal storage overhead (e.g., 800 bytes to protect 4 GB of DRAM).
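To give a sense of the storage problem, the sketch below computes the cost of the naive approach: one activation counter per DRAM row. The 8 KiB row size and the threshold of 100,000 activations are illustrative assumptions, not figures from the ARMOR evaluation, and per_row_counter_bytes is a hypothetical helper.

```python
def per_row_counter_bytes(dram_bytes: int, row_bytes: int, act_threshold: int) -> int:
    """Bytes needed to keep one activation counter per DRAM row (hypothetical helper)."""
    num_rows = dram_bytes // row_bytes
    # Each counter must be able to count up to act_threshold.
    bits_per_counter = act_threshold.bit_length()
    total_bits = num_rows * bits_per_counter
    return (total_bits + 7) // 8  # round up to whole bytes


# Assumed: 4 GiB of DRAM, 8 KiB rows, threshold of 100,000 activations.
naive = per_row_counter_bytes(4 * 2**30, 8 * 2**10, 100_000)
print(naive)  # 1114112 (524,288 rows x 17 bits)
print(naive / 800)  # ratio against the 800 bytes quoted for ARMOR
```

Under these assumptions, per-row counters need over a megabyte for 4 GB, more than a thousand times the 800 bytes quoted for ARMOR.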

Why Is ARMOR a Promising Solution?

It is capable of detecting all the possible Row Hammer errors with a high level of confidence.

It provides precise information about the hammered rows (their addresses) and their number of activations, with a high level of accuracy (e.g., 99.99%).

It does not need to know the logical-to-physical mapping of DRAMs in order to mitigate Row Hammer errors (when using the ARMOR Cache Solution).

It scales with the size of memory.

It is technology independent and can easily support future device technologies.

ARMOR - Overview of Architecture

How Can ARMOR Be Used to Mitigate Row Hammer Errors?

By gathering information on specific hot rows, ARMOR makes it possible to mitigate Row Hammer errors using various techniques.

Target Row Refresh: Since ARMOR provides the exact addresses of the hammered rows, a memory controller can employ the Target Row Refresh (TRR) command to refresh the victim rows. TRR is supported by the latest DRAM devices (DDR4). For older DRAM families (e.g., DDR3), however, memory manufacturers would still need to reveal the logical-to-physical row mapping so that the victim rows (the physical neighbours of a hammered row) can be identified.
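The sketch below illustrates why the mapping matters when the victims must be located by the controller: victim rows are the physical neighbours of the aggressor, which need not be its logical neighbours. The scrambled mapping used here (logical row r stored at physical row r ^ 1) is purely hypothetical, as is the victim_rows helper.

```python
from typing import Dict, List


def victim_rows(aggressor: int, logical_to_physical: Dict[int, int]) -> List[int]:
    """Logical addresses of the rows physically adjacent to `aggressor`."""
    physical_to_logical = {phys: row for row, phys in logical_to_physical.items()}
    phys = logical_to_physical[aggressor]
    victims = []
    for neighbour in (phys - 1, phys + 1):
        if neighbour in physical_to_logical:  # skip positions beyond the bank edge
            victims.append(physical_to_logical[neighbour])
    return victims


identity = {r: r for r in range(8)}
scrambled = {r: r ^ 1 for r in range(8)}  # hypothetical: adjacent pairs swapped
print(victim_rows(2, identity))   # [1, 3]
print(victim_rows(2, scrambled))  # [3, 5]
```

With the scrambled mapping, refreshing the logical neighbours 1 and 3 would miss the real victims.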

ARMOR Cache Solution: Since ARMOR can detect potential hot rows before they reach the hammering threshold, it is possible to cache the hammered rows outside of the DRAM. Further activations to a frequently accessed row are then serviced outside the DRAM module, which prevents that row from being hammered. This also provides an opportunity to improve the performance of the memory system, since the cached row can be accessed more quickly. Because these hot rows only need to be cached for a short period of time (one refresh interval, 64 ms), a small buffer is sufficient to implement this solution.
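A minimal model of the cache idea, assuming a fixed 64 ms window and a buffer that is simply cleared at each window boundary. HotRowBuffer and its methods are hypothetical names, and the model only tracks which rows are cached (no row data and no write-back).

```python
from typing import Set

REFRESH_INTERVAL_MS = 64.0  # refresh interval quoted in the text


class HotRowBuffer:
    """Toy model of a small buffer that serves hot rows outside the DRAM."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.rows: Set[int] = set()
        self.window = 0  # index of the current refresh interval

    def _advance(self, now_ms: float) -> None:
        # Simplification: the whole buffer is cleared at each 64 ms boundary,
        # since a hot row only needs to stay cached for about one interval.
        window = int(now_ms // REFRESH_INTERVAL_MS)
        if window != self.window:
            self.rows.clear()
            self.window = window

    def insert(self, row: int, now_ms: float) -> bool:
        """Cache a row flagged as hot; returns False if the buffer is full."""
        self._advance(now_ms)
        if row in self.rows or len(self.rows) < self.capacity:
            self.rows.add(row)
            return True
        return False

    def serves(self, row: int, now_ms: float) -> bool:
        """True if an access to `row` is served by the buffer, avoiding a DRAM activation."""
        self._advance(now_ms)
        return row in self.rows


buf = HotRowBuffer(capacity=2)
buf.insert(5, now_ms=1.0)
print(buf.serves(5, now_ms=10.0))  # True: served outside the DRAM
print(buf.serves(5, now_ms=70.0))  # False: new refresh interval, buffer cleared
```

Because the buffer only has to cover rows that can become hot within one interval, its capacity can stay small.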

Can ARMOR Prevent Double-Sided Hammering?

ARMOR is a flexible technique that can be adjusted to support different scenarios. ARMOR has a configurable Activation Threshold, which defines the minimum number of activations required to induce a Row Hammer error. Double-sided hammering causes DRAM cells to lose their data more quickly than single-sided hammering, so fewer activations are required to induce a Row Hammer error. ARMOR can prevent double-sided hammering errors simply by reducing its Activation Threshold to a suitable value (e.g., half of the value required to prevent single-sided hammering). Although this increases the storage overhead required by ARMOR, the overall storage overhead is still negligible (e.g., 1.6 KB to protect 4 GB of DRAM).
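One way to see why a lower Activation Threshold enlarges the storage requirement: within a refresh window a bank can issue only a limited number of activations, so only a limited number of rows can reach the threshold. The sketch below computes that pigeonhole bound. It is a generic argument rather than ARMOR's storage formula, and the tRC of 45 ns and the thresholds of 50,000 and 25,000 are assumed values.

```python
def max_hot_rows(refresh_window_ns: int, trc_ns: int, act_threshold: int) -> int:
    """Upper bound on how many rows in one bank can reach act_threshold per window.

    At most refresh_window_ns // trc_ns activations fit in a window (one per
    row-cycle time tRC), and every hot row uses at least act_threshold of them.
    """
    max_activations = refresh_window_ns // trc_ns
    return max_activations // act_threshold


# Assumed: 64 ms window, tRC = 45 ns, single- vs double-sided thresholds.
print(max_hot_rows(64_000_000, 45, 50_000))  # 28
print(max_hot_rows(64_000_000, 45, 25_000))  # 56
```

Halving the threshold doubles the number of rows that could become hot, which is consistent with the storage overhead growing from 800 bytes to 1.6 KB.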

There is also another way to prevent double-sided hammering errors using ARMOR. Since ARMOR knows which rows are being hammered, knowledge of the logical-to-physical row mapping (which needs to be revealed by memory manufacturers) would be enough to detect double-sided hammering. In this situation, ARMOR can sum the activations of the two aggressor rows adjacent to a victim to decide whether the victim needs to be refreshed; this does not affect the storage overhead.
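The summing rule can be sketched as follows, assuming the hot-row table has already been translated to physical row indices using the manufacturer's mapping. victims_needing_refresh is a hypothetical helper: a victim is flagged when the activations of its two physical neighbours together reach the threshold, which also covers the single-sided case.

```python
from typing import Dict, List


def victims_needing_refresh(hot_counts: Dict[int, int], act_threshold: int,
                            num_rows: int) -> List[int]:
    """Victim rows whose two physical neighbours together reach act_threshold.

    hot_counts maps a physical row index to its activations in the current
    refresh interval (rows absent from the table count as zero).
    """
    candidates = {r + d for r in hot_counts for d in (-1, 1)}
    victims = []
    for v in sorted(candidates):
        if not 0 <= v < num_rows:
            continue  # no such physical row in this bank
        combined = hot_counts.get(v - 1, 0) + hot_counts.get(v + 1, 0)
        if combined >= act_threshold:
            victims.append(v)
    return victims


# Two aggressors sandwiching row 11, each below the threshold on its own.
print(victims_needing_refresh({10: 30_000, 12: 30_000}, 50_000, 1024))  # [11]
```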

What Was the Initial Evaluation Methodology for ARMOR?

ARMOR was initially evaluated using USIMM, a detailed DRAM simulator. To investigate the performance of ARMOR, we tested whether it identifies all the hammered rows (compared against a ground truth) and compared it directly with the technique proposed by Kim et al. (i.e., ‘PARA’). The ground truth is calculated by keeping one counter per DRAM row to track the exact number of activations within the refresh interval. We monitor all the counters and check whether they have reached the threshold at which data would be corrupted, ACTth. As soon as a counter reaches ACTth (i.e., a hammered row is detected), we check ARMOR’s hot-row table to see whether it contains the detected row and, if so, what the predicted number of activations is. We evaluate PARA using a similar methodology: every time an actual hot row is detected using the embedded counters, we check whether PARA has issued any refreshes for that row. If it has, we assume that PARA has issued the correct refresh to the victim rows, although, due to its probabilistic nature, there is only a 50% probability of this having happened.
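The ground-truth check can be sketched as a small harness: exact per-row counters are reset every refresh interval, and each time one reaches ACTth the detector under test is queried. The detector interface (observe, contains, reset) and the deliberately weak FirstSeenDetector are hypothetical stand-ins, not ARMOR's hot-row table or the USIMM code; the trace is assumed to be a time-ordered list of (time in ms, row) pairs.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def ground_truth_miss_rate(trace: Iterable[Tuple[float, int]], act_threshold: int,
                           detector, refresh_interval_ms: float = 64.0) -> float:
    """Fraction of truly hammered rows missing from the detector's table."""
    exact = defaultdict(int)  # one exact counter per row: the ground truth
    window = 0
    hammered = missed = 0
    for time_ms, row in trace:
        w = int(time_ms // refresh_interval_ms)
        if w != window:  # counters only cover a single refresh interval
            exact.clear()
            detector.reset()
            window = w
        exact[row] += 1
        detector.observe(row)
        if exact[row] == act_threshold:  # row has just become a hot row
            hammered += 1
            if not detector.contains(row):
                missed += 1
    return missed / hammered if hammered else 0.0


class FirstSeenDetector:
    """Deliberately weak stand-in: remembers only the first `capacity` rows it sees."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.rows = set()

    def observe(self, row: int) -> None:
        if len(self.rows) < self.capacity:
            self.rows.add(row)

    def contains(self, row: int) -> bool:
        return row in self.rows

    def reset(self) -> None:
        self.rows.clear()


trace = [(0.0, 1), (0.1, 2), (0.2, 1), (0.3, 2), (0.4, 3), (0.5, 3)]
print(ground_truth_miss_rate(trace, 2, FirstSeenDetector(capacity=2)))  # 1 of 3 hot rows missed
```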

What is the Performance Overhead of ARMOR?

We evaluated ARMOR across a wide range of memory-intensive workloads (48 workloads) from different benchmark suites: PARSEC, SPEC, BIOBENCH, HPC and COMMERCIAL. The following figure presents the performance overhead of ARMOR and PARA (with different probability values) for all the evaluated workloads. ARMOR imposes a performance overhead (due to extra refreshes of Victim-Rows) only if there is a hot row (Row-Aggressor) in the system. Our experimental results show that none of the evaluated standard workloads suffers from the Row Hammer error. Thus, as the following figure shows, ARMOR does not degrade performance at all. PARA, on the other hand, randomly issues refresh commands to different rows in the system even when there is no Row Hammer threat, so there is always a performance overhead associated with it, and this overhead grows with the total number of activations.
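A small simulation of the stateless behaviour described above, assuming PARA refreshes one adjacent row with probability p on each activation. para_extra_refreshes is a hypothetical helper that only counts refreshes and models no timing; the expected count is p times the number of activations, whatever rows are being accessed.

```python
import random


def para_extra_refreshes(num_activations: int, p: float, seed: int = 0) -> int:
    """Count the neighbour refreshes a PARA-style policy issues.

    On each activation, with probability p, one adjacent row is refreshed.
    No per-row state is kept, so the result depends only on the number of
    activations, not on which rows they target.
    """
    rng = random.Random(seed)
    refreshes = 0
    for _ in range(num_activations):
        if rng.random() < p:
            refreshes += 1  # which neighbour is chosen does not change the count
    return refreshes


for p in (0.001, 0.005, 0.2):
    print(p, para_extra_refreshes(100_000, p))  # roughly p * 100,000 each
```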

The above results show that, with the probability values reported by Kim et al. (i.e., 0.001 and 0.005), PARA degrades DRAM performance by around 1%, which is negligible. However, our experimental results show that these probability values are not effective in protecting a DRAM system against malicious code (presented in the following section). For the malicious code evaluated in our experiments, a probability value of 0.2 might protect a system employing PARA against such attacks. The following results show the performance overhead for the standard benchmarks when employing higher probability values than those reported by Kim et al. These results show that PARA can impose up to 35% performance overhead on the memory system if protection against malicious code is required.

Can ARMOR Protect Memory Systems from Malicious Code?

To evaluate ARMOR against malicious code, we created 36 kernels (memory traces) designed to generate access patterns likely to cause Row Hammer errors. We randomly selected a few rows (up to 20), out of the millions of rows available in a DRAM, and accessed them more frequently than the other rows in the system (we call them target rows). We employed three different distribution patterns (Uniform, Gaussian and Poisson) to access the target rows while interleaving random accesses to other rows in the system. In this way, the malicious memory accesses are mixed with random memory accesses to the DRAM, which makes them harder for the memory controller to recognize. Our experimental results show that ARMOR detects all the existing hot rows (introduced by the malicious accesses) and estimates the number of activations for each hot row with 99.99% accuracy. We also investigated the performance of PARA (proposed by Kim et al.) for these malicious kernels. The following figure depicts PARA's performance for the different probability values (P) required by this technique; increasing the P value increases the probability of refreshing random rows. Since ARMOR detected all the hot rows (zero miss rate), it is excluded from the following figure.
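The kernels themselves are not published here; the following is a hypothetical generator in the same spirit, interleaving accesses to a few target rows with random background rows. The way each distribution is mapped onto target indices (Gaussian and Poisson draws clamped to the target list), the target_fraction parameter and the function names are all assumptions made for illustration.

```python
import math
import random
from typing import List


def poisson(rng: random.Random, lam: float) -> int:
    """Knuth's method for a Poisson-distributed integer (fine for small lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def hammer_trace(num_rows: int, targets: List[int], length: int,
                 target_fraction: float, pattern: str, seed: int = 0) -> List[int]:
    """Row-activation trace mixing target-row accesses with random background rows."""
    rng = random.Random(seed)
    n = len(targets)
    trace = []
    for _ in range(length):
        if rng.random() >= target_fraction:
            trace.append(rng.randrange(num_rows))  # background access to any row
            continue
        if pattern == "uniform":
            idx = rng.randrange(n)
        elif pattern == "gaussian":
            idx = round(rng.gauss((n - 1) / 2, n / 6))
        elif pattern == "poisson":
            idx = poisson(rng, (n - 1) / 2)
        else:
            raise ValueError(f"unknown pattern: {pattern}")
        trace.append(targets[min(max(idx, 0), n - 1)])  # clamp to a valid target
    return trace


trace = hammer_trace(num_rows=2**19, targets=[101, 2049, 40000], length=1000,
                     target_fraction=0.3, pattern="gaussian")
print(trace[:10])
```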

The above figure suggests that, for the probability values evaluated by Kim et al. (i.e., 0.001 and 0.005), PARA does not detect the existing hot rows in the system, with a miss rate of more than 90% (it detects fewer than 10% of the hot rows). As we increase the P value up to 0.2, PARA performs more accurately, and the experimental results show that the miss rate drops to around 1%. However, there is a cost associated with increasing the P value. The following figure depicts the performance overhead of PARA: there is more than 16% performance overhead when PARA uses a P value of 0.2 to mitigate malicious code. Note that since PARA is stateless, the performance overhead imposed by this technique depends on the number of activations in the memory traces rather than on the behaviour of individual kernels.

The following figure presents the ARMOR overhead for the evaluated malicious kernels. This graph differs slightly from the one presented for PARA because ARMOR, unlike PARA, is not stateless: its performance overhead depends on the application behaviour and on the number of hot rows that manifest in the system. In this figure, the X-axis shows the number of targeted rows, which may or may not reach the Activation Threshold and become hot rows. For instance, if we target 10 rows and access them using a uniform distribution pattern along with other rows in the system, the experimental results show that none of them manifests as a hot row (none reaches the Activation Threshold), so there is no performance overhead associated with them. Conversely, when we target only one row and access it more frequently, it can manifest as a hot row multiple times in different refresh periods, and as a result there is a higher performance overhead associated with it. Therefore, ARMOR only imposes a performance overhead on the system if a hot row manifests itself during execution. This overhead comes from refreshing the two rows (Victim-Rows) adjacent to a hot row (Row-Aggressor).
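The overhead rule can be sketched by counting, per refresh interval, when each row reaches the Activation Threshold and charging two victim refreshes at that moment. This assumes an ideal detector (consistent with the zero miss rate reported above) and uses exact counters for simplicity; victim_refresh_overhead is a hypothetical helper, not ARMOR's hardware.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def victim_refresh_overhead(trace: Iterable[Tuple[float, int]], act_threshold: int,
                            refresh_interval_ms: float = 64.0) -> int:
    """Extra refreshes when each manifested hot row triggers refreshes of both neighbours.

    A row manifests as hot at most once per refresh interval, when its count in
    that interval reaches act_threshold; rows below the threshold cost nothing.
    """
    counts = defaultdict(int)  # keyed by (interval index, row)
    extra = 0
    for time_ms, row in trace:
        key = (int(time_ms // refresh_interval_ms), row)
        counts[key] += 1
        if counts[key] == act_threshold:
            extra += 2  # refresh both adjacent victim rows
    return extra


single = [(float(t), 5) for t in range(128)]       # one target row, two intervals
spread = [(float(t), t % 10) for t in range(128)]  # ten target rows, uniform
print(victim_refresh_overhead(single, 50), victim_refresh_overhead(spread, 50))  # 4 0
```

As in the text, spreading accesses over ten rows keeps each below the threshold and costs nothing, while a single row hammered in two intervals manifests twice.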

The malicious access patterns presented here might not be representative of all the possible access patterns that induce Row Hammer errors. However, this simple example shows that techniques like PARA are not a safe solution when it comes to DRAM security.

Is ARMOR Applicable to Other Areas of Research?

ARMOR proposes a novel technique to recognize frequently accessed items in a stream of data. This stream does not necessarily need to be a stream of memory activation commands; for instance, ARMOR can be used to recognize frequently issued Read/Write/Refresh commands. The methodology proposed by ARMOR might therefore also be applicable to many other memory-based or non-memory-based research areas.
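ARMOR's own algorithm is not described on this page. As an illustration of frequent-item detection in a bounded amount of storage, the sketch below uses the well-known Misra-Gries algorithm instead: it keeps at most k - 1 counters and is guaranteed to retain every item that occurs more than n/k times in a stream of n items, whatever the items are (row addresses or command types).

```python
from typing import Dict, Hashable, Iterable


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """Misra-Gries frequent items with at most k - 1 counters.

    Every item occurring more than n / k times in a stream of n items is
    guaranteed to be in the result; its stored count may underestimate the
    true count by at most n / k.
    """
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # Table full: decrement every counter and drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# The same code works for row addresses or for command types.
rows = [7] * 60 + list(range(100, 140))  # row 7: 60 of 100 activations
print(misra_gries(rows, k=3))             # {7: 40}
commands = ["RD", "WR", "RD", "REF", "RD", "RD", "WR", "RD"]
print(misra_gries(commands, k=2))         # {'RD': 2}
```

Unlike this textbook algorithm, the figures quoted above for ARMOR (99.99% count accuracy) refer to ARMOR's own design.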

School of Computer Science - Advanced Processor Technologies Group (APT)