Cosmin Gorgovan, Amanieu D'antras, and Mikel Luján
As the ARM architecture expands beyond its traditional embedded domain, there is a growing interest in dynamic binary modification (DBM) tools for general-purpose multicore processors that are part of the ARM family. Existing DBM tools for ARM suffer from introducing large overheads in the execution of applications. The specific questions that this article addresses are (i) how to develop such DBM tools for the ARM architecture and (ii) whether new optimisations are plausible and needed. We describe the general design of MAMBO, a new DBM tool for ARM, which we release together with this publication, and introduce novel optimisations to handle indirect branches. In addition, we explore scenarios in which it may be possible to relax the transparency offered by DBM tools to allow extra optimisations to be applied. These scenarios arise from analysing the most typical usages: for example, application binaries without handcrafted assembly. The performance evaluation shows that MAMBO introduces small overheads for SPEC CPU2006 and PARSEC 3.0 when comparing with the execution times of the unmodified programs: a geometric mean overhead of 28% on a Cortex-A9 and of 34% on a Cortex-A15 for CPU2006, and between 27% and 32%, depending on the number of threads, for PARSEC on a Cortex-A15.