A Vector Architecture for Multimedia Java Applications
Ahmed H.M.R. El-Mahdy
Abstract
Multimedia applications written in Java have different processing requirements from the conventional applications for which current microprocessors have been optimised. They are computationally intensive, stretching the capabilities of current microprocessors. Initial multimedia support relied on short vector extensions to the instruction sets. While these exploit the wide datapaths in current microprocessors, they introduce misalignment overheads and restrict the addressing mode to sequential access. Java, being a platform-independent language, relies on a stack-based intermediate language, bytecode, which needs a translation layer to be executed. The translation has to be fast, which limits the optimisations that can be done in software and raises the question of whether supporting Java bytecode natively is a better approach for high-performance processing.
The thesis first demonstrates that a register-based instruction set is more efficient for executing Java bytecode than a stack-based instruction set. It then develops a novel vector architecture that exploits the two-dimensional data access patterns of multimedia applications. This results in an instruction set combining the benefits of subword and traditional vector processing. The thesis also develops a novel cache prefetching mechanism to capture two-dimensional locality of reference. A prototype translation system is also developed to support the vector instruction set from high-level Java programs.
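To make the notion of a two-dimensional access pattern concrete, the following is a minimal illustrative sketch in Java. It is not taken from the thesis: the class name `BlockAccess`, the method `sad`, and all parameters are hypothetical. It computes a sum of absolute differences over a small pixel block embedded in a larger row-major frame, a kind of kernel common in video encoding. Within a row the accesses are sequential, but moving to the next row jumps by the frame width (`stride`), which is the sort of pattern a purely sequential short-vector load does not capture directly.

```java
public class BlockAccess {
    // Sum of absolute differences between two w x h pixel blocks.
    // Both frames are stored row-major; `stride` is the frame width in pixels.
    // All names here are illustrative assumptions, not the thesis's API.
    public static int sad(byte[] cur, int curOff, byte[] ref, int refOff,
                          int stride, int w, int h) {
        int sum = 0;
        for (int y = 0; y < h; y++) {
            // Row start: a jump of `stride` elements between consecutive rows.
            int c = curOff + y * stride;
            int r = refOff + y * stride;
            for (int x = 0; x < w; x++) {
                // Bytes are signed in Java; mask to read them as unsigned pixels.
                int a = cur[c + x] & 0xFF;
                int b = ref[r + x] & 0xFF;
                sum += Math.abs(a - b);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int stride = 16;                       // frame is 16 x 16 pixels
        byte[] cur = new byte[stride * 16];
        byte[] ref = new byte[stride * 16];
        java.util.Arrays.fill(cur, (byte) 10);
        java.util.Arrays.fill(ref, (byte) 7);
        // 8 x 8 block, each pixel differs by 3: 64 * 3 = 192
        System.out.println(sad(cur, 0, ref, 0, stride, 8, 8));
    }
}
```

In this sketch the inner loop corresponds to what a subword or short-vector instruction could process at once, while the outer loop's strided row step is the second dimension that the abstract refers to.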
Throughout the design, high-level analysis tools, such as trace-driven simulation, software instrumentation of the Sun JDK, and simple analytical performance models, are developed to aid the design decisions and verify existing results in the literature. The analytical models are used further to derive an optimal configuration. A detailed simulation of the system is developed and used to analyse the performance of MPEG-2 video encode and decode applications, and two kernels taken from 3D and voice recognition applications. The simulation integrates the vector architecture into a single-chip multithreaded multiprocessor architecture and studies the interactions. The architecture has shown an advantage in terms of reducing the instruction count, cache misses, and overall execution time for current and future technologies.