The Jamaica Project - Architecture
The Jamaica architecture has been prototyped using a cycle-accurate simulation of a chip multiprocessor (CMP). Cycle-accurate simulation gives the most accurate measurement of performance without incurring the time and cost of silicon-level design and hardware production. A CMP design reduces overall processor design costs by reusing the design of a single processor core. However, exploiting the multiprocessing and multithreading capabilities of CMPs poses many challenges for hardware designers as well as for operating system and compiler designers.
The Jamaica CMP design employs a number of processor cores, each with its own local cache, that share a memory bus and a special-purpose work distribution ring. Any processor can rapidly dispatch work to another via the distribution ring. The ring also carries tokens that advertise the presence of idle processors. In addition to incorporating multiple cores, the architecture supports chip multi-threading (CMT): each core maintains multiple execution contexts, which act as virtual processors each running its own thread. This lets a core switch immediately to another runnable thread when a lengthy activity (such as a cache miss) stalls execution, thereby hiding memory and other latencies. Thread synchronization is implemented through memory, using cache read/write protocols that support locked load and store operations.
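The latency-hiding effect of chip multi-threading can be illustrated with a toy, single-core model. This is a minimal sketch under assumed numbers, not the Jamaica simulator: the 20-cycle miss latency, the 10-instruction threads, the miss on every fifth instruction and the round-robin choice of the next runnable context are all hypothetical. The core keeps issuing from its current context until that context stalls, then switches immediately to another runnable one.

```python
from dataclasses import dataclass

# Hypothetical parameters chosen only for illustration.
MISS_LATENCY = 20     # cycles needed to service a cache miss
WORK_PER_THREAD = 10  # instructions executed by each thread
MISS_EVERY = 5        # every 5th instruction misses in the cache


@dataclass
class Context:
    """One hardware execution context (a virtual processor) on the core."""
    remaining: int = WORK_PER_THREAD
    executed: int = 0
    stall_until: int = 0  # cycle at which an outstanding miss completes


def simulate(num_contexts: int) -> int:
    """Return the cycles one core needs to finish one thread per context.

    The core keeps issuing from the current context until it stalls on a
    miss, then switches at once to the next runnable context.
    """
    contexts = [Context() for _ in range(num_contexts)]
    cycle, current = 0, 0
    while any(c.remaining > 0 for c in contexts):
        # Prefer the current context; otherwise scan round-robin for a runnable one.
        for k in range(num_contexts):
            idx = (current + k) % num_contexts
            ctx = contexts[idx]
            if ctx.remaining > 0 and ctx.stall_until <= cycle:
                ctx.remaining -= 1
                ctx.executed += 1
                if ctx.remaining > 0 and ctx.executed % MISS_EVERY == 0:
                    # This instruction missed: the context waits MISS_LATENCY cycles.
                    ctx.stall_until = cycle + 1 + MISS_LATENCY
                current = idx
                break
        # If no context was runnable, this cycle is simply lost (pipeline idle).
        cycle += 1
    return cycle


if __name__ == "__main__":
    # Four threads run back to back on a single-context core (software
    # context-switch cost ignored) versus four resident hardware contexts.
    sequential = 4 * simulate(1)
    multithreaded = simulate(4)
    print(f"single context: {sequential} cycles, 4 contexts: {multithreaded} cycles")
```

With these assumptions, four threads run back to back on a single-context core take 120 cycles, while four hardware contexts finish the same work in 45 cycles, because the misses of one thread overlap with the execution of the others. The model ignores cache contention and any switching overhead, so it only shows the trend.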
In addition to threading performance, the project studies multimedia workloads, and vector extensions to the architecture have been tested.
CMP simulation needs to be both accurate and fast. Accuracy is needed so that compiler and operating system designs can be developed that maximise processor utilisation. Speed is necessary because modern benchmarks require the simulator to execute billions of instructions. An important, if slightly lesser, concern is the overhead of booting and running a dynamic compilation environment such as the Java virtual machine. We achieve accuracy by making our simulations cycle accurate. Simulation speed is currently kept high by an efficient simulator written in C. To gain further speed-up and to exploit inherently parallel workloads, a new distributed, multi-threaded simulator is being developed, with an easily configurable, object-oriented plug-in design.
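A plug-in, object-oriented simulator can be organised around a common component interface that a cycle-driven kernel ticks once per simulated cycle, with the components for a run chosen by name from a configuration. The sketch below is a hypothetical illustration of that idea, not the design of the new Jamaica simulator: the `Component`, `REGISTRY`, `plugin`, `Simulator`, `CounterCore` and `BusMonitor` names are invented for the example, and it does not attempt the distributed or multi-threaded aspects.

```python
from abc import ABC, abstractmethod


class Component(ABC):
    """Hypothetical plug-in interface: each simulated unit advances one cycle per tick."""

    @abstractmethod
    def tick(self, cycle: int) -> None:
        ...


# Hypothetical registry mapping configuration names to plug-in classes.
REGISTRY: dict[str, type[Component]] = {}


def plugin(name: str):
    """Class decorator that registers a component under a configuration name."""
    def register(cls: type[Component]) -> type[Component]:
        REGISTRY[name] = cls
        return cls
    return register


@plugin("counter_core")
class CounterCore(Component):
    """Toy core that retires one instruction per cycle until its budget runs out."""

    def __init__(self, instructions: int) -> None:
        self.remaining = instructions
        self.retired = 0

    def tick(self, cycle: int) -> None:
        if self.remaining > 0:
            self.remaining -= 1
            self.retired += 1


@plugin("bus_monitor")
class BusMonitor(Component):
    """Toy component that only counts the cycles it has observed."""

    def __init__(self) -> None:
        self.cycles_seen = 0

    def tick(self, cycle: int) -> None:
        self.cycles_seen += 1


class Simulator:
    """Cycle-driven kernel: every configured component is ticked once per cycle."""

    def __init__(self, config: list[dict]) -> None:
        # Instantiate each plug-in by name, passing its optional parameters.
        self.components = [
            REGISTRY[entry["type"]](**entry.get("params", {})) for entry in config
        ]
        self.cycle = 0

    def run(self, cycles: int) -> None:
        for _ in range(cycles):
            for component in self.components:
                component.tick(self.cycle)
            self.cycle += 1


if __name__ == "__main__":
    config = [
        {"type": "counter_core", "params": {"instructions": 5}},
        {"type": "counter_core", "params": {"instructions": 8}},
        {"type": "bus_monitor"},
    ]
    sim = Simulator(config)
    sim.run(6)
    for component in sim.components:
        print(type(component).__name__, vars(component))
```

Adding a new unit (a cache model, say) only requires a new class registered with `@plugin`, and the configuration list decides which units a given run contains; the kernel itself does not change.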