
School of Computer Science Intranet

APT research areas


The AMULET1 microprocessor


The AMULET1 microprocessor is the first large-scale asynchronous circuit produced by the APT group. It is an implementation of the ARM processor architecture using the Micropipeline design style. Work began at the end of 1990, and the design was despatched for fabrication in February 1993. The primary aim was to demonstrate that an asynchronous microprocessor can offer lower electrical power consumption than a synchronous design in the same role.

The design incorporates a number of concurrent units which cooperate to give instruction-level compatibility with the existing synchronous part. These include:

  • an Address unit, which autonomously generates instruction fetch requests and (non-deterministically) interleaves them with data requests from the Execution unit;
  • a Register file, which sources operands, queues write destinations and handles data dependencies;
  • an Execution unit, which includes a multiplier, a shifter and an ALU with data-dependent delay;
  • a Data interface, which performs byte extraction and alignment and includes an instruction prefetch buffer;
  • a Control path, which performs instruction decode.

These units all operate independently, synchronizing only at their mutual interfaces to exchange data.
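The text says only that the ALU has a data-dependent delay. A common source of such dependence in asynchronous adders is carry propagation: the adder can signal completion once its longest carry chain has settled, instead of always waiting for a worst-case clock period. The sketch below assumes a simple ripple-carry model (an illustrative assumption, not a description of the actual AMULET1 ALU circuit) and measures that chain for 32-bit operands. The function name `longest_carry_chain` is hypothetical.

```python
import random


def longest_carry_chain(a: int, b: int, width: int = 32) -> int:
    """Return the longest ripple distance of a carry when adding a + b.

    The distance is counted from the most recent carry-generating bit
    (both inputs 1) through consecutive propagating bits (inputs differ)
    that receive that carry. Assumes an idealised ripple-carry adder.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    carry = 0      # carry into the current bit position
    current = 0    # length of the carry chain leaving this position
    longest = 0
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        generate = ai & bi
        propagate = ai ^ bi
        if generate:
            current = 1              # a fresh carry starts here
        elif propagate and carry:
            current += 1             # the incoming carry ripples through
        else:
            current = 0              # no carry leaves this bit
        carry = generate | (propagate & carry)
        longest = max(longest, current)
    return longest


if __name__ == "__main__":
    # Worst case: the carry from bit 0 ripples through all 32 positions.
    print(longest_carry_chain(0xFFFFFFFF, 1))
    # Typical case: random operands usually give much shorter chains.
    rng = random.Random(1)
    samples = [longest_carry_chain(rng.getrandbits(32), rng.getrandbits(32))
               for _ in range(10_000)]
    print(sum(samples) / len(samples))
```

Because random operands rarely produce long chains, an adder that finishes when its carries settle is, on average, much faster than its worst case, which is the kind of data-dependent delay the Execution unit description refers to.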

The design demonstrates that all the usual problems of processor design can be solved in this asynchronous framework: backwards instruction set compatibility, interrupts and exact exceptions for memory faults are all covered. It also exhibits some unusual behaviour, for instance a non-deterministic prefetch depth beyond a branch instruction (though the instructions which actually get executed are, of course, deterministic). Compiler optimization faces some unusual problems too: because execution time depends on data-dependent unit delays rather than on a whole number of clock cycles, the metric used to compare alternative code sequences is continuous rather than discrete, and the non-determinism in external behaviour must also be taken into account.
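The claim that prefetch depth is non-deterministic while the executed instruction stream is not can be illustrated with a toy model. In this sketch the fetch unit runs ahead of execution by a random number of instructions (standing in for the unpredictable relative speeds of the asynchronous units), and anything prefetched past a taken branch is discarded. The program encoding, `PROGRAM` and `run` are all hypothetical and bear no relation to real ARM instructions.

```python
import random
from collections import deque

# Hypothetical toy program (not ARM code): ("op", name) is an ordinary
# instruction, ("b", target) an always-taken branch, ("halt",) stops execution.
# The model assumes the program ends in "halt" and branch targets are in range.
PROGRAM = [
    ("op", "a"),
    ("b", 4),
    ("op", "skipped1"),
    ("op", "skipped2"),
    ("op", "c"),
    ("b", 7),
    ("op", "skipped3"),
    ("halt",),
]


def run(program, rng):
    """Run `program` with a prefetch unit that runs ahead of execution by a
    random amount. Returns (executed instructions, prefetched-then-discarded count)."""
    queue = deque()   # fetched but not yet executed instructions
    fetch_pc = 0
    executed = []
    discarded = 0
    while True:
        # The fetch unit gets 0-3 fetches in before execution takes the next
        # instruction, standing in for the units' unpredictable relative speeds.
        for _ in range(rng.randint(0, 3)):
            if fetch_pc < len(program):
                queue.append(program[fetch_pc])
                fetch_pc += 1
        if not queue:
            # Execution stalls until at least one instruction has arrived.
            queue.append(program[fetch_pc])
            fetch_pc += 1
        instr = queue.popleft()
        if instr[0] == "halt":
            return executed, discarded
        if instr[0] == "op":
            executed.append(instr[1])
        else:  # taken branch: everything prefetched beyond it is thrown away
            executed.append(f"b {instr[1]}")
            discarded += len(queue)
            queue.clear()
            fetch_pc = instr[1]


if __name__ == "__main__":
    for seed in range(5):
        trace, thrown_away = run(PROGRAM, random.Random(seed))
        print(trace, thrown_away)
```

Across seeds the number of discarded prefetches changes, which is the externally visible non-determinism, while the executed trace is always the same four instructions.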

The chip was designed using a mixture of custom datapath and compiled control logic elements, as was the synchronous ARM. The fabrication technology is the same as that used for one version of the synchronous part, reducing the number of variables when comparing the two parts.

Two silicon implementations have been received and preliminary measurements have been taken from them. The first, on a 0.7 micron process, has achieved about 28k Dhrystones running the standard benchmark program. The other, on a 1 micron process, achieves about 20k Dhrystones. The faster part performs roughly as well as a synchronous ARM6 clocked at around 20MHz, and its speed is likely to be limited by the memory system cycle time (just over 50ns) rather than by the processor chip itself.
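Two pieces of arithmetic help relate these figures. Assuming "28k Dhrystones" means 28,000 Dhrystone iterations per second (the usual reading, but an assumption here), dividing by 1757, the VAX 11/780 score conventionally defined as 1 Dhrystone MIPS, gives roughly 15.9 and 11.4 DMIPS for the two parts. Separately, a memory cycle of 50ns permits at most 1/(50ns) = 20 million accesses per second, a rate that lines up with the 20MHz ARM6 comparison. The helper names below are hypothetical.

```python
# Assumption: "28k Dhrystones" is read as 28,000 Dhrystone iterations per
# second, and 1 Dhrystone MIPS is defined by the VAX 11/780's 1757 per second.
VAX_11_780_DHRYSTONES_PER_SEC = 1757


def dhrystone_mips(dhrystones_per_sec: float) -> float:
    """Convert a Dhrystones-per-second score into VAX-relative Dhrystone MIPS."""
    return dhrystones_per_sec / VAX_11_780_DHRYSTONES_PER_SEC


def max_access_rate_mhz(cycle_time_ns: float) -> float:
    """Upper bound on memory accesses per microsecond (i.e. MHz) for a given cycle time."""
    return 1_000.0 / cycle_time_ns


if __name__ == "__main__":
    print(f"0.7 micron part: {dhrystone_mips(28_000):.1f} DMIPS")
    print(f"1 micron part:   {dhrystone_mips(20_000):.1f} DMIPS")
    print(f"50 ns memory cycle: at most {max_access_rate_mhz(50):.0f} M accesses/s")
```

Since the quoted cycle time is "just over" 50ns, the real access ceiling is slightly below 20 million per second.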

A fair comparison of devices at the same geometries puts AMULET1's performance at about 70% of that of an ARM6 running at 20MHz. Its power consumption is very similar to that of the ARM6, so AMULET1 delivers about 80 MIPS/W (compared with around 120 MIPS/W from a 20MHz ARM6). Multiplication is several times faster on AMULET1 owing to its specialised asynchronous multiplier. This performance is reasonable given that AMULET1 is a first-generation part, whereas the synchronous ARM has undergone several design iterations. Its successor, AMULET2, delivers somewhat higher performance whilst using less power.
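Because the two parts draw similar power, MIPS/W should scale with relative performance alone. The minimal sketch below treats "very similar" power as exactly equal (an assumption) and checks that 70% of 120 MIPS/W, about 84, agrees with the quoted "about 80" within the rounding of these estimates. The function and constant names are hypothetical.

```python
def relative_efficiency(ref_mips_per_watt: float, perf_ratio: float,
                        power_ratio: float = 1.0) -> float:
    """MIPS/W of a part whose performance and power are given as ratios
    to a reference part with efficiency `ref_mips_per_watt`."""
    return ref_mips_per_watt * perf_ratio / power_ratio


ARM6_20MHZ_MIPS_PER_WATT = 120   # figure quoted in the text
AMULET1_PERF_RATIO = 0.70        # "about 70%" of a 20MHz ARM6
AMULET1_POWER_RATIO = 1.0        # "very similar" power, assumed equal here

if __name__ == "__main__":
    estimate = relative_efficiency(ARM6_20MHZ_MIPS_PER_WATT,
                                   AMULET1_PERF_RATIO, AMULET1_POWER_RATIO)
    print(f"estimated AMULET1 efficiency: {estimate:.0f} MIPS/W")
    # Working backwards from the quoted ~80 MIPS/W instead:
    print(f"implied performance ratio: {80 / ARM6_20MHZ_MIPS_PER_WATT:.2f}")
```

The small gap between 84 and 80 MIPS/W is within what "about 70%" and "very similar" power can account for.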

The macrocell size (without pad ring) is 5.5mm by 4.5mm on a 1 micron CMOS process, which is about twice the area of the synchronous part. Some of the increase can be attributed to the more sophisticated organization of the new part: it has a deeper pipeline than the clocked version, and it supports multiple outstanding memory requests; there is also specialised circuitry to increase the multiplication speed. Although there is undoubtedly some overhead attributable to the asynchronous control logic, we estimate this to be closer to 20% than to the 100% suggested by the direct comparison.
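The area figures can be broken down roughly. 5.5mm × 4.5mm gives 24.75 mm², so "about twice the area" implies a synchronous part of roughly 12.4 mm². Reading the 20% estimate as relative to the synchronous area (an interpretation, since the text contrasts it with the 100% of a straight doubling), about 2.5 mm² would be asynchronous-control overhead and about 9.9 mm² the cost of the deeper pipeline, multiple outstanding memory requests and faster multiplier. The sketch below only restates these rough ratios; `area_breakdown` is a hypothetical helper.

```python
def area_breakdown(width_mm: float, height_mm: float,
                   area_ratio: float = 2.0, async_overhead: float = 0.20) -> dict:
    """Split AMULET1's area growth into asynchronous-control overhead and
    everything else, using the rough ratios quoted in the text."""
    total = width_mm * height_mm        # AMULET1 macrocell, no pad ring
    sync = total / area_ratio           # implied area of the synchronous part
    extra = total - sync                # growth relative to the synchronous part
    control = async_overhead * sync     # estimated asynchronous-control share
    features = extra - control          # deeper pipeline, outstanding requests, multiplier
    return {"total": total, "sync": sync, "control": control, "features": features}


if __name__ == "__main__":
    for name, value in area_breakdown(5.5, 4.5).items():
        print(f"{name:>8}: {value:6.2f} mm^2")
```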

AMULET1 is code compatible with ARM6 and so is capable of running existing binaries without modification. The implementation also includes features such as interrupts and memory aborts, so that it is a fully capable component for use as a computer's MPU.

The work has taken place as part of a broad ESPRIT-funded investigation into low-power technologies within the European Open Microprocessor systems Initiative (OMI) programme, where there is interest in low-power techniques both for portable equipment and (in the longer term) to alleviate the increasingly high power dissipation of high-performance chips. This initial investigation into the role asynchronous logic might play in the quest for lower power has demonstrated, first through simulation and then in silicon, that asynchronous techniques can be applied to problems of the scale of a complete microprocessor.


Other work springing from AMULET1 included an Occam model, OCCARM, which now has its own web page.


The AMULET1 development was supported by the OMI/MAP project. The designers wish to acknowledge this support from the CEC.

Thanks for support of various kinds are also due to:

  • Advanced RISC Machines Ltd.
  • Acorn Computers Ltd.
  • Compass Design Automation Ltd.
  • GEC Plessey Semiconductors Ltd.
  • VLSI Technology Ltd.