SCALP: A Superscalar Asynchronous Low-Power Processor

A thesis submitted to the University of Manchester for the degree of Doctor of Philosophy in the Faculty of Science and Engineering

Philip Brian Endecott

Department of Computer Science

1995

Abstract

The design of low-power microprocessors is an important research area because of the increasing demand for high-performance portable computers with long battery life. Most previous low-power microprocessor designs have been constrained by the need to maintain compatibility with existing instruction sets, and so have been able to apply power efficiency techniques only at the implementation level and below. Furthermore, conventional design approaches have prevented processor implementations from exploiting the power saving potential of asynchronous logic. This thesis describes a processor, SCALP, whose architectural design and implementation are free from these constraints.

SCALP is motivated by three objectives: high code density, highly parallel operation, and asynchronous implementation. These three objectives should all lead to increased power efficiency and also dictate the nature of the architecture.

Conventional instruction sets indicate the flow of data between instructions by means of register numbers; each instruction gives register numbers for its operands and result. This technique leads to complexity in pipelined superscalar implementations: register numbers from many instructions must be compared to identify dependencies and activate appropriate forwarding paths. In asynchronous systems these forwarding paths cause further inefficiency due to the additional synchronisation that they impose. Furthermore, the register specifier bits in a typical instruction set take up around half of each instruction, making them crucial to code density.
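The comparison cost described above can be illustrated with a minimal sketch. It assumes a hypothetical issue group of two-operand instructions and checks only dependencies within that group; the `Instr` type and `find_dependencies` function are illustrative names, not part of SCALP or of any real instruction set. A real pipeline would also compare against instructions already in flight.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    """Hypothetical register-based instruction (illustrative only)."""
    dest: int               # destination register number
    srcs: Tuple[int, ...]   # source register numbers


def find_dependencies(group):
    """Compare every source of each instruction with the destination
    of every earlier instruction in the same issue group.

    Returns (dependencies, comparison_count), where each dependency is
    (producer_index, consumer_index, register_number).
    """
    deps = []
    comparisons = 0
    for later_idx, later in enumerate(group):
        for earlier_idx in range(later_idx):
            earlier = group[earlier_idx]
            for src in later.srcs:
                comparisons += 1          # one comparator per pairing
                if src == earlier.dest:
                    deps.append((earlier_idx, later_idx, src))
    return deps, comparisons


# A four-wide issue group with two source operands per instruction.
group = [
    Instr(dest=1, srcs=(2, 3)),
    Instr(dest=4, srcs=(1, 5)),
    Instr(dest=6, srcs=(4, 1)),
    Instr(dest=7, srcs=(8, 9)),
]
deps, comparisons = find_dependencies(group)
```

Under these assumptions a group of N two-operand instructions needs N(N-1) comparisons, so the comparator count grows quadratically with issue width.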

SCALP's main architectural innovation is its use of "explicit forwarding". SCALP does not use a global register bank; instead, each instruction indicates where its result should be sent. This takes the form of a destination queue identifier. One such queue is associated with each operand required by each functional unit. This scheme eliminates the need for many register specifier comparators. The arrangement also suits asynchronous implementation and increases code density.
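The idea of explicit forwarding can be sketched as a toy model. Everything named here is an assumption made for illustration: the unit names (`alu`, `out`), the queue labels (`a`, `b`), and the `send` and `alu_execute` helpers do not reflect SCALP's actual functional units or instruction encoding. The sketch shows only the principle that a result is routed by a destination queue identifier rather than by a register number.

```python
from collections import deque

# One queue per operand of each functional unit (hypothetical names).
queues = {
    ("alu", "a"): deque(),
    ("alu", "b"): deque(),
    ("out", "a"): deque(),  # stand-in sink so the final result can be read
}


def send(dest, value):
    """Deliver a value to the operand queue named by `dest`."""
    queues[dest].append(value)


def alu_execute(op, dest):
    """Take one operand from each ALU input queue, apply `op`, and
    forward the result to the destination queue the instruction names."""
    a = queues[("alu", "a")].popleft()
    b = queues[("alu", "b")].popleft()
    result = a + b if op == "add" else a - b
    send(dest, result)


# Compute (2 + 3) - 4 with no register numbers and no comparators.
send(("alu", "a"), 2)
send(("alu", "b"), 3)
alu_execute("add", ("alu", "a"))   # the sum is forwarded to ALU operand a
send(("alu", "b"), 4)
alu_execute("sub", ("out", "a"))   # the difference goes to the sink queue
result = queues[("out", "a")].popleft()
```

Because the producer names the consumer's queue directly, no hardware has to compare register specifiers to discover that the subtraction depends on the addition.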

SCALP has other features aimed at increasing code density or otherwise improving power efficiency: it has variable length instructions, and byte operations need only activate part of the datapath.

After the background to and features of the proposed architecture have been presented, the implementation is described. SCALP is the first known asynchronous implementation of a superscalar architecture. It operates using the four-phase bundled data protocol, like the AMULET2 processor. Much of the design uses a "macromodule" design approach, and the total size of the design is around 9,500 components. Some of the more interesting parts of the implementation, such as the parallel asynchronous instruction issuer, are described in detail.
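For readers unfamiliar with the four-phase bundled data protocol mentioned above, the following is a minimal sequential sketch of its generic handshake: the sender places data on the bundle and raises request, the receiver latches the data and raises acknowledge, and both signals then return to zero before the next transfer. The function name and trace labels are illustrative assumptions; this is not a model of SCALP's circuits, and real hardware runs sender and receiver concurrently.

```python
def four_phase_transfer(values):
    """Simulate four-phase (return-to-zero) handshakes, one per value.

    Returns (received_values, signal_trace).
    """
    req = ack = 0
    received = []
    trace = []
    for value in values:
        assert req == 0 and ack == 0   # both wires idle before a transfer
        data = value                   # sender drives the data bundle
        req = 1                        # phase 1: request rises, data valid
        trace.append("req+")
        received.append(data)          # receiver latches the bundled data
        ack = 1                        # phase 2: acknowledge rises
        trace.append("ack+")
        req = 0                        # phase 3: request returns to zero
        trace.append("req-")
        ack = 0                        # phase 4: acknowledge returns to zero
        trace.append("ack-")
    return received, trace


received, trace = four_phase_transfer([7, 9])
```

Each transfer costs four signal transitions, which is the origin of the name "four-phase".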

The implementation has been taken to gate level using the hardware description language VHDL. The implementation is extensively evaluated in comparison with a conventional instruction set and conclusions about its effectiveness are drawn.

Declaration

No portion of the work referred to in this thesis has been submitted in support of an application for another degree or qualification of this or any other university or other institute of learning, except as indicated in section 2.1.

Copyright and Intellectual Property Rights

(1) Copyright in the text of this thesis rests with the Author. Copies (by any process) either in full, or of extracts, may be made only in accordance with instructions given by the Author and lodged in the John Rylands University Library of Manchester. Details may be obtained from the Librarian. This page must form part of any such copies made. Further copies (by any process) of copies made in accordance with such instructions may not be made without the permission (in writing) of the Author.

(2) The ownership of any intellectual property rights which may be described in this thesis is vested in the University of Manchester, subject to any prior agreement to the contrary, and may not be made available for use by third parties without the written permission of the University, which will prescribe the terms and conditions of any such agreement.

Further information on the conditions under which disclosures and exploitation may take place is available from the Head of Department of the Department of Computer Science.

Acknowledgements

During my three years as a postgraduate student in Manchester I have received a great deal of support and encouragement from many students and staff in the Department of Computer Science. I would like to thank everyone for their help.

I am especially grateful to my supervisor Prof. Steve Furber who has encouraged my creativity, and to the other members of the AMULET research group.

Thanks are also due to those people who have helped by proof reading and commenting on this thesis, in particular Richard York and David Gilbert.

The Author

Philip Endecott obtained a B.Sc. (I) degree in Computer Science from the University of Manchester in 1991. After a year employed by Advanced RISC Machines Ltd in Cambridge he returned to Manchester as a postgraduate student. In 1993 his M.Sc. thesis entitled "Processor Architectures for Power Efficiency and Asynchronous Implementation" was submitted. This thesis represents the result of a further two years' work in this area.

