Table of Figures

2.1: Decoding Fixed Length Instructions

2.2: Decoding Variable Length Instructions

2.3: More Efficient Variable Length Instruction Decoding

2.4: Variable Length Instruction Decoding with a Control Field

3.1: The Relationship between Parallelism and Power Efficiency

3.2: A 6-Stage Pipeline

3.3: Timing of a 6-Stage Pipeline

3.4: Branches in a 6-Stage Pipeline

3.5: Timing of Branches in a 6-Stage Pipeline

3.6: A Pipeline with a dedicated branch adder

3.7: Timing of Branches with a dedicated branch adder

3.8: A 6-stage Pipeline with Forwarding

3.9: Control for Forwarding

3.10: Symmetric 2-way superscalar processor

3.11: Forwarding in a Superscalar Processor

3.12: Superscalar Processor with Out of Order Issue and Register Renaming

4.1: Speed of Synchronous and Asynchronous Circuits

4.2: Power Saving in Variable Demand Systems

4.3: Dynamic Supply Voltage Adjustment controlled by FIFO Occupancy

4.4: Dynamic Supply Voltage Adjustment controlled by a HALT instruction

4.5: A Simple Synchronous Pipeline

4.6: A Simple Asynchronous Pipeline with Matched Delays

4.7: A Simple Asynchronous Pipeline with Completion Detection

4.8: A Synchronous Pipeline with Forwarding

4.9: An Asynchronous Pipeline with Forwarding

4.10: Performance of Synchronous and Asynchronous Pipelines with and without Forwarding

4.11: An Asynchronous Pipeline with Conditional Forwarding

4.12: Synchronous Parallel Functional Units

4.13: Asynchronous Parallel Functional Units

4.14: Counterflow Pipeline Processor

5.1: SCALP Processor Organisation

5.2: SCALP Program Dataflow Representation

5.3: SCALP Instruction Format

5.4: Move unit instruction format

5.5: Register bank instruction formats

5.6: ALU instruction format

5.7: Load/Store instruction format

5.8: Branch instruction format

5.9: Five Instruction Group (FIG) format

5.10: The division of responsibilities between hardware and the compiler (from [HOOG94])

6.1: Four Phase Asynchronous Signalling

6.2: Two Phase Asynchronous Signalling

6.3: An Example Muller C Gate

6.4: SCALP Implementation Overview

6.5: An Unsatisfactory Issuer Architecture

6.6: Instruction Issuer Architecture

6.7: Issuer Token Path

6.8: Issuer Cell Internal Organisation

6.9: Token Block Implementation

6.10: Decode Block Implementation

6.11: Sequence Block Implementation

6.12: Instruction Issuer Architectural Performance

6.13: Single Bus Network

6.14: Fully Populated Crossbar Network

6.15: Two Bus Network

6.16: Four way mutual exclusion circuit built from two way mutual exclusion elements

6.17: Arbiter for Result Network

6.18: An Asymmetric Result Network

6.19: Function Units General Structure

6.20: Branch Unit Sequencer

6.21: SCALP Block Sizes

7.1: Pipeline Visualisation

7.2: Instruction Issuer Performance

7.3: Performance of Isolated Pipeline Stages

7.4: Performance and Branch Interval

7.5: Instruction Sequence Functional Unit Parallelism

7.6: Example Program Functional Unit Parallelism

7.7: Functional Unit Use for Example Programs

8.1: Instruction Encoding with Separate Instructions and Result Routing