SCALP Processor Programmers' Model and Software Tools User Guide

SCALP is the "Superscalar Asynchronous Low Power Processor". This guide provides a programmers' model of the behaviour of the processor and describes the supporting software tools, scASM (assembler) and scaSIM (simulator).

1. Programmers' Model

The SCALP processor has a number of unusual features which make assembly language programming somewhat different from programming more conventional processors. These features are motivated by a desire to increase power efficiency and to ease asynchronous implementation.

1.1 Functional Units and Queues

In SCALP, each instruction is executed by one of five functional units. The functional units are:

  • The ALU, which is responsible for arithmetic and logical operations including compares.

  • The Load/Store unit, which is responsible for accesses to external memory.

  • The Register Bank unit, which provides medium-term storage for data items.

  • The Move unit, which implements a number of miscellaneous data-routing operations.

  • The Branch unit, which provides for control transfer operations.

    In a conventional superscalar processor, functional units would communicate through a global register bank; an instruction executed by the load/store unit would write its result to one global register from where it would be read by a subsequent instruction executing in another functional unit. In SCALP, inter-unit communication is rather different. Although a global register bank is present, it is used less frequently than in a conventional processor. Instead a system of FIFO queues directly interconnects functional units so that data may be sent between them.

    This queue-based model can be thought of as another level of the memory hierarchy. In a conventional processor there are two programmer-visible levels of memory hierarchy, namely the main memory (accessed via load and store instructions) and the global registers (accessed by all instructions). In SCALP there are three levels; the main memory as before, the global registers which are now accessed by means of read and write instructions, and the short-term values in transit between functional units in the queues.

    Each functional unit has a number of input queues. Instructions do not need to specify where their operands will come from; this is implied by the instruction itself, for example a two-operand add will always use operands from the "alua" and "alub" queues. The results of instructions are sent via a routing network to other operand queues. This destination queue is specified by the instruction.

    SCALP has eleven queues. Nine of them are "normal" in the sense that they carry byte or word data, and two carry booleans. The "normal" queues are as follows:

    "alua" and "alub" : the inputs to the ALU

    "mema" and "memd" : the inputs to the Load/Store unit, providing addresses and data respectively

    "regd" : the input to the Register Bank unit, providing data for write instructions

    "movea" and "moveb" : the main inputs to the Move unit

    "link" : an additional input to the move unit, which carries only subroutine return addresses generated by the BRL (branch and link) instruction. This queue cannot be specified as a destination by any other instruction.

    "brc" : an input to the branch unit, providing addresses for subroutine returns and other computed branches.

    The eight queues above excluding "link" are specified as the destination by the majority of instructions. Compare instructions generate boolean results which may be sent to the following destinations:

    "cmove" : an input to the move unit used by the conditional move instruction.

    "cbran" : an input to the branch unit used by conditional branch instructions.

    1.2 Instruction groups and Instructions

    SCALP instructions may be made up of either one or two "chunks". In the case of two-chunk instructions, the second chunk is an immediate operand.

    Chunks are 12 bits long. Groups of five chunks, along with four additional bits to indicate which of the chunks contain immediate values, are stored in aligned 8 byte "igrps". Chunk addresses are specified by giving the address of the first byte of the igrp followed by the "sub address" of the chunk; for example the first igrp fetched after reset contains chunks 0x00000000.0 to 0x00000000.4.

    The format of an igrp is as follows:

    bits      function
    0-11      chunk 0
    12-23     chunk 1
    24-35     chunk 2
    36-47     chunk 3
    48-59     chunk 4
    60-63     ifield

    The "ifield" specifies which chunks are instructions and which are immediates. It is decoded as follows to indicate whether each chunk is an instruction that takes an immediate in the following chunk.

              Chunk has immediate in following chunk 
    ifield        0      1      2      3      4
     0          FALSE  FALSE  FALSE  TRUE   FALSE   
     1          TRUE   FALSE  FALSE  TRUE   FALSE  
     2          FALSE  FALSE  TRUE   FALSE  FALSE  
     3          TRUE   FALSE  TRUE   FALSE  FALSE  
     4          FALSE  FALSE  FALSE  FALSE  FALSE  
     5          TRUE   FALSE  FALSE  FALSE  FALSE  
     6          FALSE  FALSE  FALSE  FALSE  TRUE   
     7          TRUE   FALSE  FALSE  FALSE  TRUE   
     8          FALSE  TRUE   FALSE  TRUE   FALSE  
     9          TRUE   TRUE   FALSE  TRUE   FALSE  
    10          FALSE  FALSE  TRUE   FALSE  TRUE
    11          TRUE   FALSE  TRUE   FALSE  TRUE   
    12          FALSE  TRUE   FALSE  FALSE  FALSE  
    13          TRUE   TRUE   FALSE  FALSE  FALSE  
    14          FALSE  TRUE   FALSE  FALSE  TRUE   
    15          TRUE   TRUE   FALSE  FALSE  TRUE  
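    The igrp layout given earlier in this section can be unpacked with a few shifts and masks. The following Python sketch is illustrative only: it assumes that bit 0 is the least significant bit of the 64-bit igrp, which this guide does not state explicitly, and the function names are hypothetical.

```python
CHUNK_BITS = 12
CHUNKS_PER_IGRP = 5


def unpack_igrp(igrp):
    """Split a 64-bit igrp into its five 12-bit chunks and its 4-bit ifield.

    Assumption: bit 0 is the least significant bit of the igrp.
    """
    if not 0 <= igrp < (1 << 64):
        raise ValueError("igrp must fit in 64 bits")
    chunks = [(igrp >> (CHUNK_BITS * n)) & 0xFFF for n in range(CHUNKS_PER_IGRP)]
    ifield = (igrp >> 60) & 0xF
    return chunks, ifield


def pack_igrp(chunks, ifield):
    """Inverse of unpack_igrp: build a 64-bit igrp from five chunks and an ifield."""
    if len(chunks) != CHUNKS_PER_IGRP:
        raise ValueError("an igrp holds exactly five chunks")
    igrp = (ifield & 0xF) << 60
    for n, chunk in enumerate(chunks):
        igrp |= (chunk & 0xFFF) << (CHUNK_BITS * n)
    return igrp
```

    Five 12-bit chunks plus the 4-bit ifield account for all 64 bits of the 8-byte igrp.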

    Chunks specifying branch instructions have a special format in which bits 3, 10 and 11 are all set (see section 1.4). For all other instructions the bits of the chunk are interpreted as follows:

    bits      function
    0         width of operation; 0=byte, 1=word.
    1-6       functional unit specific
    7-9       destination queue specifier
    10-11     functional unit number

    The destination queue is identified as follows for normal instructions:

    0 : regd
    1 : movea
    2 : moveb
    3 : brc
    4 : alua
    5 : alub
    6 : mema
    7 : memd

    For compare instructions, the destination is as follows:

    0 : cbran
    1 : cmove

    The functional unit is identified as follows:

    0 : Load/Store
    1 : ALU
    2 : Register Bank
    3 : Move
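
    The field layout above can be expressed as a small decoder. This Python sketch is illustrative only: it assumes that bit 0 is the least significant bit of the chunk, and the function and key names are hypothetical. The destination is returned as its 3-bit number rather than a queue name.

```python
UNITS = {0: "Load/Store", 1: "ALU", 2: "Register Bank", 3: "Move"}

# Branch chunks are recognised by bits 3, 10 and 11 all being set.
BRANCH_MASK = (1 << 3) | (1 << 10) | (1 << 11)


def is_branch(chunk):
    return (chunk & BRANCH_MASK) == BRANCH_MASK


def decode_chunk(chunk):
    """Decode the common fields of a 12-bit instruction chunk."""
    if is_branch(chunk):
        return {"branch": True}
    return {
        "branch": False,
        "width": "word" if chunk & 1 else "byte",   # bit 0
        "specific": (chunk >> 1) & 0x3F,            # bits 1-6
        "dest": (chunk >> 7) & 0x7,                 # bits 7-9
        "unit": UNITS[(chunk >> 10) & 0x3],         # bits 10-11
    }
```

    For example, decode_chunk(0x423) reports a word-wide ALU instruction with functional unit specific bits 17 and destination 0.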
    The following sections describe the instructions for each functional unit in turn. First, however, an overview of the assembly language syntax is given.

    1.3 Assembly Language

    The general syntax for an instruction is as follows:

    instruction [ width ] [ immediate ] [ -> destination ]

    The optional width specification is either ".b" to specify a byte operation or ".w" to specify a word (32-bit) operation.

    Some instructions such as stores do not specify a destination.

    An example instruction is:

         add.b 100 -> regd      ;  add 100 to a byte value from the alua
                                ;  queue and send the result to the regd
                                ;  queue.
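
    The general syntax can be split into its parts with ordinary string handling. This Python sketch is not how scASM parses its input; it is a simplified illustration that handles only ";" comments, ignores labels and preprocessor directives, and uses hypothetical names throughout.

```python
def parse_instruction(line):
    """Split 'instruction [width] [operand] [-> destination]' into fields.

    Simplifications (assumptions of this sketch): only ';' comments are
    stripped, and labels such as 'HERE:' are not handled.
    """
    text = line.split(";", 1)[0].strip()
    if not text:
        return None  # blank or comment-only line
    left, arrow, right = text.partition("->")
    fields = left.split(None, 1)
    if not fields:
        raise ValueError("missing instruction name")
    mnemonic, _, width = fields[0].partition(".")
    operand = fields[1].strip() if len(fields) > 1 else None
    dests = [d.strip().lower() for d in right.split(",")] if arrow else []
    return {
        "op": mnemonic.lower(),          # the assembler is case-insensitive
        "width": width.lower() or None,  # 'b', 'w' or None
        "operand": operand,              # immediate or register number text
        "dest": dests,                   # DUP may name two destinations
    }
```

    Applied to the example above, this yields the operation "add", width "b", operand "100" and the single destination "regd".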

    1.4 Branch instructions

    SCALP has 6 branch instructions:

  • Branch-if-true (BRT) and Branch-if-false (BRF)

    These instructions read a boolean value from the cbran queue and only cause a branch if it is true or false respectively.

  • Branch always (BR)

    Unconditional branch

  • Branch and link (BRL)

    Perform an unconditional branch, and send the address of the instruction following the branch to the link queue.

  • Computed branch (BRC)

    Read an address from the BRC queue and branch to that destination.

  • Halt

    Not really a branch instruction, but handled by the branch unit. Executing Halt causes SCALP to suspend activity until an external signal (resume) is asserted. Execution then continues with the next instruction in sequence.

    Branch instructions may occupy one or two chunks. If they occupy one chunk, the layout is as follows:

    bits      function
    0-2       opcode (see below)
    3         must be 1
    4-6       sub address of destination instruction
    7-9       displacement to igrp address in 8-byte blocks         
    10-11     must both be 1

    The opcodes specified by bits 0-2 are as follows:

    0         BRT
    1         undefined
    2         BRF
    3         undefined
    4         BRC
    5         HALT
    6         BR
    7         BRL

    The displacement specified by bits 7 to 9 is a signed displacement relative to the current igrp. If a second chunk is used, it provides an additional 12 bits of displacement, but bit 9 of the first chunk remains the sign bit for the displacement.
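
    The one-chunk branch layout can be decoded as sketched below in Python. The sketch assumes that bit 0 is the least significant bit and that the 3-bit displacement is in two's complement form; neither is stated explicitly in this guide. Two-chunk branches are not handled, and the function name is hypothetical.

```python
BRANCH_OPCODES = {0: "BRT", 2: "BRF", 4: "BRC", 5: "HALT", 6: "BR", 7: "BRL"}
BRANCH_MASK = (1 << 3) | (1 << 10) | (1 << 11)


def decode_branch(chunk, current_igrp):
    """Decode a one-chunk branch; return (name, target igrp address, sub address).

    Assumption: the displacement in bits 7-9 is two's complement, with
    bit 9 as the sign bit, and counts 8-byte igrps from the current igrp.
    """
    if (chunk & BRANCH_MASK) != BRANCH_MASK:
        raise ValueError("not a branch chunk")
    opcode = chunk & 0x7
    if opcode not in BRANCH_OPCODES:
        raise ValueError("undefined branch opcode")
    name = BRANCH_OPCODES[opcode]
    if name in ("BRC", "HALT"):
        return name, None, None  # no destination is encoded in the chunk
    sub_address = (chunk >> 4) & 0x7
    displacement = (chunk >> 7) & 0x7
    if displacement & 0x4:
        displacement -= 8
    return name, current_igrp + 8 * displacement, sub_address
```

    Under these assumptions a one-chunk branch can reach igrps from four before to three after the current one.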

    The assembly language syntax is simply

    BRT|BRF|BR|BRL destination

    or BRC|HALT

    Subroutine calls may be implemented as follows. To call a subroutine, use BRL

    BRL sub

    If the subroutine is a leaf routine, it may simply send the link address to the BRC queue using MVLINK

          MVLINK -> BRC

    and then execute a computed branch to return to the caller

          BRC

    If the subroutine is not a leaf routine, it must save the link address in a register or main memory, for example

          MVLINK -> REGD
          WRITE 19
          READ 19 -> BRC

    1.5 ALU instructions

    For ALU instructions, all six functional unit specific bits indicate what operation is to be carried out. The possible instructions and the corresponding opcodes are as follows:

    Arithmetic Operations

    Add                        3
    Add immediate              1
    Add1                      17
    Add4                      49
    Sub                       11
    Sub immediate              9
    Sub1                      25
    Sub4                      57
    RSB immediate             13
    Neg                       21

    Note that whether an instruction is indicated as "immediate" here must correspond with whether the ifield indicates that the instruction in this chunk takes an immediate in the next chunk.

    Add1, Add4, Sub1 and Sub4 provide shortcuts which save the cost of an immediate field for these very common operations.

    Add and Sub read operands from queues alua and alub. The other instructions read only from queue alua. Sub subtracts the alub value from the alua value. Sub immediate subtracts the immediate value from the alua value. RSB performs the reverse operation. Neg is the same as RSB 0.

    Immediate operands are sign extended.

    Logical Operations

    OR                          2
    OR immediate                0
    AND                         6
    AND immediate               4
    XOR                        10
    XOR immediate               8

    These instructions behave similarly to the arithmetic operations, but any immediate field is not sign extended.

    Compare instructions

    CMPEQ                       42   ( == )
    CMPEQ immediate             40
    CMPLT                       43   ( < )
    CMPLT immediate             41
    CMPLTEQ                     47   ( <= )
    CMPLTEQ immediate           45

    Note that the other comparisons ( !=, >, >= ) can be carried out by executing one of these instructions and then using the BRF instruction.

    The alua value can be considered to be on the left of the operator and the alub or immediate value on the right.

    Other Operations

    Move immediate 48

    Move simply copies the immediate value provided to the destination.

    MKIMM immediate 16

    MKIMM is used to construct arbitrary length immediate values. It reads a value from alua and shifts it left by 12 bits. It then ORs this with the immediate value. This may be used as follows:

        move 0x087 -> alua
        mkimm 0x654 -> alua
        mkimm 0x321 -> ...

    This sequence sends the value 0x87654321 to its destination.
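
    The arithmetic behind this sequence can be checked with a short Python sketch. It assumes 32-bit words and that the MKIMM immediate is used as a plain 12-bit value without sign extension; the function name is hypothetical.

```python
WORD_MASK = 0xFFFFFFFF  # words are 32 bits


def mkimm(alua_value, immediate):
    """Shift the alua value left by 12 bits and OR in a 12-bit immediate."""
    return ((alua_value << 12) | (immediate & 0xFFF)) & WORD_MASK


value = 0x087                  # move  0x087 -> alua
value = mkimm(value, 0x654)    # mkimm 0x654 -> alua   gives 0x00087654
value = mkimm(value, 0x321)    # mkimm 0x321 -> ...    gives 0x87654321
print(hex(value))              # prints 0x87654321
```

    Each MKIMM contributes 12 further bits, so a full 32-bit constant needs one Move and two MKIMM instructions.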

    1.6 Load/Store instructions

    The load/store unit executes two instructions, load and store. The address for the operation is read from the mema queue, to which an optional displacement is added.

    For store, data is read from the memd queue. For load, data is sent to the specified destination.

    The format of the functional unit specific bits in a load/store instruction is as follows:

    bits     function
    1        operation (0=load, 1=store)
    2-6      displacement

    Load/store instructions may occupy one or two chunks. Any second chunk provides an additional 12 bits of displacement. The displacement is signed. If an immediate chunk is used, bit 6 of the instruction chunk remains the sign bit.

    For byte operations, the displacement is in bytes. For word operations, the mema value must be word aligned and the displacement is in words.
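
    The address calculation for a one-chunk load or store can be sketched as follows in Python. The sketch assumes that the 5-bit displacement is in two's complement form and that bit 0 is the least significant bit of the chunk; two-chunk instructions and address wrap-around are not modelled, and the function names are hypothetical.

```python
def decode_load_store(chunk):
    """Return (is_store, is_word, displacement) for a one-chunk load/store."""
    is_word = bool(chunk & 1)             # bit 0: 0=byte, 1=word
    is_store = bool((chunk >> 1) & 1)     # bit 1: 0=load, 1=store
    displacement = (chunk >> 2) & 0x1F    # bits 2-6
    if displacement & 0x10:               # bit 6 is the sign bit (assumed two's complement)
        displacement -= 32
    return is_store, is_word, displacement


def effective_address(mema, chunk):
    """Add the displacement to the mema value, scaling by 4 for word accesses."""
    _, is_word, displacement = decode_load_store(chunk)
    if is_word:
        if mema % 4 != 0:
            raise ValueError("mema value must be word aligned for word operations")
        return mema + 4 * displacement
    return mema + displacement
```

    With these assumptions a one-chunk instruction can reach 16 units below to 15 units above the mema address, where a unit is one byte or one word according to the width.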

    1.7 Register Bank Instructions

    The register bank unit provides two instructions, read and write. These are always one chunk instructions and have the following format:

    bits     function
    1        operation (0=read, 1=write)
    2-6      register number

    The 5 bit register number specifies one of the 32 registers in the register bank unit.

    For write, the data to be stored is read from the regd queue. For read, the data is sent to the specified destination.
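
    Combining this format with the common chunk fields of section 1.2 gives the following encoding sketch in Python. It assumes that bit 0 is the least significant bit, that the destination bits are zero for write (which has no destination), and that word width is the default; none of these is stated in this guide, and the function name is hypothetical.

```python
REGISTER_BANK_UNIT = 2  # functional unit number from section 1.2


def encode_regbank(is_write, register, dest=0, word=True):
    """Build the 12-bit chunk for a register bank read or write."""
    if not 0 <= register < 32:
        raise ValueError("register number must be in the range 0-31")
    if not 0 <= dest < 8:
        raise ValueError("destination specifier must be in the range 0-7")
    return (
        int(word)                       # bit 0: width
        | (int(is_write) << 1)          # bit 1: 0=read, 1=write
        | (register << 2)               # bits 2-6: register number
        | (dest << 7)                   # bits 7-9: destination queue
        | (REGISTER_BANK_UNIT << 10)    # bits 10-11: functional unit
    )
```

    For example, encode_regbank(False, 12, dest=4) encodes "read 12 -> alua" as 0xA31, and encode_regbank(True, 19) encodes "write 19" as 0x84F.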

    1.8 Move Unit Instructions

    The move unit provides the following four operations:

    DUP (duplicate)

    DUP reads a single data value from the movea queue and sends it to two destinations. The destinations are specified separated by a comma, thus:

    dup -> regd, memd

    SEQ (sequence)

    SEQ reads two data values from the movea and moveb queues and sends them to the same destination, the movea value before the moveb value. This is useful to apply a fixed ordering to operations when the order would otherwise be nondeterministic.


    MVLINK

    MVLINK reads a value from the LINK queue and sends it to the specified destination. This is used in subroutine entry code.

    CMOVE (conditional move)

    CMOVE reads values from the movea and moveb queues, and sends one of them to the destination depending on the value of a boolean read from the CMOVE queue. If the boolean is true, the movea value is sent; if it is false, the moveb value is sent.

    The organisation of the move unit chunk is as follows:

    bits      function
    1-2       opcode
    3         must be 0
    4-6       second destination for DUP instruction

    The operations are encoded as

    opcode    instruction
    0         cmove
    1         mvlink
    2         seq
    3         dup
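
    The behaviour of the four move unit operations can be modelled with simple FIFO queues. This Python sketch is illustrative only: it ignores queue length limits and timing, and the function names are hypothetical.

```python
from collections import deque

QUEUE_NAMES = ["alua", "alub", "mema", "memd", "regd", "movea", "moveb",
               "link", "brc", "cmove", "cbran"]


def make_queues():
    """Create one empty FIFO per SCALP queue (length limits are not modelled)."""
    return {name: deque() for name in QUEUE_NAMES}


def dup(queues, dest1, dest2):
    value = queues["movea"].popleft()
    queues[dest1].append(value)
    queues[dest2].append(value)


def seq(queues, dest):
    # The movea value is sent before the moveb value.
    queues[dest].append(queues["movea"].popleft())
    queues[dest].append(queues["moveb"].popleft())


def mvlink(queues, dest):
    queues[dest].append(queues["link"].popleft())


def cmove(queues, dest):
    a = queues["movea"].popleft()
    b = queues["moveb"].popleft()
    condition = queues["cmove"].popleft()
    queues[dest].append(a if condition else b)
```

    Note that CMOVE consumes both data operands regardless of which one it forwards.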

    1.9 Queue Ordering Constraints

    SCALP fully exposes the potential non-determinism of asynchronous systems to the programmer. The time that instructions take to complete is variable, and in particular when two functional units compete to send a value to the same destination it is not defined which will win. Code that permits such competition is illegal, as it may lead to metastability.

    Consider the following example

          read 12 -> alua
          load    -> alua
          add1    -> regd
          sub1    -> memd

    In this case, the results of the register read and the memory load may arrive at the alua queue in either order. To guarantee correct ordering, the following rules must be obeyed:

    (1) If instruction I at functional unit A sends a value V to queue Q, and then instruction J at functional unit B sends a value W to queue Q, the order in which V and W arrive is not defined (and the code is illegal) unless instruction J could not have started execution until after value V had been consumed.

    (2) If instruction I at functional unit A sends a value V to queue Q, and then instruction J also at functional unit A sends a value W to queue Q, then V will always arrive at Q before W.

    The SEQ instruction is useful in examples such as the above. This code could be re-written as

          read 12 -> movea
          load    -> moveb
          seq     -> alua
          add1    -> regd
          sub1    -> memd

    This code is legal because both values sent to alua come from the same source functional unit (the move unit), and so rule (2) applies.

    The scaSIM simulator can detect violations of these rules.

    Note that illegal code is also able to cause the processor to deadlock by trying to read from an empty queue or writing to a full one. Once again the scaSIM simulator can detect this behaviour.
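
    A rough static check for rule (1) can be sketched in Python. This is not the algorithm used by scaSIM. It is a deliberately conservative heuristic that flags any instruction sending to a queue whose previous sender was a different functional unit; it does not model the exception in rule (1) for an instruction that could not have started until the earlier value had been consumed, so it may flag legal code. All names are hypothetical.

```python
def find_ordering_hazards(program):
    """Flag sends to a queue whose previous sender was a different unit.

    program is a list of (functional_unit, destination_queue) pairs in
    program order; destination_queue is None for instructions that send
    no result. Returns a list of (index, queue, previous_unit, unit).
    """
    last_sender = {}
    hazards = []
    for index, (unit, dest) in enumerate(program):
        if dest is None:
            continue
        previous = last_sender.get(dest)
        if previous is not None and previous != unit:
            hazards.append((index, dest, previous, unit))
        last_sender[dest] = unit
    return hazards


# The first example in this section: read and load both send to alua.
illegal = [("regbank", "alua"), ("loadstore", "alua"),
           ("alu", "regd"), ("alu", "memd")]

# The rewritten example: only the move unit sends to alua.
legal = [("regbank", "movea"), ("loadstore", "moveb"), ("move", "alua"),
         ("alu", "regd"), ("alu", "memd")]

print(find_ordering_hazards(illegal))  # one hazard, on the load
print(find_ordering_hazards(legal))    # no hazards
```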

    1.10 Queue lengths

    All queues have a length of 2.

    2. scASM assembler

    scASM is an assembler which translates SCALP assembly language into object files that may be loaded into the scaSIM simulator, other models of SCALP or the real processor.

    2.1 File naming conventions

    By convention, the following file name extensions are used by scASM:

    .s : SCALP assembly language program

    .o : Assembled object file

    .list : Annotated listing file

    2.2 Input format

    scASM input files are first preprocessed using cpp, the `C' preprocessor. This allows the use of #include, #define, #if etc.

    Comments begin with either ; or -- and extend to the end of the line.

    scASM is fully case-insensitive.

    scASM divides the input file into regions which may contain either code or data. The directives "code" and "data" indicate the start of these regions. The directives are necessary to cause alignment to occur.

    Other directives are as follows:

    org address : Causes the following code or data to be placed at the specified address.

    align number : Causes the following code or data to be aligned at the specified boundary.

    scASM can associate symbols with values. The most common use of this feature is to identify branch targets; for example the instruction

    HERE:    BR HERE

    implements an infinite loop. Writing a symbol name followed by a colon associates that symbol with the program counter value of the immediately following instruction or data value.

    Symbols can also be given values using the = assignment operator, for example

    PHONE = 868672

    Expressions are used in instructions for immediate values, in symbol assignments and in data definitions. Expressions can be made up from any of the following components:

  • symbols

  • numbers, specified in decimal or hexadecimal (0x...)

  • characters enclosed in '' whose ASCII value is used

  • subexpressions using +, -, *, / and ^. Note that operator precedence is not respected and () must be used.

  • the special symbol . which indicates the current program counter value

    Data definitions place data values into memory. The syntax is

       = [ . width ] value, value, ...
    or = "string"

    If a width is not specified, word is assumed.

    2.3 Invocation

    Typically scASM is executed as follows:

    scasm filename.s

    This causes scasm to assemble the program in filename.s and produce an output file filename.o.

    Additional options are as follows:

        -help   :   display a help message and exit
        -db     :   provide debugging messages from the parser (may
                    occasionally be useful to trace a syntax error)
        -l      :   create a listing file
        -o file :   place object file in the specified file
        -s file :   place a symbol table file in the specified file
        -a addr :   specify start address

    2.4 Output format

    The output file contains lines as follows:

    01234567 : ABCDEF0

    The first field specifies addresses and the second data. The data is always word-wide. Addresses may be non-contiguous. All data is in hexadecimal.
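
    Files in this format are straightforward to read back. The following Python sketch parses such lines into an address-to-data mapping; it assumes one "address : data" pair per line and skips blank lines, and the function name is hypothetical.

```python
def parse_object_lines(lines):
    """Parse 'address : data' lines (both hexadecimal) into a dictionary."""
    memory = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # assumption: blank lines may be ignored
        address_text, separator, data_text = line.partition(":")
        if not separator:
            raise ValueError("expected 'address : data', got: " + line)
        memory[int(address_text, 16)] = int(data_text, 16)
    return memory
```

    Because addresses may be non-contiguous, a dictionary keyed by address is a more natural representation than a list.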

    2.5 Assembling multiple files

    scASM provides limited support for multi-file projects. Although scASM will resolve symbols exported from one file and imported to another, it is the programmer's responsibility to ensure that the separate files occupy different areas of memory.

    External symbols are resolved through the use of the -s option to scASM and the #include feature of the preprocessor. The -s option causes scASM to write a file containing definitions of all the symbols in the program. The syntax of these definitions is simply "symbol = value", so it may be read in to other files using #include.

    The stages that should be used to create a .o file from several .s files are as follows:

    (1) Place #include directives in each file to include the .sym files of the files it depends on.

    (2) Choose suitable start addresses for each .s file.

    (3) Call scASM using -a and -s

    (4) Join the resulting .o files using cat.
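
    Since scASM does not check that separately assembled files occupy different areas of memory, it may be worth checking for clashes when joining the object files. The Python sketch below is one possible approach, not a feature of scASM: it concatenates object file lines and raises an error if the same address appears twice. It compares addresses for exact equality only, and the function name is hypothetical.

```python
def merge_objects(objects):
    """Concatenate object file lines, rejecting duplicate addresses.

    objects maps a file name to the list of 'address : data' lines
    read from that file. Returns the merged list of lines.
    """
    owner = {}
    merged = []
    for name, lines in objects.items():
        for line in lines:
            if not line.strip():
                continue
            address = int(line.split(":", 1)[0], 16)
            if address in owner:
                raise ValueError(
                    "address %08X defined in both %s and %s"
                    % (address, owner[address], name)
                )
            owner[address] = name
            merged.append(line.rstrip("\n"))
    return merged
```

    The merged lines can then be written to a single .o file, which has the same effect as joining the files with cat.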

    3. scaSIM simulator

    3.1 Invocation

    scaSIM is normally called without arguments as scasim. Any files named on the command line are assumed to be object files produced by scASM and are loaded into simulated memory. The -help flag may be used to produce an informational message.

    scaSIM may be run non-interactively by redirecting a file to its input.

    If scaSIM is run interactively, it will display a banner message and display the "scasim>" prompt. Commands are read from this prompt using the GNU readline library, and so standard line editing and history mechanisms can be used. In interactive mode, scaSIM intercepts interrupts (ctrl-C) and returns to its prompt. This may be used to stop runaway simulations without quitting the program. Should it be necessary to quit the program, either type "quit" at the prompt or type ctrl-\.

    3.2 scaSIM Commands

    scaSIM provides the following commands:

    EXAMINE MEMORY [ start [ end ] ]

    List the memory contents between the specified start and end addresses. If no addresses are given, the listing continues from where the previous listing finished and displays approximately one page.


    List the contents of the register bank.


    List the contents of the queues.


    Show any breakpoints currently in force.


    Show the current value of the program counter


    Show statistics concerning the number of instructions executed by each functional unit.

    SET MEMORY BYTE|WORD addr = value

    Store the given value in memory at the specified address either as a byte or a word.

    SET REGISTER BYTE|WORD reg = value

    Store the given value in the specified register either as a byte or a word.

    SET QUEUE BYTE|WORD queue = value

    Send the given value to the specified queue either as a byte or a word.


    Send the given boolean value to the specified boolean queue.

    SET BREAK addr.subaddr

    Set a breakpoint at the specified instruction group address and chunk sub address.

    SET PC = addr.subaddr

    Set the program counter to the specified address and sub address.

    REMOVE queue

    Remove a value from the front of the specified queue

    LOAD "file"

    Load memory from the specified filename.


    Execute one instruction.


    Execute instructions until either a breakpoint is reached, a HALT instruction is executed or ctrl-C is typed.


    Clear the specified breakpoint.

    DISAS [ start [ end ] ]

    Disassemble memory between the specified addresses. If addresses are not specified, the disassembly continues from where the last operation finished and produces about a screenful of output.

    VECTORS "file"

    Send functional unit test vectors to the specified file.


    Stop creating test vectors.


    Reinitialise the simulator. Queues, registers and memory are cleared.

    HELP [ subject ]

    Display help on the specified subject.


    QUIT

    Leave scaSIM.

    3.3 Queue order dependency checking

    As mentioned previously, scaSIM is able to detect violations of the queue ordering constraints. When a warning message is displayed, it gives the PC address of the offending instruction. This is the instruction that sent the second value to a queue in a case where scaSIM determined that the value could have arrived before the previous one.