In a conventional superscalar processor, functional units communicate through a global register bank; an instruction executed by the load/store unit writes its result to a global register, from which it is read by a subsequent instruction executing in another functional unit. In SCALP, inter-unit communication is rather different. Although a global register bank is present, it is used less frequently than in a conventional processor. Instead, a system of FIFO queues directly interconnects the functional units so that data may be sent between them.
This queue-based model can be thought of as another level of the memory hierarchy. In a conventional processor there are two programmer-visible levels of memory hierarchy, namely the main memory (accessed via load and store instructions) and the global registers (accessed by all instructions). In SCALP there are three levels: the main memory as before; the global registers, which are now accessed by means of read and write instructions; and the short-term values in transit between functional units in the queues.
Each functional unit has a number of input queues. Instructions do not need to specify where their operands will come from; this is implied by the instruction itself, for example a two-operand add will always use operands from the "alua" and "alub" queues. The results of instructions are sent via a routing network to other operand queues. This destination queue is specified by the instruction.
SCALP has eleven queues. Nine of them are "normal" in the sense that they carry byte or word data, and two carry booleans. The "normal" queues are as follows:
"alua" and "alub" : the inputs to the ALU
"mema" and "memd" : the inputs to the Load/Store unit, providing addresses and data respectively
"regd" : the input to the Register Bank unit, providing data for write instructions
"movea" and "moveb" : the main inputs to the Move unit
"link" : an additional input to the move unit, which carries only subroutine return addresses generated by the BRL (branch and link) instruction. This queue cannot be specified as a destination by any other instruction.
"brc" : an input to the branch unit, providing addresses for subroutine returns and other computed branches.
The eight queues above excluding "link" are specified as the destination by the majority of instructions. Compare instructions generate boolean results which may be sent to the following destinations:
"cmove" : an input to the move unit used by the conditional move instruction.
"cbran" : an input to the branch unit used by conditional branch instructions.
Chunks are 12 bits long. Groups of five chunks, along with four additional bits to indicate which of the chunks contain immediate values, are stored in aligned 8-byte instruction groups ("igrps"). Chunk addresses are specified by giving the address of the first byte of the igrp followed by the "sub address" of the chunk; for example, the first igrp fetched after reset contains chunks 0x00000000.0 to 0x00000000.4.
The format of an igrp is as follows:
bits     function
-------------------------------
0-11     chunk 0
12-23    chunk 1
24-35    chunk 2
36-47    chunk 3
48-59    chunk 4
60-63    ifield

The "ifield" specifies which chunks are instructions and which are immediates. In fact, it is decoded as follows to indicate if each chunk is an instruction that has an immediate in the next field.
         Chunk has immediate in following chunk
ifield   0      1      2      3      4
---------------------------------------------
0        FALSE  FALSE  FALSE  TRUE   FALSE
1        TRUE   FALSE  FALSE  TRUE   FALSE
2        FALSE  FALSE  TRUE   FALSE  FALSE
3        TRUE   FALSE  TRUE   FALSE  FALSE
4        FALSE  FALSE  FALSE  FALSE  FALSE
5        TRUE   FALSE  FALSE  FALSE  FALSE
6        FALSE  FALSE  FALSE  FALSE  TRUE
7        TRUE   FALSE  FALSE  FALSE  TRUE
8        FALSE  TRUE   FALSE  TRUE   FALSE
9        TRUE   TRUE   FALSE  TRUE   FALSE
10       FALSE  FALSE  TRUE   FALSE  TRUE
11       TRUE   FALSE  TRUE   FALSE  TRUE
12       FALSE  TRUE   FALSE  FALSE  FALSE
13       TRUE   TRUE   FALSE  FALSE  FALSE
14       FALSE  TRUE   FALSE  FALSE  TRUE
15       TRUE   TRUE   FALSE  FALSE  TRUE

Chunks specifying branch instructions have a special format in which bits 3, 10 and 11 are set. For all other instructions the bits of the chunk are interpreted as follows:
bits     function
---------------------------------------------
0        width of operation; 0=byte, 1=word.
1-6      functional unit specific
7-9      destination queue specifier
10-11    functional unit number

The destination queue is identified as follows for normal instructions:
0 : regd
1 : movea
2 : moveb
3 : brc
4 : alua
5 : alub
6 : mema
7 : memd

For compare instructions, the destination is as follows:
0 : cbran
1 : cmove

The functional unit is identified as follows:
0 : Load/Store
1 : ALU
2 : Register Bank
3 : Move

The following sections describe the instructions on a per functional unit basis; however first an overview of the assembly language syntax is given.
instruction [ width ] [ immediate ] [ -> destination ]
The optional width specification is either ".b" to specify a byte operation or ".w" to specify a word (32-bit) operation.
Some instructions such as stores do not specify a destination.
An example of a possible instruction is:
add.b 100 -> regd     ; add 100 to a byte value from the alua
                      ; queue and send the result to the regd
                      ; queue.
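As an illustration of the igrp and chunk layouts described above, the following Python sketch splits a 64-bit igrp into its chunks and ifield and unpacks the fields of a non-branch chunk. It is not part of the SCALP tools: the function names are hypothetical, and it assumes that bit 0 is the least significant bit and that the igrp has already been read from memory as a single 64-bit integer.

```python
# Illustrative sketch only: names are hypothetical, and bit 0 is assumed
# to be the least significant bit of the igrp and of each chunk.

FUNCTIONAL_UNITS = ["Load/Store", "ALU", "Register Bank", "Move"]

# Branch chunks have bits 3, 10 and 11 all set.
BRANCH_MASK = (1 << 3) | (1 << 10) | (1 << 11)


def split_igrp(igrp):
    """Return (chunks, ifield) for a 64-bit igrp value."""
    chunks = [(igrp >> (12 * n)) & 0xFFF for n in range(5)]
    ifield = (igrp >> 60) & 0xF
    return chunks, ifield


def is_branch_chunk(chunk):
    """True if the chunk uses the special branch format."""
    return (chunk & BRANCH_MASK) == BRANCH_MASK


def decode_chunk(chunk):
    """Unpack the fields of a non-branch instruction chunk."""
    return {
        "width": "word" if chunk & 1 else "byte",        # bit 0
        "unit_specific": (chunk >> 1) & 0x3F,            # bits 1-6
        "destination": (chunk >> 7) & 0x7,               # bits 7-9
        "unit": FUNCTIONAL_UNITS[(chunk >> 10) & 0x3],   # bits 10-11
    }


# A byte-wide ALU chunk with unit-specific bits 1 and destination specifier 0,
# placed in chunk 0 of an igrp whose ifield is 5.
example_chunk = (0 << 0) | (1 << 1) | (0 << 7) | (1 << 10)
chunks, ifield = split_igrp(example_chunk | (0x5 << 60))
print(decode_chunk(chunks[0]), ifield)
```

Because a Move unit chunk and a branch chunk both have bits 10 and 11 set, a decoder along these lines would need to test is_branch_chunk before calling decode_chunk.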
BRT and BRF read a boolean value from the cbran queue and branch only if it is true or false respectively.
BRL performs an unconditional branch, and sends the address of the instruction following the branch to the link queue.
BRC reads an address from the brc queue and branches to that destination.
HALT is not really a branch instruction, but is handled by the branch unit. Executing HALT causes SCALP to suspend activity until an external signal (resume) is asserted. It then continues execution with the next instruction in sequence.
Branch instructions may occupy one or two chunks. If they occupy one chunk, the layout is as follows:
bits     function
-----------------------------
0-2      opcode (see below)
3        must be 1
4-6      sub address of destination instruction
7-9      displacement to igrp address in 8-byte blocks
10-11    must be 1

The opcodes specified by bits 0-2 are as follows:
0    BRT
1    undefined
2    BRF
3    undefined
4    BRC
5    HALT
6    BR
7    BRL

The displacement specified by bits 7 to 9 is a signed displacement relative to the current igrp. If a second chunk is used, it provides an additional 12 bits of displacement, but bit 9 of the first chunk remains the sign bit for the displacement.
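The branch chunk layout can be sketched in Python as follows. This is only an illustration with hypothetical function names. It assumes that bit 0 is the least significant bit and, because the text does not say how the two parts of a long displacement are combined, that the second chunk supplies the low-order 12 bits below the three bits held in the first chunk.

```python
# Illustrative sketch only; names and the ordering of the two
# displacement parts are assumptions, not taken from the SCALP tools.

BRANCH_OPCODES = ["BRT", None, "BRF", None, "BRC", "HALT", "BR", "BRL"]


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def decode_branch(chunk, extra=None):
    """Return (mnemonic, sub_address, displacement in 8-byte blocks)."""
    mnemonic = BRANCH_OPCODES[chunk & 0x7]     # bits 0-2
    sub_address = (chunk >> 4) & 0x7           # bits 4-6
    high = (chunk >> 7) & 0x7                  # bits 7-9, bit 9 is the sign
    if extra is None:
        displacement = sign_extend(high, 3)
    else:
        # Assumed: the second chunk holds the low-order 12 bits.
        displacement = sign_extend((high << 12) | (extra & 0xFFF), 15)
    return mnemonic, sub_address, displacement


def target_igrp(current_igrp_address, displacement):
    """Address of the destination igrp; igrps are 8 bytes long."""
    return current_igrp_address + 8 * displacement


# BR to sub address 2 of the previous igrp.
chunk = 6 | (1 << 3) | (2 << 4) | (0b111 << 7) | (0b11 << 10)
print(decode_branch(chunk), hex(target_igrp(0x100, -1)))
```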
The assembly language syntax is simply
Subroutine calls may be implemented as follows. To call a subroutine, use BRL
If the subroutine is a leaf routine, it may simply send the link address to the BRC queue using MVLINK
MVLINK -> BRC
and then execute a computed branch to return to the caller
If the subroutine is not a leaf routine, it must save the link address in a register or main memory, for example
MVLINK -> REGD
WRITE 19
...
...
READ 19 -> BRC
BRC
Add              3
Add immediate    1
Add1             17
Add4             49
Sub              11
Sub immediate    9
Sub1             25
Sub4             57
RSB immediate    13
Neg              21

Note that whether an instruction is indicated as "immediate" here must correspond with whether the ifield indicates that the instruction in this chunk takes an immediate in the next chunk.
Add1, Add4, Sub1 and Sub4 provide shortcuts which save the cost of an immediate field for these very common operations.
Add and Sub read operands from queues alua and alub. The other instructions read only from queue alua. Sub subtracts the alub value from the alua value. Sub immediate subtracts the immediate value from the alua value. RSB immediate performs the reverse operation, subtracting the alua value from the immediate value. Neg is the same as RSB 0.
Immediate operands are sign extended.
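The operand order of the subtract instructions and the effect of sign extension can be sketched in Python. The function names are hypothetical. The sketch assumes a single-chunk (12-bit) immediate and that results wrap to the width of the operation (8 bits for byte, 32 bits for word), which the text does not state explicitly.

```python
# Illustrative sketch only; names and the wrap-around behaviour are assumptions.

BYTE_MASK = 0xFF
WORD_MASK = 0xFFFFFFFF


def sign_extend12(imm):
    """Sign extend a 12-bit immediate chunk."""
    return (imm & 0x7FF) - (imm & 0x800)


def sub(alua, alub, mask=WORD_MASK):
    """Sub: alua value minus alub value."""
    return (alua - alub) & mask


def sub_imm(alua, imm, mask=WORD_MASK):
    """Sub immediate: alua value minus the sign-extended immediate."""
    return (alua - sign_extend12(imm)) & mask


def rsb_imm(alua, imm, mask=WORD_MASK):
    """RSB immediate: the sign-extended immediate minus the alua value."""
    return (sign_extend12(imm) - alua) & mask


def neg(alua, mask=WORD_MASK):
    """Neg is the same as RSB 0."""
    return rsb_imm(alua, 0, mask)


print(sub_imm(10, 3), hex(rsb_imm(10, 3)), hex(neg(1)), sub_imm(10, 0xFFF))
```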
OR               2
OR immediate     0
AND              6
AND immediate    4
XOR              10
XOR immediate    8

These instructions behave similarly to the arithmetic operations, but any immediate field is not sign extended.
CMPEQ                42    ( == )
CMPEQ immediate      40
CMPLT                43    ( < )
CMPLT immediate      41
CMPLTEQ              47    ( <= )
CMPLTEQ immediate    45

Note that the other comparisons ( !=, >, >= ) can be carried out by executing one of these instructions and then using the BRF instruction.
The alua value can be considered to the left of the operator and the alub or immediate value to the right.
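The way the three compare instructions cover all six comparisons can be sketched in Python. The function names are hypothetical, and plain Python integers are used because the text does not say whether the hardware comparison is signed or unsigned.

```python
# Illustrative sketch only; names are hypothetical.

def cmpeq(alua, right):
    return alua == right


def cmplt(alua, right):
    return alua < right


def cmplteq(alua, right):
    return alua <= right


def brf_taken(flag):
    """BRF branches only when the boolean read from cbran is false."""
    return not flag


# CMPEQ then BRF branches on !=, CMPLTEQ then BRF on >, CMPLT then BRF on >=.
for a, b in [(1, 2), (2, 2), (3, 2)]:
    print(a, b, brf_taken(cmpeq(a, b)), brf_taken(cmplteq(a, b)), brf_taken(cmplt(a, b)))
```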
Move simply copies the immediate value provided to the destination.
MKIMM immediate 16
MKIMM is used to construct arbitrary length immediate values. It reads a value from alua and shifts it left by 12 bits. It then ORs this with the immediate value. This may be used as follows:
move 0x087 -> alua
mkimm 0x654 -> alua
mkimm 0x321 -> ...

This sequence sends the value 0x87654321 to its destination.
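The arithmetic behind this sequence can be checked with a short Python sketch. The function name is hypothetical, and the sketch assumes that the result is truncated to a 32-bit word and that the MKIMM immediate is used as a plain 12-bit value.

```python
# Illustrative sketch only; the 32-bit truncation is an assumption.

def mkimm(alua, imm):
    """Shift the alua value left by 12 bits and OR in the 12-bit immediate."""
    return ((alua << 12) | (imm & 0xFFF)) & 0xFFFFFFFF


value = 0x087                 # move 0x087 -> alua
value = mkimm(value, 0x654)   # mkimm 0x654 -> alua
value = mkimm(value, 0x321)   # mkimm 0x321 -> ...
print(hex(value))             # 0x87654321
```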
For store, data is read from the memd queue. For load, data is sent to the specified destination.
The format of the functional unit specific bits in a load/store instruction is as follows:
bits    function
--------------------------
1       operation (0=load, 1=store)
2-6     displacement

Load/store instructions may occupy one or two chunks. Any second chunk provides an additional 12 bits of displacement. The displacement is signed. If an immediate chunk is used, bit 6 of the instruction chunk remains the sign bit.
For byte operations, the displacement is in bytes. For word operations, the mema value must be word aligned and the displacement is in words.
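The displacement rules can be sketched in Python for the single-chunk case. The function names are hypothetical, bit 0 is assumed to be the least significant bit, and the two-chunk form is left out because the text does not say how the extra 12 bits are combined with bits 2 to 6.

```python
# Illustrative sketch only; helper names are hypothetical.

WORD_BYTES = 4  # a word is 32 bits


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def decode_load_store(chunk):
    """Unpack a single-chunk load/store instruction."""
    return {
        "is_word": bool(chunk & 1),                            # bit 0
        "is_store": bool((chunk >> 1) & 1),                    # bit 1
        "displacement": sign_extend((chunk >> 2) & 0x1F, 5),   # bits 2-6
    }


def effective_address(mema, displacement, is_word):
    """Byte address accessed for a given mema value and displacement."""
    if is_word:
        if mema % WORD_BYTES != 0:
            raise ValueError("mema value must be word aligned for word operations")
        return mema + WORD_BYTES * displacement   # displacement counted in words
    return mema + displacement                    # displacement counted in bytes


fields = decode_load_store(0b1111011)   # word store, displacement -2
print(fields, hex(effective_address(0x1000, fields["displacement"], fields["is_word"])))
```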
bits    function
--------------------------
1       operation (0=read, 1=write)
2-6     register number

The 5 bit register number specifies one of the 32 registers in the register bank unit.
For write, the data to be stored is read from the regd queue. For read, the data is sent to the specified destination.
DUP reads a single data value from the movea queue and sends it to two destinations. The destinations are specified separated by a comma, thus:
dup -> regd, memd
SEQ reads two data values from the movea and moveb queues and sends them to the same destination, the movea value before the moveb value. This is useful to apply a fixed ordering to operations when the order would otherwise be nondeterministic.
MVLINK reads a value from the link queue and sends it to the specified destination. This is used in subroutine entry code.
CMOVE (conditional move)
CMOVE reads values from the movea and moveb queues, and sends one of them to the destination depending on the value of a boolean read from the cmove queue. If the boolean is true, the movea value is sent; if it is false, the moveb value is sent.
The organisation of the move unit chunk is as follows:
bits    function
--------------------------------
1-2     opcode
3       must be 0
4-6     second destination for DUP instruction

The operations are encoded as
opcode    instruction
----------------------------------
0         cmove
1         mvlink
2         seq
3         dup
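The behaviour of the four move unit instructions can be modelled with simple FIFO queues. The Python sketch below is only a functional illustration with hypothetical names. It uses unbounded queues and ignores timing, so it does not show the full-queue or ordering effects of the real hardware.

```python
# Illustrative sketch only; queues are unbounded and timing is ignored.
from collections import deque

QUEUE_NAMES = ["alua", "alub", "mema", "memd", "regd", "movea",
               "moveb", "link", "brc", "cmove", "cbran"]


def make_queues():
    return {name: deque() for name in QUEUE_NAMES}


def dup(q, dest1, dest2):
    """Read one value from movea and send it to two destinations."""
    value = q["movea"].popleft()
    q[dest1].append(value)
    q[dest2].append(value)


def seq(q, dest):
    """Send the movea value, then the moveb value, to the same destination."""
    first = q["movea"].popleft()
    second = q["moveb"].popleft()
    q[dest].append(first)
    q[dest].append(second)


def mvlink(q, dest):
    """Move a return address from the link queue to the destination."""
    q[dest].append(q["link"].popleft())


def cmove(q, dest):
    """Read movea, moveb and a boolean; forward movea if true, else moveb."""
    a = q["movea"].popleft()
    b = q["moveb"].popleft()
    flag = q["cmove"].popleft()
    q[dest].append(a if flag else b)


q = make_queues()
q["movea"].append(1)
q["moveb"].append(2)
seq(q, "alua")
print(list(q["alua"]))
```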
Consider the following example
read 12 -> alua
load -> alua
add1 -> regd
sub1 -> memd

In this case, the results of the register read and the memory load may arrive at the alua queue in either order. To guarantee correct ordering, the following rules must be obeyed:
(1) If instruction I at functional unit A sends a value V to queue Q, and then instruction J at functional unit B sends a value W to queue Q, the order in which V and W arrive is not defined (and the code is illegal) unless instruction J could not have started execution until after value V had been consumed.
(2) If instruction I at functional unit A sends a value V to queue Q, and then instruction J also at functional unit A sends a value W to queue Q, then V will always arrive at Q before W.
The SEQ instruction is useful in examples such as the above. This code could be re-written as
read 12 -> movea
load -> moveb
seq -> alua
add1 -> regd
sub1 -> memd

This code is legal because both values sent to alua come from the same source functional unit (the move unit), and so rule (2) applies.
The scaSIM simulator can detect violations of these rules.
Note that illegal code can also cause the processor to deadlock by trying to read from an empty queue or write to a full one. Once again the scaSIM simulator can detect this behaviour.
.s : SCALP assembly language program
.o : Assembled object file
.list : Annotated listing file
Comments are delimited by either ; or -- and the end of line.
scASM is fully case-insensitive.
scASM divides the input file into regions which may contain either code or data. The directives "code" and "data" indicate the start of these regions. The directives are necessary to cause alignment to occur.
Other directives are as follows:
org address : causes the following code or data to be placed at the specified address.
align number : causes the following code or data to be aligned at the specified boundary.
scASM can associate symbols with values. The most common use of this feature is to identify branch targets; for example the instruction
HERE: BR HERE

implements an infinite loop. Writing a symbol name followed by a colon associates that symbol with the program counter value of the immediately following instruction or data value.
Symbols can also be given values using the = assignment operator, for example
PHONE = 868672

Expressions are used in instructions for immediate values, in symbol assignments and in data definitions. Expressions can be made up from any of the following components:
Data definitions place data values into memory. The syntax is
= [ . width ] value, value, ...
or
= "string"

If a width is not specified, word is assumed.
This causes scasm to assemble the program in filename.s and produce an output file filename.o.
Additional options are as follows:
-help : display a help message and exit
-db : provide debugging messages from the parser (may occasionally be useful to trace a syntax error)
-l : create a listing file
-o file : place object file in the specified file
-s file : place a symbol table file in the specified file
-a addr : specify start address
01234567 : ABCDEF0

The first field specifies addresses and the second data. The data is always word-wide. Addresses may be non-contiguous. All data is in hexadecimal.
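A reader for this format could look like the following Python sketch. The function name is hypothetical, and the sketch assumes one "address : data" pair per line and that the address field is also hexadecimal, as the example line suggests.

```python
# Illustrative sketch only; assumes one "address : data" pair per line.

def parse_object_text(text):
    """Return a dict mapping each address to its word of data."""
    memory = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        address_field, data_field = line.split(":")
        memory[int(address_field, 16)] = int(data_field, 16)
    return memory


sample = "00000000 : 0000000A\n00000010 : DEADBEEF\n"
print(parse_object_text(sample))
```

Because addresses may be non-contiguous, a dictionary keyed by address is a natural fit here; it also copes with several object files joined end to end.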
External symbols are resolved through the use of the -s option to scASM and the #include feature of the preprocessor. The -s option causes scASM to write a file containing definitions of all the symbols in the program. The syntax of these definitions is simply "symbol = value", so the file may be read into other files using #include.
The stages that should be used to create a .o file from several .s files are as follows:
(1) Place #include directives in each file to include .sym files for those files it is dependent on.
(2) Choose suitable start addresses for each .s file.
(3) Call scASM using -a and -s
(4) Join the resulting .o files using cat.
scaSIM may be run non-interactively by redirecting a file to its input.
If scaSIM is run interactively, it displays a banner message followed by the "scasim>" prompt. Commands are read from this prompt using the GNU readline library, and so standard line editing and history mechanisms can be used. In interactive mode, scaSIM intercepts interrupts (ctrl-C) and returns to its prompt. This may be used to stop runaway simulations without quitting the program. Should it be necessary to quit the program, either type "quit" at the prompt or type ctrl-\.
EXAMINE MEMORY [ start [ end ] ]
List the memory contents between the specified start and end addresses. If no addresses are given, the listing continues from where the previous listing finished and displays approximately one page.
List the contents of the register bank.
List the contents of the queues.
Show any breakpoints currently in force.
Show the current value of the program counter.
Show statistics concerning the number of instructions executed by each functional unit.
SET MEMORY BYTE|WORD addr = value
Store the given value in memory at the specified address either as a byte or a word.
SET REGISTER BYTE|WORD reg = value
Store the given value in the specified register either as a byte or a word.
SET QUEUE BYTE|WORD queue = value
Send the given value to the specified queue either as a byte or a word.
SET QUEUE queue = TRUE|FALSE
Send the given boolean value to the specified boolean queue.
SET BREAK addr.subaddr
Set a breakpoint at the specified instruction group address and chunk sub address.
SET PC = addr.subaddr
Set the program counter to the specified address and sub address.
Remove a value from the front of the specified queue.
Load memory from the specified filename.
Execute one instruction.
Execute instructions until either a breakpoint is reached, a HALT instruction is executed or ctrl-C is typed.
CLEAR BREAKPOINT num
Clear the specified breakpoint.
DISAS [ start [ end ] ]
Disassemble memory between the specified addresses. If addresses are not specified, the disassembly continues from where the last operation finished and produces about a screenful of output.
Send functional unit test vectors to the specified file.
Stop creating test vectors.
Reinitialise the simulator. Queues, registers and memory are cleared.
HELP [ subject ]
Display help on the specified subject.