STAMINA
Overview
The Split Transfer Asynchronous Macrocell Interconnection Network Architecture (STAMINA) is a system bus architecture for use in the emerging domain of asynchronous VLSI systems. It solves the problems of asynchronous macrocell interconnect in a low-cost, power- and emission-efficient manner. Features of the architecture include
In addition to the support of the features found on typical synchronous buses such as AMBA and PI-Bus including
STAMINA is suited to any embedded system; a typical system is shown below.
The AMULET3i from the AMULET group at the University of Manchester uses a one-outstanding-address STAMINA bus (i.e. an implementation of a subset of the full STAMINA capabilities), referred to as MARBLE, The Manchester Asynchronous Bus for Low Energy.
STAMINA Fundamentals
Definitions and Terminology
Channel - An asynchronous communications pathway on the bus, corresponding to one request line and one acknowledge line, with any number of bundled data (payload) lines. Also known as a bundle. All channels in STAMINA are 4-phase channels.
Cycle - The basic unit of operation of the bus corresponding to one communication on a channel.
Idle - A 4-phase request-acknowledge pair is idle when the signal levels of the request and acknowledge are both low (request=acknowledge=0).
Transfer - A read or write operation involving address and data cycles resulting in one data packet (e.g. byte/half-word/word) and one response packet being communicated between two units on the bus.
Initiator (I) - A bus initiator is a system module that is able to initiate a read or write transfer by driving the address and control lines. Only one bus initiator is allowed to initiate a transfer at any one time.
Target (T) - A bus target is a system module that responds to read and write operations within a given address space. Only one target may be active upon a channel at any one time.
Address Arbiter - This determines which initiator will be allowed to initiate the next transfer, ensuring that only one initiator is active upon the address channel at any time. The arbitration scheme is not rigidly defined, thus allowing the system designer to incorporate whatever priority/fairness is necessary.
Data Arbiter - This determines which target will be allowed to perform the next data cycle, ensuring that only one target is active upon the data channel at any time. The arbitration scheme is not rigidly defined, thus allowing the system designer to incorporate whatever priority/fairness is necessary.
Address Decoder - Decodes the transfer address on the address channel and activates one target select line indicating which bus-target is to respond to the address cycle.
Data Decoder - Decodes tags on the data channel and activates one initiator select line indicating which bus-initiator is to respond to the data cycle.
Bus control - The control components required by STAMINA that are not part of a module's interface (i.e. arbiters, decoders and abort hardware).
Sender - The unit that will provide the data (drive the bus) during a data cycle
Receiver - The unit that will accept the data during a data cycle
Signal List
Signal | Name | Description |
---|
SNRES | Active Low Reset | Global bus reset signal |
SAarbreqn | Address Arbitration Request n | Signal from initiator to the address arbiter indicating that initiator n requires access to the address channel |
SAarbgntn | Address Arbitration Grant n | Signal from address arbiter to the initiator indicating that it has been granted the next address cycle |
SAR | Address Request | The address bundle request line driven by the initiator |
SAA | Address Acknowledge | The address bundle acknowledge line driven by the target |
SAT | Address Tag | The initiator id, unique to the initiator now driving the address channel |
SAC | Address Colour | The address colour used to control data-packet ordering |
SAO | Address Operation | The operation / transfer direction required |
SA[n-1:0] | Address | The address, driven by the initiator |
SSIZE[n-1:0] | Transfer Size | The size of the data packet to be transferred, which may be byte, half-word or word |
SPROT[n-1:0] | Privilege Protection Code | Privilege information is carried by the protection code, and may be used by a bus protection unit |
SSEQ | Sequential | Sequentiality indicator |
SPRED | Predictor | Predictive sequentiality / locality indicator |
SDEF | Defer | Deferred transfer start indicator. Pulled with the address acknowledge |
SASn | Address Select n | Signal from the address decoder to target n indicating that it is to respond to the current address cycle |
SDarbreqn | Data Arbitration Request n | Signal from the target to the data arbiter indicating that target n requires access to the data channel |
SDarbgntn | Data Arbitration Grant n | Signal from the data arbiter to the target indicating that it has been granted the next data cycle |
SDR | Data Request | The data bundle request line driven by the target |
SDA | Data Acknowledge | The data bundle acknowledge line driven by the initiator |
SDT | Data Tag | The initiator id of the initiator that should respond to the data cycle in progress |
SDC | Data Colour | The data colour used to control data-packet ordering |
SDO | Data Operation | The current operation / data transfer direction |
SD[n-1:0] | Data Bus | The bi-directional data bus, driven by the sender |
SDSn | Data Select n | Signal from the data decoder to initiator n indicating that it is to respond to the current data cycle |
SRarbreqn | Response Arbitration Request n | Signal from the target to the response arbiter indicating that target n requires access to the response channel |
SRarbgntn | Response Arbitration Grant n | Signal from the response arbiter to the target indicating that it has been granted the next response cycle |
SRR | Response Request | The response bundle request line driven by the target |
SRA | Response Acknowledge | The response bundle acknowledge line driven by the initiator |
SRT | Response Tag | The initiator id of the initiator that should respond to the response cycle in progress |
SRC | Response Colour | The response colour used to control response-packet ordering |
SR[n-1:0] | Response | The abort response from the target bundled during the push phase of the response cycle |
SRSn | Response Select n | Signal from the response decoder to initiator n indicating that it is to respond to the current response cycle |
SERR | Severe Error | Severe error/deadlock UNKNOWN - IS IT NECESSARY |
STAMINA Protocol
Push-Pull Channel based Transport
The channels within STAMINA are single-rail bundles using 4-phase signalling. The data validity protocol is shown below:
The phases of a cycle are:
The same bundled lines cannot be driven during both the PUSH and PULL phases. In order to send information during both phases, different wires must be used during the PUSH and PULL phases. Bundled signals must be stable during the phase in which they are bundled, hence lines used in the PUSH phase must be set-up before the rising edge of request and lines used in the PULL phase must be set-up before the rising edge of acknowledge.
Channel | PUSH | PULL |
---|
Address | SAT,SAC,SARNW,SA,SSIZE,SPROT,SPRED,SSEQ | SDEF |
Data | SDT,SDC,SDRNW,SD (read-data) | SD (write-data) |
Response | SRT,SRC,SR | |
Split Transfers
STAMINA uses a technique known as split-transfers as its underlying primitive operation. In a typical bus that supports split-transfers, a slow target device has the option of requesting a split-transfer. This means that the target device will complete the transfer at a later stage, either by arbitrating for the bus itself, or by the initiator retrying the transfer later (i.e. hardware-polling). Either approach requires some form of interlock and/or queueing system to ensure that the transfer can complete when ready, and these can prove expensive. In addition, the hardware-polling leads to poor power efficiency and high emissions.
STAMINA takes a new approach in that it always uses a split-transfer, and each channel on the bus is separately arbitrated. However, to avoid requiring queues at all of its interfaces STAMINA uses a new scheme:
This means that no queue is required at the target: because the target controls the data transfers, it always receives data in order (and likewise for the response). To allow the target to accept multiple outstanding transfers, all that is required is a FIFO on the addresses coming off the address channel at the target.
Initiators still require queues, but as expected the size of queue at an initiator scales with the number of outstanding transfers that initiator must support.
Thus, STAMINA is scalable on a per interface basis to support the features required of each interface. An example scenario could be:
In order to route the data and response cycles back to the correct initiator, a tag unique to the originating initiator is passed with all cycles on all channels (SAT, SDT and SRT). In addition, a colour is passed with all cycles on all channels so that the initiator interface can maintain the correct data and response packet order when presenting these to the initiator device. The size of the tag is log2(number of initiators) and the size of the colour is log2(maximum number of outstanding addresses in the system).
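The field sizes above can be worked through with a short Python sketch. The function name `field_width` is hypothetical, and rounding up when the count is not a power of two is an assumption; the text only gives log2.

```python
def field_width(count: int) -> int:
    """Return the number of wires needed to distinguish `count` values.

    Assumption: non-power-of-two counts round up to the next whole bit.
    A count of 1 needs zero wires, matching the "not used" case.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    # (count - 1).bit_length() equals ceil(log2(count)) for count >= 1.
    return (count - 1).bit_length()


# Example system: 3 initiators, up to 4 outstanding addresses.
tag_bits = field_width(3)      # width of SAT, SDT and SRT
colour_bits = field_width(4)   # width of SAC, SDC and SRC
print(tag_bits, colour_bits)
```

Integer arithmetic is used rather than a floating-point logarithm so that the result is exact for any count.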
Arbitration
Each channel within STAMINA has its own arbitration system to ensure that the channel has only one sender at any time. Hidden arbitration is used, where the next cycle can be arbitrated for concurrently with the current cycle in progress. To facilitate this overlapping, the grant is interpreted as giving permission for the next cycle, and once the cycle has started, the arbitration request should be removed to allow the next arbitration to take place.
Decode
Receivers on the channels are activated by a decode mechanism. This decodes the address (on the address channel) or the tag (on the data and response channels) to activate the addressed receiver for the channel. A channel can only have one active receiver at any one time.
The decoder raises the appropriate select signal on completion of the decode, and must lower the select line once the channel's acknowledge has been raised. The channel receiver must wait until the select line is low before lowering its acknowledge.
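The ordering rules for one cycle can be sketched as a small trace checker in Python. The event names (`req+`, `sel+`, `ack+` and their falling counterparts) and the function `cycle_is_legal` are hypothetical. The constraints combine the select rules above with the ordinary 4-phase rules that request falls only after acknowledge has risen, and that acknowledge falls only after request has fallen.

```python
# Each pair means: the first event must occur before the second.
PRECEDENCE = [
    ("req+", "sel+"),  # decode starts from a valid request
    ("sel+", "ack+"),  # receiver acknowledges once selected
    ("ack+", "sel-"),  # decoder lowers select after acknowledge rises
    ("ack+", "req-"),  # 4-phase: request falls after acknowledge rises
    ("sel-", "ack-"),  # receiver waits for select low
    ("req-", "ack-"),  # 4-phase: acknowledge falls after request falls
]
EVENTS = {"req+", "sel+", "ack+", "sel-", "req-", "ack-"}


def cycle_is_legal(trace):
    """Check that a list of events forms one legal cycle on a channel."""
    if len(trace) != len(EVENTS) or set(trace) != EVENTS:
        return False  # every event must appear exactly once
    position = {event: index for index, event in enumerate(trace)}
    return all(position[a] < position[b] for a, b in PRECEDENCE)


print(cycle_is_legal(["req+", "sel+", "ack+", "sel-", "req-", "ack-"]))
print(cycle_is_legal(["req+", "sel+", "ack+", "req-", "ack-", "sel-"]))
```

The second trace is rejected because acknowledge falls while select is still high.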
The decoders may be centralized or distributed, but decode times should be as low as possible since a decoder is on the critical path of every cycle on a channel.
Address Decode Aborts
If an address is not within a mapped range, the address decoder should activate an abort handler as the target device. This abort handler should complete the address cycle, and send an abort indication on the response channel, performing the data activity expected by the initiator, but not changing any data values on the data channel.
The abort handler may also be activated if the address-decoder checks for the correct transfer privileges or memory system alignment, but it is more likely that these will be detected by the target device itself.
Multi-packet Transfers
Atomic Transactions
In certain situations such as atomic swap instructions it is necessary for a processor to be able to lock a bus in order to obtain exclusive access to a target device for multiple consecutive cycles. These operations are supported in STAMINA through the initiator interface keeping the arbitration request asserted until the final address cycle of the atomic transfer has started.
Burst Transfers
Many buses provide a burst-mode allowing faster cycle times (faster transfers) for the packets in a burst from one device to another. The arbitration locking mechanism described above could be used to achieve a burst-sequence by avoiding the address-arbitration stage for all but the first cycle of the burst. However, use of such a technique is discouraged since it impacts on the latency of the other initiators within the system. It is better to re-arbitrate for every cycle: the arbitration can be performed in parallel with the return-to-zero phase of the cycle, so it should be fully overlapped, incurring negligible impact on the cycle time of the sequence of transfers while still allowing other initiators to obtain access to the bus if required.
Signal Description
This section provides detailed information on STAMINA signals including timing requirements and intended uses. All tristateable STAMINA signals shall have negative resistance termination (weak-feedback charge retention) to prevent them from floating.
Reset
SNRES - The active low reset line is used to reset the bus, and all bus-decoders, arbiters and bus-interfaces.
Address Channel
The address channel conveys the address-time information in its push phase and returns the defer status in the pull phase.
Arbitration
SAarbreqn
SAarbreqn | Description |
---|
0 | Initiator n does not wish to start a cycle on the bus |
1 | Initiator n requires access to the bus |
Each initiator has a dedicated address arbitration request line which it asserts to request access to the address channel. The request is received by the address arbiter and must be held asserted by the initiator until after the next rising edge of SAA. The SAarbreqn signal must be lowered when further transfers are not required immediately subsequent to the current transfer. If the initiator requires multiple uninterrupted (address) transfers, as would be required for an atomic read-modify-write operation, then the SAarbreqn signal may be held high for as many cycles as necessary, although this clearly has adverse effects on bus access latency for other devices if long atomic transfers are performed. Once raised, SAarbreqn must not be lowered until the corresponding SAarbgntn has risen.
SAarbgntn
SAarbgntn | Description |
---|
0 | Initiator n is not allowed to initiate address cycles |
1 | Initiator n may use the address channel once it is idle |
Each initiator has a dedicated address arbitration grant line which is driven by the address arbiter in response to the corresponding SAarbreqn line. When active this signal informs the initiator that it has been granted permission to use the address channel once it is idle, i.e. it is an "early grant". Thus, an initiator may only start a cycle on the address channel when its SAarbreqn and SAarbgntn lines are both high and SAR=SAA=0.
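The start condition can be written as a single predicate. This is a minimal Python sketch: the function name is hypothetical and the signals are modelled as plain booleans rather than wires.

```python
def may_start_address_cycle(arbreq: bool, arbgnt: bool,
                            sar: bool, saa: bool) -> bool:
    """True when initiator n may raise SAR.

    arbreq and arbgnt model SAarbreqn and SAarbgntn for this initiator;
    sar and saa model the shared SAR and SAA lines.
    """
    channel_idle = not sar and not saa  # SAR=SAA=0
    return arbreq and arbgnt and channel_idle


# An early grant while the previous cycle is still returning to zero:
print(may_start_address_cycle(True, True, sar=False, saa=True))
# The same grant once the channel is idle:
print(may_start_address_cycle(True, True, sar=False, saa=False))
```

The first call shows why the grant is described as early: permission has been given, but the initiator must still wait for the channel to become idle.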
Decode
SASn
SASn | Description |
---|
0 | Target n should ignore the address channel |
1 | Target n should respond to the current address cycle |
The address decoder decodes valid addresses on the address channel and asserts one Address Select line to activate a target device during the time when SAR=1 and SAA=0.
Signalling
SAR
SAR | Description |
---|
0->1 | Valid pushed information is on the address channel push bundled lines |
1->0 | The address channel push bundled lines are invalid and pulled information can be removed |
Address Request is driven by the initiator. It is used to indicate that a valid address command has been placed on the address channel and its routing within the bus must be such that it satisfies the bundled data constraint. This signal forms the request part of a 4-phase signalling mechanism with the SAA signal.
SAA
SAA | Description |
---|
0->1 | The pushed information may be removed, and pulled information is now on the bus |
1->0 | The pulled information has been removed, and this address cycle is now completed |
Address Acknowledge is driven by the target. It is used to indicate that any pushed information is no longer required and that pulled information is available. The rising edge of the SAA will come in response to a rising SASn (which follows rising SAR), and SAA cannot be lowered until both SAR and the SASn for this target are low.
Information
SAT[n-1:0]
The n-bit Address Tag is used to identify which initiator the address cycle was initiated by. This tag is n bits wide where n=log2(number of initiators on the bus). If the bus only has one initiator then this field of the address channel has zero wires (i.e. not used). The SAT is bundled during the push phase of the address cycle.
SAC[n-1:0]
The n-bit Address Colour, used by the initiator to reorder incoming data and response packets, and to send the correct write-data so that the correct information is transferred between the correct devices in the correct order. Here, n=log2(maximum number of outstanding addresses allowed). If only one outstanding address is allowed, then this field of the address channel has zero wires (i.e. not used). The SAC is bundled during the push phase of the address cycle.
SAO
SAO | Description |
---|
0 | Write to target |
1 | Read from target |
The Address Read-Not-Write line indicates the direction of the transfer that is required. It is bundled during the push phase of the address cycle.
SA[n-1:0]
The n-bit address is bundled during the push phase of the address cycle and indicates the location that should be read or written.
SSIZE[n-1:0]
The n-bit size, bundled during the push phase of the address cycle, is used to indicate the size of the data packet to be transferred. The packet size must fit on the data bus since STAMINA transfers data in one cycle per transfer. A suggested size is 2 bits, as used in MARBLE with the encoding:
SSIZE[1:0] | Description |
---|
00 | BYTE - 8 bit transfer |
01 | HALFWORD - 16 bit transfer |
10 | unused |
11 | WORD - 32 bit transfer |
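The suggested encoding can be captured as a lookup table. In this Python sketch the names `SSIZE_BITS` and `transfer_bits` are hypothetical, and rejecting the unused code 10 with an error is an assumption; the text does not say how a target should treat it.

```python
# Suggested 2-bit SSIZE encoding mapped to the transfer width in bits.
SSIZE_BITS = {
    0b00: 8,    # BYTE
    0b01: 16,   # HALFWORD
    0b11: 32,   # WORD
}


def transfer_bits(ssize: int) -> int:
    """Return the transfer width in bits for a 2-bit SSIZE value."""
    if ssize not in SSIZE_BITS:
        # Assumption: the unused code 10 is treated as an error here.
        raise ValueError(f"unused or invalid SSIZE code: {ssize:#04b}")
    return SSIZE_BITS[ssize]


print(transfer_bits(0b01))
```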
SPROT[n-1:0]
The n-bit protection code may be used by the initiator to provide additional privilege information related to the transfer. This information is primarily intended for use by a bus protection unit within the target and the majority of bus targets will not use these signals. Upon detection of an attempted access violation, the target will signal an abort in its response cycle. SPROT is pushed on the address channel.
A suggested size is 2-bits with the encoding:
SPROT[1:0] | Description |
---|
-0 | Opcode Fetch |
-1 | Data Access |
0- | Supervisor Access |
1- | User Access |
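Each SPROT bit in the suggested encoding carries an independent property, with "-" marking a don't-care position. A Python sketch of the decode follows; the function name `decode_sprot` and the returned strings are hypothetical.

```python
def decode_sprot(sprot: int):
    """Split a 2-bit SPROT value into (privilege, access kind).

    Bit 0 selects opcode fetch (0) or data access (1);
    bit 1 selects supervisor (0) or user (1).
    """
    if not 0 <= sprot <= 0b11:
        raise ValueError("SPROT is 2 bits wide in the suggested encoding")
    kind = "data access" if sprot & 0b01 else "opcode fetch"
    privilege = "user" if sprot & 0b10 else "supervisor"
    return privilege, kind


print(decode_sprot(0b10))
```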
SSEQ
SSEQ | Description |
---|
0 | The address relationship is non-sequential |
1 | The address relationship is sequential |
The Sequential signal can be used to indicate a relationship between two addresses, and is pushed on the address channel.
SPRED
SPRED | Description |
---|
0 | SSEQ relates this address to the previous address from the current initiator |
1 | SSEQ relates the next address from the current initiator to this address, and the next address from this initiator will be in the same 2^n address-region as this, where n is system specific |
The Predictor signal is used to determine whether SSEQ is giving a current-to-last address relationship or a predictive relationship between the next address and the current address. SPRED is pushed on the address channel.
SA[n-1:0]
The n-bit wide address that is to be read from or written to, transmitted during the push phase of the address cycle. These signals are driven by the initiator and are also used by the address decoder to determine which target should respond to the cycle.
SDEF
SDEF | Description |
---|
0 | The transfer is being serviced |
1 | The transfer must be deferred until later. The initiator should release the address channel. |
The Defer signal, pulled on the address channel, is used to indicate that the transfer must be deferred until later. The initiator should release the address channel (even if it is locked as part of an atomic transfer) and should then retry the operation. Transfers should only be deferred if absolutely necessary since repeatedly deferring and then retrying wastes power and could impact the bandwidth use of the bus. It is recommended that the only use of defer is when a device that has both an initiator and a target interface (such as a bridge to another bus) is being addressed on its target interface but cannot complete the transfer because it needs to perform an action as an initiator on the bus. In this way, with fair arbiters there is a worst case number of retries dependent on the shape and size of the address channel arbiter tree. Defer thus allows the avoidance of deadlocks when initiators on either side of a bridge try to communicate with targets on the opposite side of the bridge, in which case the bridge must defer one initiator.
Data Channel
Arbitration
SDarbreqn
SDarbreqn | Description |
---|
0 | Target n does not wish to start a data cycle |
1 | Target n requires access to the data channel |
Each target has a dedicated data arbitration request line which it asserts to request access to the data channel. The request is received by the data arbiter and must be held asserted by the target until the next rising edge of SDA for a cycle involving this target, after which it must be lowered. Once raised, SDarbreqn must not be lowered until the corresponding SDarbgntn has risen.
SDarbgntn
SDarbgntn | Description |
---|
0 | Target n is not allowed to start data cycles |
1 | Target n may use the data channel once it is idle |
Each target has a dedicated data arbitration grant line which is driven by the data arbiter in response to the corresponding SDarbreqn line. When active this signal informs the target that it has been granted permission to use the data channel once it is idle, i.e. it is an "early grant". Thus, a target may only start a cycle on the data channel when its SDarbreqn and SDarbgntn lines are both high and SDR=SDA=0.
Decode
SDSn
SDSn | Description |
---|
0 | Initiator n should ignore the data channel |
1 | Initiator n should respond to the current data cycle |
The data decoder decodes valid tags on the data channel and asserts one Data Select line to activate an initiator device during the time when SDR=1 and SDA=0. The initiator activated is the one whose unique id is the same as the SDT.
Signalling
SDR
SDR | Description |
---|
0->1 | Valid pushed information is on the data channel push bundled lines |
1->0 | The data channel push bundled lines are invalid and pulled information can be removed |
Data Request is driven by the target. Its routing within the bus must be such that it satisfies the bundled data constraint for the data channel. This signal forms the request part of a 4-phase signalling mechanism with the SDA signal.
SDA
SDA | Description |
---|
0->1 | The pushed information may be removed, and pulled information is now on the bus |
1->0 | The pulled information has been removed, and this data cycle is now completed |
Data Acknowledge is driven by the initiator. It is used to indicate that any pushed information is no longer required and that pulled information is available. The rising edge of the SDA will come in response to a rising SDSn (which follows rising SDR), and SDA cannot be lowered until both SDR and the SDSn for this initiator are low.
Information
SDT[n-1:0]
The n-bit Data Tag is used to identify which initiator the data cycle is destined for. This tag is n bits wide where n=log2(number of initiators on the bus). If the bus only has one initiator then this field of the data channel has zero wires (i.e. not used). The SDT is bundled during the push phase of the data cycle.
SDC[n-1:0]
The n-bit Data Colour, used by the initiator to reorder incoming data packets, and to send the correct write-data so that the correct information is transferred between the correct devices in the correct order. Here, n=log2(maximum number of outstanding addresses allowed). If only one outstanding address is allowed, then this field of the data channel has zero wires (i.e. not used). The SDC is bundled during the push phase of the data cycle.
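How an initiator interface uses the colour to restore packet order is not detailed here, so the following Python sketch is only one possible arrangement. The class `ColourReorderBuffer` and its methods are hypothetical, and it assumes the interface records colours in issue order and releases data packets to the initiator device in that same order.

```python
from collections import deque


class ColourReorderBuffer:
    """Release data packets in the order their addresses were issued."""

    def __init__(self):
        self._issued = deque()  # colours of outstanding addresses, oldest first
        self._arrived = {}      # colour -> packet waiting to be released

    def issue(self, colour):
        """Record that an address cycle with this colour has been sent."""
        if colour in self._issued:
            raise ValueError("colour is already outstanding")
        self._issued.append(colour)

    def receive(self, colour, packet):
        """Accept a data packet; return the packets that can now be released."""
        if colour not in self._issued or colour in self._arrived:
            raise ValueError("unexpected colour on the data channel")
        self._arrived[colour] = packet
        released = []
        # Drain from the oldest outstanding colour while its packet is present.
        while self._issued and self._issued[0] in self._arrived:
            released.append(self._arrived.pop(self._issued.popleft()))
        return released


buffer = ColourReorderBuffer()
buffer.issue(0)
buffer.issue(1)
print(buffer.receive(1, "second"))  # held back until colour 0 arrives
print(buffer.receive(0, "first"))   # both released, in issue order
```

The storage grows with the number of outstanding addresses, which matches the earlier statement that the initiator queue scales with the number of outstanding transfers it supports.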
SDO
SDO | Description |
---|
0 | Write to target |
1 | Read from target |
The Data Read-Not-Write line indicates the direction of the transfer that is being serviced on the data channel. It is bundled during the push phase of the data cycle.
SD[n-1:0]
The n-bit data is bundled during the push phase of the data cycle for a read and pulled during the data cycle for a write.
Response Channel
Arbitration
SRarbreqn
SRarbreqn | Description |
---|
0 | Target n does not wish to start a response cycle |
1 | Target n requires access to the response channel |
Each target has a dedicated response arbitration request line which it asserts to request access to the response channel. The request is received by the response arbiter and must be held asserted by the target until the next rising edge of SRA for a cycle involving this target, after which it must be lowered. Once raised, SRarbreqn must not be lowered until the corresponding SRarbgntn has risen.
SRarbgntn
SRarbgntn | Description |
---|
0 | Target n is not allowed to start response cycles |
1 | Target n may use the response channel once it is idle |
Each target has a dedicated response arbitration grant line which is driven by the response arbiter in response to the corresponding SRarbreqn line. When active this signal informs the target that it has been granted permission to use the response channel once it is idle, i.e. it is an "early grant". Thus, a target may only start a cycle on the response channel when its SRarbreqn and SRarbgntn lines are both high and SRR=SRA=0.
Decode
SRSn
SRSn | Description |
---|
0 | Initiator n should ignore the response channel |
1 | Initiator n should respond to the current response cycle |
The response decoder decodes valid tags on the response channel and asserts one Response Select line to activate an initiator device during the time when SRR=1 and SRA=0. The initiator activated is the one whose unique id is the same as the SRT.
Signalling
SRR
SRR | Description |
---|
0->1 | Valid pushed information is on the response channel push bundled lines |
1->0 | The response channel push bundled lines are invalid and pulled information can be removed |
Response Request is driven by the target. Its routing within the bus must be such that it satisfies the bundled data constraint for the response channel. This signal forms the request part of a 4-phase signalling mechanism with the SRA signal.
SRA
SRA | Description |
---|
0->1 | The pushed information may be removed, and pulled information is now on the bus |
1->0 | The pulled information has been removed, and this response cycle is now completed |
Response Acknowledge is driven by the initiator. It is used to indicate that any pushed information is no longer required and that pulled information is available. The rising edge of the SRA will come in response to a rising SRSn (which follows rising SRR), and SRA cannot be lowered until both SRR and the SRSn for this initiator are low.
Information
SRT[n-1:0]
The n-bit Response Tag is used to identify which initiator the response cycle is destined for. This tag is n bits wide where n=log2(number of initiators on the bus). If the bus only has one initiator then this field of the response channel has zero wires (i.e. not used). The SRT is bundled during the push phase of the response cycle.
SRC[n-1:0]
The n-bit Response Colour, used by the initiator to reorder incoming response packets, so that the correct information is transferred between the correct devices in the correct order. Here, n=log2(maximum number of outstanding addresses allowed). If only one outstanding address is allowed, then this field of the response channel has zero wires (i.e. not used). The SRC is bundled during the push phase of the response cycle.
SR[n-1:0]
The n-bit response is bundled during the push phase of the response cycle. The response can be used to indicate errors such as memory aborts and invalid privileges or bad operations.
Scalable Complexity
STAMINA offers scalable complexity to suit the end system requirements by trading hardware for performance/complexity.
Merged Address and/or Data and/or Response
It is expected that many systems will not require late abort support, or will only require it infrequently. In these systems the overhead of having a separate response channel can be avoided by bundling the abort-response in the push phase of the data cycle. This then only impacts the write latency. If this is critical for a device with late-abort requirements, support could be added for sending the abort in a second cycle on the data channel, but this could impact on the data channel bandwidth and make the data interfaces more complicated, so it may be simpler to revert to a separate response channel. This trade-off has yet to be investigated.
In a similar manner, if the system can cope with less than half the bandwidth provided by a separate address and data channel STAMINA bus then the address and data channels could be merged to use the same set of wires, requiring an additional bit to indicate whether the transfer is an address or data transfer. Clearly the address and data arbiters would then be merged into one arbiter tree.
No Responses
Many systems may not care about memory aborts, and will not need abort support from the bus. The response hardware is thus not necessary in many cases.
Shared Arbiters
One arbiter input is required for each of the data and response channels by the targets. However, this arbiter requirement can be reduced by removing the decoupling between target address and data/response interfaces. This then means that the address cycle on the bus is tightly coupled with the data/response cycle, i.e. there is no pipelining of transfers on the bus. This clearly means that of all the tightly coupled targets, only one will be active at once and so there is no need to arbitrate between them, i.e. they can all share an arbitration channel. In the extreme case, with all targets coupled, this reduces to having no data/response channel arbiters, and activity on the data/response channel will always be encapsulated by activity on the address channel.
Number of Colours (& Outstanding Addresses)
The width of the colour fields of the bus channels is log2(max number of outstanding addresses). If every initiator issues only one outstanding address then no colours are required. Two outstanding addresses require one colour bit, up to four outstanding transfers require two colour bits, and so on.
Sequential Optimisation Support
The SSEQ and SPRED signals convey sequentiality optimisation relating consecutive transfers from the same initiator. The encoding on these lines is such that the information below can be transferred.
SPRED | SSEQ | Meaning | Typical use |
---|
0 | 0 | THIS address is non-sequential to the previous address | Non-sequential memory access, Branch Target Instruction Fetch |
0 | 1 | THIS address is sequential to the previous address | Instruction Fetch |
1 | 0 | The NEXT address will be non-sequential to the current address, but is within the same 2^n page | Cache line fetch (wrap-around), DMA Transfer |
1 | 1 | The NEXT address will be sequential to the current address and in the same 2^n page | DMA Transfer, ARM LSM instructions, Cache line fetch (non wrap-around) |
This information must be filtered at the target to determine whether it is valid or not since, unless the bus is locked in an atomic transfer, two consecutive transfers from an initiator may have a transfer to the same target from a different initiator between them, thus any sequential information would be invalid as seen by the target device.
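The encoding and the filtering step can be sketched in Python. The names `describe` and `TargetSequentialFilter` are hypothetical. The filter covers only the case named above, where a different initiator accesses the same target between two transfers, and it assumes the target remembers the tag (SAT) of the last address cycle it accepted; a real target may need further checks.

```python
MEANING = {
    (0, 0): "THIS address is non-sequential to the previous address",
    (0, 1): "THIS address is sequential to the previous address",
    (1, 0): "NEXT address is non-sequential to this one, within the same 2^n page",
    (1, 1): "NEXT address is sequential to this one, within the same 2^n page",
}


def describe(spred: int, sseq: int) -> str:
    """Return the meaning of an (SPRED, SSEQ) pair."""
    return MEANING[(spred, sseq)]


class TargetSequentialFilter:
    """Decide whether sequentiality hints can be trusted at one target."""

    def __init__(self):
        self._last_tag = None  # tag of the last address cycle seen here

    def observe(self, tag) -> bool:
        """Record an address cycle and report whether its hints are usable.

        Hints are treated as usable only when the previous address cycle
        accepted by this target came from the same initiator.
        """
        usable = self._last_tag == tag
        self._last_tag = tag
        return usable


target = TargetSequentialFilter()
for tag in (0, 0, 1, 0):
    print(tag, target.observe(tag))
```

In the loop, the second access from initiator 0 is usable, while the accesses that follow a change of initiator are not.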