Feel free to ammend, adjust, comment and recommit.
This technical note looks at the current register window scheme used in the Jamaica architecture, the aim is to document what we are currently doing, and assess whether or not this is a sensible policy to be following for future architecture improvments.
The current implementation of registers within the jamaica cores have the following properties.
The following diagram gives a hierarchical overview of the notional structures in the Jamaica core registers.
The current cores have a register file containing 24 frames of 8 registers [192 registers per core]. In the maximum jamaica configuration, 16 cores, the chip consumes some 3072 registers.
The global registers in the current sim are one per processor, despite Greg's Thesis p.66.
Associated with each window in the register file is an entry in the window management table. Each entry holds 2 pointers to other windows.
prev: pointer to the previous window. next: pointer to the next window.This allows the registers to maintain a doubly-linked list, meaning that consecutive calls and returns need not be to contiguous register windows.
The entry also holds two 1-bit flags:
alloc: which is set to one if the window has
been allocated, i.e it is no longer a free window.
pfp: which is set to one if the prev frame is present in the physical registers.
Whilst there are 24 physical windows in the register file, at any moment in time, only 4 of these windows are visible to the current running context. In order for the context to know which of these windows are visible, it needs to maintain pointers to its windows.
For this reason each context maintains a set of registers containing the following
The only distinction between window and frame is that windows are visible and frames are physical.
In: the registers that contain local variables to the current function.(with the exception of %i7). Out: the registers used for assigning variables to the callee of the current function Extras: statically(?) assigned window containing the context specific information. never deallocated Globals: statically(?) assigned window containing the context specific information. never deallocated see greg's thesis 5.1, p66.
The current context maintains the PC, so that it can be restarted on a context switch
The current context maintains some pointers into memory, so that it can spill or fill registers should the registers over(under)flow.
base: The base address of this contexts spill area. limit: The maximum number of entries. count: The index of the next free entry in the area. last: A flag, set if this area is the last spill area, if more than one non-contiguous area exists.
All of the context specific information can be stored in the context table
Also, globals needed in a context since they are context specific. (see greg's thesis p66)
The most plausible benefit of using the register window scheme is that the calling and returning from functions can be done fairly quickly. The outs of the current function becoming the ins of the next function, and vice versa on the return.
On a function call to D:
1) In the current visible set, check if an out window has been allocated, if not then allocate one now. [if one can't be allocated the bottom window needs to be spilled to memory]. 2) in_frame := out_frame; 3) out_frame := (none); [lazy allocation scheme] 4) %i7 := PC + 4; [save next PC to allow a return] 5) PC := D; [jump to the next function location]
On a return:
1) If an out window has been allocated, then free it. 2) If in window is first in the thread, then terminate the thread, else 3) PC := %i7; 4) out_frame := in_frame; 5) in_frame := in_frame.prev. [filling from memory if necessary].
The allocation of frames is confusingly enough called window management in Greg's thesis!!. The Window management allocates new frames to contexts if they need to create a new out window. It also controls the freeing of a frame once the function has returned. In Greg's thesis, he discusses a separate free list that is maintained that contains the frames that have not been allocated. This overhead is probably unjustifiable in real architecture, and a CAM could be used to find a frame in the register file with the allocated bit not set. The free list would provide a more efficient simulation but deviates from the architecture.
Used when the current function needs to allocate an out window. Because out windows are allocated lazily to avoid uneccessary spills at the in the register file, this may not be called until the out window is addressed for the first time.
1) Find a new frame, F, where F.alloc = 0, if none exists, a window must be spilled. 2) F.prev = current in_frame, F.next := 0, F.pfp := 1, F.alloc := 1 3) current in_frame.next := F.
Used when ever a function returns and is able to destroy its out window.
1) F.prev.next := 0; [where F is the frame to free] 2) F.alloc := 0; [in the Greg's sim, this would be put back on the free list].
Spilling occurs whenever the register file runs out of free frames, meaning a new (probably out) window in the current context cannot be allocated. This means effectively that the nesting of function calls has delved quite deep, which with a method inlining compiler like Jikes, should not happen all that often. The potential for this to happen is increased however by the number of contexts on the core in question, and how many are active. Each context can own any of the frames in the register file, as all contexts share the same register file. If the maximum of 8 contexts per core is used, then 8 extras frames are statically assigned from the 24 windows, plus the global frame at the bottom of the register file, leaving 15 windows to play with between 8 contexts. Clearly in this case each context only has to be nested once and splilling and filling could grind the processor to a halt. Not sure whether we've managed to exercise this yet (?).
F is the frame being spilled, which is the bottom frame
1) If count >= limit, then raise an EXCEPTION else 2) write r0..r7 out to memory starting at base + 32 * count. 3) count++; 4) F.next.pfp := 0; 5) F.alloc := 0;
Filling occurs when the nested function return to the point at which early callers have been spilled into memory. These frames need to be brought back into memory into a free frame
N is the current bottom window, F is the one being filled
1) If count := 0, EXCEPTION as spill area is empty, else 2) count--; 3) F.next := N; F.alloc = 1; 4) if count = 0 and last = 1 then F.prev := 0, F.pfp := 1, else F.pfp := 0; 5) N.prev := F; N.pfp := 1; 6) read r0..r7 from memory into F starting at base + 32 * count
This is the section in which ideas should be thrown around about the necessity of register windows in future jamaica architectures. It is clear from some of the explanations that I've recieved on the use of the extra registers that we are already hacking around the register window structure to make life easier for ourselves.
If we have a large number of contexts, 8 are currently allowed on each processing core, and we attempt to utilise all of these contexts, is it not possible to get into the scenario where for a prolonged period each context is spilling on each entry into a new function, or do we presume that the number of function calls will remain fairly shallow?
In the current system, we choose which window to spill by following pointers from the current window (that is having problems allocating a new frame for out say), and trace back until we find a frame that has the pfp (previous frame present) flag set to zero, meaning that it is the bottom of this contexts window allocations. This window is then the window that is spilled in order to make space for the next window to be allocated.
One possible solution could be to retain, at the cost of one extra pointer per context, a pointer to the spillable frame. If a frame needs to be spilled the spill_frame pointer references the frame to be removed, the spill_frame.next.pfp flag is set to 0, and the spill_frame is set to spill_frame.next. This would save on a list traversal, but cost one more register per processor.
Tanenbaum, on register windows: "On the whole, this complexity is a big nuisance and probably more trouble than it is worth."
By Matthew Horsnell