The LARD Virtual Machine
The language output by the LARD compiler is an assembler-like
language. This document describes this language and the way in which
it is used in compiled LARD programs.
Programmer's Model
Storage
The virtual machine has three forms of storage:
- Registers: there is a small set of word-wide general-purpose
registers and a program counter.
- Stacks: there is a small number of unbounded-size stacks,
storing words.
- The Heap: there is an unbounded region of memory managed by
Allocate and Free operations.
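As a rough illustration, the three forms of storage might be modelled as below. This is a minimal Python sketch and not the LARD interpreter: the class name Machine, the dict-based heap, the address numbering and the 4-byte word size (suggested only by the INC4 and DEC4 instructions) are all assumptions made for the example.

```python
class Machine:
    """Toy model of the three storage forms (all names are hypothetical)."""

    def __init__(self, register_names, stack_names):
        self.regs = {name: 0 for name in register_names}   # word-wide registers
        self.stacks = {name: [] for name in stack_names}   # unbounded stacks of words
        self.heap = {}       # block address -> list of words
        self.next_addr = 4   # assumed: address 0 is never handed out

    def alloc(self, nwords):
        """Like ALLOC: reserve nwords words and return the block's address."""
        addr = self.next_addr
        self.heap[addr] = [0] * nwords
        self.next_addr += 4 * max(nwords, 1)   # assumed: 4-byte words
        return addr

    def free(self, addr):
        """Like FREE: release the block that starts at addr."""
        del self.heap[addr]


m = Machine(["PC", "T1", "T2"], ["PS", "RAS", "CXS"])
m.regs["T1"] = 3
m.regs["T2"] = m.alloc(m.regs["T1"])    # corresponds to ALLOC T1 T2
m.stacks["PS"].append(m.regs["T2"])     # corresponds to PUSH PS T2
```

The Python lists and dict grow on demand, which stands in for the "unbounded" stacks and heap of the model.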
Concurrency
Many threads may exist. Each thread has a private set of registers
and stacks, but the heap is shared. A new thread inherits copies of
its parent's registers and empty stacks. Threads are scheduled
non-preemptively.
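The inheritance rule can be sketched in Python as follows. The Thread class and the fork helper are hypothetical names for this example. The values written to the result register follow the description of the FORK instruction given later in this document; the sketch assumes that child thread numbers are non-zero, and it does not model scheduling.

```python
class Thread:
    """One thread: private registers and stacks, plus a reference to the shared heap."""

    def __init__(self, tid, regs, stack_names, heap):
        self.tid = tid
        self.regs = regs
        self.stacks = {name: [] for name in stack_names}   # stacks always start empty
        self.heap = heap                                   # the same object in every thread


def fork(parent, new_tid, result_reg):
    """Create a child of `parent`; `new_tid` is assumed to be non-zero."""
    child = Thread(new_tid, dict(parent.regs), list(parent.stacks), parent.heap)
    parent.regs[result_reg] = new_tid   # the parent sees the child's thread number
    child.regs[result_reg] = 0          # the child sees zero
    return child


parent = Thread(0, {"T1": 7, "T2": 0}, ["PS", "RAS", "CXS"], heap={})
parent.stacks["PS"].extend([1, 2])
child = fork(parent, new_tid=1, result_reg="T2")
child.regs["T1"] = 99                   # does not affect the parent's T1
```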
Flow of control
The program counter is a normal register and branches are effected by
assigning a value to the PC.
Instruction set
The instruction set comprises these instructions:
- ALLOC R1 R2
-
Allocate space in the heap. R1 gives the number of words to allocate;
R2 is the register that receives the address of the block.
- FREE R
-
Free block of memory pointed to by R.
- PUSH S R
-
Push value from R onto stack S.
- POP S R
-
Pop value from S into R.
- LOAD RA RD
-
Load value from address given by RA into register RD.
- STORE RA RD
-
Store value from register RD into the address given by RA.
- SETN R N
-
Set register R to value N.
- SETS R S
-
Set register R to the displacement of string S into the string table.
- SETC R C
-
Set register R to character C.
- SETR R1 R2
-
Set register R1 to value from R2.
- ADD R1 R2
-
R1 := R1 + R2
- SUB R1 R2
-
R1 := R1 - R2
- MUL R1 R2
-
R1 := R1 * R2
- DIV R1 R2
-
R1 := R1 / R2
- MOD R1 R2
-
R1 := R1 mod R2
- NOT R1
-
R1 := ~R1
- AND R1 R2
-
R1 := R1 & R2
- OR R1 R2
-
R1 := R1 | R2
- XOR R1 R2
-
R1 := R1 ^ R2
- SHL R1 R2
-
R1 := R1 << R2
- SHR R1 R2
-
R1 := R1 >> R2
- INC R
-
R := R + 1
- DEC R
-
R := R - 1
- INC4 R
-
R := R + 4
- DEC4 R
-
R := R - 4
- CMPEQ R1 R2 R3
-
R1 := (R2=R3)
- CMPNEQ R1 R2 R3
-
R1 := (R2!=R3)
- CMPLT R1 R2 R3
-
R1 := (R2<R3)
- CMPLTEQ R1 R2 R3
-
R1 := (R2<=R3)
- CMPGT R1 R2 R3
-
R1 := (R2>R3)
- CMPGTEQ R1 R2 R3
-
R1 := (R2>=R3)
- SETNNZ R1 R2 N
-
Set register R2 to N if R1 is not zero
- BUILTIN B
-
Call the foreign function B. B's definition is statically linked into
the interpreter.
- END
-
Terminate program
- FORK R
-
Fork the current thread. In the parent thread, the thread number of
the child thread is stored in R. In the child thread, zero is stored
in R. The child thread's other registers are copied from the parent
thread. The child thread's stacks are empty.
- DIE
-
Terminate thread
- YIELD
-
Schedule another thread
- RUNCHECK R1 R2
-
R2 := (is thread R1 running?)
- LINE FN LN
-
Note current line number and file number
- DEBUG FMT ARGS...
- Send a message to the debugger. FMT is a string containing
conversion characters, as in printf. %reg takes a register and
inserts the contents of that register in decimal. %sreg takes a
register and inserts a string taken from the address given by the
register.
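To show how a few of these instructions combine, and how a branch is simply an assignment to the PC, here is a small Python interpreter for a subset of the instruction set. It is a sketch under stated assumptions: the program is held as a list of tuples indexed by the PC, the PC is advanced before each instruction executes, and a conditional branch is written as SETNNZ with PC as its target register. None of these details is confirmed by this document.

```python
def run(program):
    """Execute a list of (opcode, operand, ...) tuples and return the registers."""
    regs = {"PC": 0, "T1": 0, "T2": 0}
    while True:
        op, *args = program[regs["PC"]]
        regs["PC"] += 1                      # assumed: PC already points at the next instruction
        if op == "SETN":
            regs[args[0]] = args[1]
        elif op == "SETR":
            regs[args[0]] = regs[args[1]]
        elif op == "ADD":
            regs[args[0]] += regs[args[1]]
        elif op == "DEC":
            regs[args[0]] -= 1
        elif op == "SETNNZ":                 # SETNNZ R1 R2 N: R2 := N if R1 is not zero
            if regs[args[0]] != 0:
                regs[args[1]] = args[2]
        elif op == "END":
            return regs
        else:
            raise ValueError(f"unsupported opcode: {op}")


# Sum the integers 5, 4, 3, 2, 1 into T2.
program = [
    ("SETN", "T1", 5),          # 0: counter
    ("SETN", "T2", 0),          # 1: accumulator
    ("ADD", "T2", "T1"),        # 2: loop body
    ("DEC", "T1"),              # 3
    ("SETNNZ", "T1", "PC", 2),  # 4: branch back to 2 while T1 is not zero
    ("END",),                   # 5
]
result = run(program)
```

Because the PC is an ordinary entry in the register file, the SETNNZ case needs no special handling to act as a conditional branch.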
Use of the Virtual Machine
On top of this basic virtual machine, another level of abstraction is
provided by the particular use of registers and memory to accommodate
the compiled LARD program.
The registers are:
PC      | The program counter.
T1..T8  | Temporary registers.
CXP     | The "current context pointer". This is a pointer to a structure in the heap (the "current context"); see later.
FPNUM   | This thread's forpar number.
STRBASE | Address of start of string area.
TID     | This thread's thread number.
The stacks are:
PS  | The parameter stack. Used to pass parameters to functions and to return their results.
RAS | The return address stack. Used to store subroutine call return addresses.
CXS | The context stack. Used to store values of the CXP register over subroutine calls.
Structures in the Heap:
The Current Context is a structure in the heap pointed to by
the CXP register with the following form:
pointer to context of static parent
pointer to first local
pointer to second local
...
Any local that is in scope can therefore be found by following the
parent context pointers as far as necessary to reach the context in
which the local is declared, and then adding a displacement based on
the local's position within that context.
The entries in the current context are pointers to the actual values
themselves.
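The lookup can be sketched in Python as follows. The heap is modelled as a dict from block address to a list of words; the addresses, the use of 0 for "no parent", and the idea that the number of hops and the local's index are known when the code is generated are assumptions for the example.

```python
def lookup(heap, cxp, hops, index):
    """Return the value of local number `index` (1-based), declared `hops` static levels up."""
    cx = cxp
    for _ in range(hops):
        cx = heap[cx][0]           # word 0: pointer to the context of the static parent
    value_ptr = heap[cx][index]    # words 1 onwards: pointers to the locals
    return heap[value_ptr][0]      # entries are pointers, so dereference once more


heap = {
    100: [0, 200, 201],   # outer context: no parent (assumed 0), two locals
    200: [42],            # first local of the outer context
    201: [7],             # second local of the outer context
    110: [100, 210],      # inner context: parent is 100, one local
    210: [99],            # first local of the inner context
}
cxp = 110                 # the inner context is current
```

With this layout, lookup(heap, cxp, 0, 1) reads the inner context's own local, and lookup(heap, cxp, 1, 2) follows one parent pointer to read the second local of the enclosing context.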
Parameter Passing Mechanism
Parameters passed on the PS have the following forms:
- Types
- The size of the type is passed.
- Values
- Values of the scalar types are stored in single stack entries.
Structured types are stored in a sequence of stack entries, described
later.
- Variables
- Parameters of class var() are passed as pointers to the
corresponding values.
- Expressions
- Parameters of class expr() are passed as a pair of stack
entities; the first pushed is the address of the code that evaluates
the expression and the second pushed is a pointer to the context
(i.e. CXP value) that should apply during its execution.
Parameters are popped in left-to-right order and so must be pushed in
right-to-left order. Any implicit parameters are pushed before the
explicit parameters that they correspond to.
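The ordering rules can be illustrated with a short Python sketch. The (kind, payload) tuples and the function name push_args are hypothetical, the numeric values are arbitrary, and structured values and implicit parameters are not modelled.

```python
def push_args(ps, args):
    """Push (kind, payload) arguments right-to-left so that they pop left-to-right."""
    for kind, payload in reversed(args):
        if kind == "expr":
            code_addr, cx = payload
            ps.append(code_addr)   # pushed first: address of the code for the expression
            ps.append(cx)          # pushed second: the context to use, so it pops first
        else:
            ps.append(payload)     # "type" (a size), "val" (a scalar) or "var" (a pointer)


ps = []
push_args(ps, [("type", 1), ("val", 17), ("expr", (0x200, 0x9000))])
stack_after_push = list(ps)
popped = [ps.pop() for _ in range(4)]
```

Here the callee pops the type's size first, then the value, then the expression's context pointer and finally its code address.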
Function Call Mechanism
The caller must:
- Push the parameters onto the PS;
- Push its context onto the CXS;
- Set the CXP to point to the static parent of the callee;
- Push its return address onto the RAS;
- Set the PC to the start of the callee.
The callee must:
- Allocate its CX;
- Set the parent pointer in the CX to the passed CXP (its static parent);
- Set CXP to point to its CX;
- Pop each parameter from the PS:
- For type, val and expr parameters, it must allocate space to
store the parameter, copy the parameter from the PS into this
space, and set the appropriate entry in the CX to point to this
parameter;
- For var parameters it must copy the address of the parameter from
the PS into the appropriate entry of the CX.
Before returning, the callee must:
- Push the result onto the PS;
- For type, val and expr parameters, free the space pointed
to by the CX entry;
- Free the CX;
- Load the PC with the return address from the RAS.
On return, the caller must:
- Pop its CXP from the CXS.
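The whole sequence can be walked through with a small Python simulation. This is only a sketch: the VM class, the helper names and the addresses are hypothetical, only single-word val parameters are handled, and the callee's body is reduced to adding its two parameters.

```python
class VM:
    """Just enough state to walk through one call (all names are hypothetical)."""

    def __init__(self):
        self.ps, self.ras, self.cxs = [], [], []   # the three stacks
        self.cxp = 0
        self.pc = 0
        self.heap = {}                             # block address -> list of words
        self._next = 1

    def alloc(self, nwords):
        addr = self._next
        self._next += 1
        self.heap[addr] = [0] * nwords
        return addr

    def free(self, addr):
        del self.heap[addr]


def caller_call(vm, args, static_parent, return_addr, callee_addr):
    for arg in reversed(args):         # push parameters right-to-left
        vm.ps.append(arg)
    vm.cxs.append(vm.cxp)              # save the caller's context
    vm.cxp = static_parent             # pass the callee's static parent
    vm.ras.append(return_addr)
    vm.pc = callee_addr


def callee_enter(vm, nparams):
    cx = vm.alloc(1 + nparams)         # parent pointer plus one entry per parameter
    vm.heap[cx][0] = vm.cxp            # the passed CXP is the static parent
    vm.cxp = cx
    for i in range(1, nparams + 1):    # val parameters: copy into fresh space
        slot = vm.alloc(1)
        vm.heap[slot][0] = vm.ps.pop()
        vm.heap[cx][i] = slot


def callee_return(vm, nparams, result):
    vm.ps.append(result)               # the result goes back on the PS
    cx = vm.cxp
    for i in range(1, nparams + 1):
        vm.free(vm.heap[cx][i])        # free each parameter's space
    vm.free(cx)                        # then the CX itself
    vm.pc = vm.ras.pop()


def caller_resume(vm):
    vm.cxp = vm.cxs.pop()              # restore the caller's context


def param(vm, i):
    """Read parameter i through the current context."""
    return vm.heap[vm.heap[vm.cxp][i]][0]


vm = VM()
vm.cxp = 50                            # pretend the caller's context is at address 50
caller_call(vm, [3, 4], static_parent=0, return_addr=10, callee_addr=100)
callee_enter(vm, 2)
total = param(vm, 1) + param(vm, 2)    # stands in for the callee's body
callee_return(vm, 2, total)
caller_resume(vm)
```

After the call the PS holds only the result, the RAS and CXS are empty again, and every block the callee allocated has been freed.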
Body of a Function
Although the class of a function definition is expr(something), this
does not mean that the execution of the body must return an
unevaluated expression! The function is only actually called when the
unevaluated expression is evaluated.
So the function must actually EVAL its body expression.
To evaluate an expression, the current context must be stored on the
CXS and a return address must be stored on the RAS. The CXP is then
loaded with the context of the expression and the PC is loaded with
the address of the expression. When the expression completes, the CXP
must be restored from the CXS.
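The evaluation steps can be sketched in Python using a plain dict for the machine state. The function names are hypothetical, and the sketch assumes that the expression's code finishes by loading the PC from the RAS, as a function does.

```python
def begin_eval(state, code_addr, expr_cx, return_addr):
    """Transfer control to an unevaluated expression (code address plus context)."""
    state["CXS"].append(state["CXP"])   # save the current context
    state["RAS"].append(return_addr)    # where to continue afterwards
    state["CXP"] = expr_cx              # the expression runs in its own context
    state["PC"] = code_addr


def end_eval(state):
    """What happens once the expression's code has finished."""
    state["PC"] = state["RAS"].pop()    # assumed: the expression returns via the RAS
    state["CXP"] = state["CXS"].pop()   # the evaluator restores its own context


state = {"PC": 5, "CXP": 0x100, "CXS": [], "RAS": []}
begin_eval(state, code_addr=0x40, expr_cx=0x200, return_addr=6)
during = (state["PC"], state["CXP"])
end_eval(state)
```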
Expressions and subexpressions
Expressions that have no locals simply
- Push their parameter expressions onto the PS;
- Call their function.
Expressions that have locals
- Allocate a CX for themselves;
- Allocate space for their locals and make the CX entries point to
these spaces;
- Push their parameter expressions onto the PS;
- Call their function;
- Free the space used by the locals;
- Free the CX.
Val functions
Read is an example of a builtin "val" function. Its implementation is
hardcoded in core.icode.pre. Val functions are called as follows:
- The parameters are pushed onto the PS;
- The function is called.
Structured Types
Vectors
A vector is represented by the representations of each of the elements
of the vector in turn.
Records
A record is represented by the representations of each of the fields
of the record in turn.
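Both rules amount to a recursive flattening, which can be sketched in Python. Modelling vectors as lists and records as tuples is an assumption for the example; the order in which the resulting words are pushed onto a stack is not specified here and is not modelled.

```python
def flatten(value):
    """Return the sequence of words that represents `value`."""
    if isinstance(value, (list, tuple)):   # vector (list) or record (tuple)
        words = []
        for element in value:              # elements or fields, in turn
            words.extend(flatten(element))
        return words
    return [value]                         # scalar: a single word


# A vector of two records, each with two scalar fields.
words = flatten([(1, 2), (3, 4)])
```

Nested structures therefore occupy a contiguous run of entries whose length is the sum of the lengths of their parts.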