Sparse Arrays

Introduction

LARD's sparse arrays are similar to Perl's associative arrays and tcl's arrays - though unlike those structures, sparse arrays are limited to integer indexes. But don't worry if that doesn't mean anything to you...

A sparse array is like a normal array in that it stores a collection of data, where all elements have the same type (called the base type), and the elements are identified by numerical indices. However in a normal array it is necessary to set bounds on the size of the indices when the array is declared. Furthermore the memory needed by all of the potential elements is reserved immediately. In contrast in a sparse array there is no need to specify bounds for the indices, and memory is handled efficiently with whatever distribution of indices is used.

Naturally this advantage comes at a price; sparse array accesses are slightly slower than normal array accesses (though not by enough to really worry about), and there is a significant memory overhead. In the LARD implementation there is also the need to call an initialisation function (like channels and strings) and the restriction that sparse arrays cannot be passed in their entirety as val parameters to functions.

Sparse arrays are intended for applications such as memory models.

The Details

sparse_array is a parameterised type which takes the base type as a parameter. A sparse array is declared using a normal var declaration, but it must be followed by a call to init_sparse_array before you do anything with it:

	A:var(sparse_array(int)).
	init_sparse_array(A);

There is also a destroy_sparse_array function which frees the memory used by a sparse array when you have finished with it. You should call this at the end of the scope of the array, though this isn't really necessary if the program is about to finish anyway.

The base type of the array can be any type (though I would worry about nesting one sparse array within another, and I haven't tried sparse arrays of strings). You can also embed sparse arrays within other structured types such as arrays or records. The index of the sparse array is always an integer.

Indexing is done using [ ], as for normal arrays. The [ ] expression can appear on the left or right of an assignment.

	A[99]:=i;	/* Set element 99 to the value of i. */
			/* If element 99 didn't exist it is created. */
			/* If it did exist the old value is replaced. */

	f:=A[99];	/* Set f to the value of element 99. */
			/* If element 99 didn't exist the result is undefined. */

Example

Here's part of the memory model from the STUMP processor implemented using a sparse array:

/* Simplified STUMP memory model */

/* declare the memory as a sparse array */
mem:var(sparse_array(int)).

/* memory interface: addrctrlstdata is a bundle of address, load/store
   bit, and store data.  lddata is a channel for returned load
   data. */

dmem(addrctrlstdata:var(chan(daddrctrlstdata_t)),
     lddata:var(chan(int))):expr(void)=(

  forever(
    addrctrlstdata?(
      wait_for(50);
      if (?addrctrlstdata->store) then (
	/* store to memory: write to the memory array */
        mem[?addrctrlstdata->addr]:=?addrctrlstdata->stdata
      ) else (
	/* load from memory: read from the memory array */
        lddata!(mem[?addrctrlstdata->addr])
      )
    )
  )
).

/* memory initialisation from file */

meminit(fn:val(string)):expr(void)=(
  
  f:var(tfile).
  a:var(int).
  
  newstringvalparam(fn);
  
  f:=topenin(fn);
  set_default_input_base(2);
  a:=0;
  while (!eof(f)) do (
    mem[a]:=freadint(f);
    a:=a+1
  );
  close(f);
  
  endstringvalparam(fn)
).

/* initialise the memory sparse array */
init_sparse_array(mem);

meminit("progs/gcd.bin")

Further thoughts

There is a slight concern that reading from an undefined index causes that address to be defined and memory is allocated for it. In some applications this could defeat the idea of having a sparse array; let me know if this is a problem for you and I'll think about work-arounds.

There is also at present no way to test if a particular index is used, or to get a list of or iterate over the indexes that are used, or to delete an element. These are all things that are provided in the Perl and tcl implementations. If any of these functions would be useful to you please let me know which.

Sparse arrays are implemented using a hashing mechanism, the efficient operation of which depends on various bits of probability. If this code seems to run very slowly with your data, please let me know and I'll look at the hash function statistics.