Data I/O Formatting in LARD

This section describes the format of printf(3) style text formatting strings which may be used with the functions init_chan, formatdata and parsedata to read and write typed LARD values in more convenient formats.

As an example:

	t : type = record ( a :: int, b :: bool ) .
	v : var(t) .
	v := {10,false} ;	
	print (formatdata ("{%d,%e(false,true)}", v)) 

This code creates a variable v which has a record type consisting of two words a and b. The call to formatdata returns a string which represents the value in v using the format description "{%d,%e(false,true)}". This code will actually print {10,false} which is a faithful representation of the value assigned to v.

The general form of a format description string is (using EBNF):

	<format-string> ::= ([<verbatim>] <format>)* [<verbatim>];
	
	<verbatim> ::= <any-char-except-%>*
	
	<format> ::= `%' [<repeat>] [<flags>] [<minimum-width>] [<address>] [<qualifier>] <format-char>
	
	<repeat> ::= `*' <any-char-except-(>
		|	`*' `(' <sep-string> `)'
	
	<sep-string> ::= <any-char-except-)>*
	
	<flags> ::= [`!'][`='][`-'][`0']
	
	<address> ::= `(' <word-no> `,' <bit-range> `)'
		|	`(' <word-no> `)'
		|	`(' <bit-range> `)'

	<bit-range> ::= <left-index> `:' <right-index>
	
	<word-no>, <left-index>, <right-index> ::= an integer >= 0
	<minimum-width> ::= an integer >= 1

	<qualifier> ::= `m'
	
	<format-char> ::= `d' | `i' | `o' | `x' | `X' | `b' | `u' | `s' | `c' | `r' | `%' | `e' <enum-elems>

	<enum-elems> ::= `(' <enum-name> [ `=' <enum-value> ] ( `,' <enum-name> [ `=' <enum-value> ] )* `)'

	<enum-name> ::= <any-char-except-(-,-=>*
	
	<enum-value> ::= an integer
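
As a reading aid, the grammar for a single <format> can be approximated by a regular expression. The sketch below is written in Python, not LARD, and is not taken from the LARD sources: the names FORMAT_RE and describe are hypothetical, the enumeration list is matched loosely as "anything up to the closing bracket", and the `i' format character is accepted because the text of this section describes a `%i' format.

```python
import re

# Hypothetical approximation of the <format> production; not LARD's own parser.
FORMAT_RE = re.compile(r"""
    %
    (?P<repeat>\*(?:\((?P<sepstr>[^)]*)\)|(?P<sepchar>[^(])))?   # <repeat>
    (?P<flags>!?=?-?0?)                                          # <flags>, in grammar order
    (?P<width>[1-9][0-9]*)?                                      # <minimum-width>
    (?P<address>\((?:\d+|\d+:\d+|\d+,\d+:\d+)\))?                # <address>
    (?P<qualifier>m)?                                            # <qualifier>
    (?P<char>[dioxXbuscr%]|e\([^)]*\))                           # <format-char>
""", re.VERBOSE)


def describe(fmt):
    """Return the non-empty clauses of one format description as a dict."""
    m = FORMAT_RE.fullmatch(fmt)
    if m is None:
        raise ValueError("not a single format description: %r" % fmt)
    return {name: text for name, text in m.groupdict().items() if text}


print(describe("%!-(3:0)md"))
print(describe("%*(, )2X"))
```

Under these assumptions "%!-(3:0)md" splits into the flags "!-", the address "(3:0)", the qualifier "m" and the format character "d".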

The Argument Pointer

A format string contains a number of format descriptions separated by any amount of verbatim text. Unless modified by the `!' flag, a word address or a repeat clause, each format description applies to the next argument word passed to the function processing the format. So formatdata ("%d %d", [10,20]) will produce the string "10 20". An argument pointer exists which, starting at the first argument word at index 0, advances through the argument list by one place after each format description. The `!' flag can be used to prevent the argument pointer from advancing to the next argument, so formatdata ("%!d %d %d", [10,20]) returns the string "10 10 20". If the argument pointer advances past the end of the argument array then the rest of the format string is returned verbatim.
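
The behaviour of the argument pointer can be illustrated with a small emulation. This is a Python sketch under stated assumptions, not LARD code and not the LARD implementation: the helper name format_words is hypothetical, only `%d' and `%!d' are understood, and "returned verbatim" is read as meaning that the unprocessed tail of the format string, including its remaining format descriptions, is copied to the output unchanged.

```python
import re


def format_words(fmt, args):
    """Emulate %d and %!d only (hypothetical helper, not part of LARD)."""
    out = []
    ptr = 0   # the argument pointer, starting at argument index 0
    pos = 0   # position in fmt up to which text has been consumed
    for m in re.finditer(r"%(!?)d", fmt):
        if ptr >= len(args):
            break                        # pointer has run off the end of the arguments
        out.append(fmt[pos:m.start()])   # verbatim text before this format
        out.append(str(args[ptr]))
        if not m.group(1):               # the `!' flag stops the pointer advancing
            ptr += 1
        pos = m.end()
    out.append(fmt[pos:])                # whatever is left is copied verbatim
    return "".join(out)


print(format_words("%d %d", [10, 20]))
print(format_words("%!d %d %d", [10, 20]))
```

The two calls reproduce the strings "10 20" and "10 10 20" quoted above.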

Addressing and Bit Fields

Arbitrarily placed argument words can be formatted by adding an address clause to a format description. formatdata ("%(1)d %d %(0)d", [10,20,30]) produces "20 30 10" using the arguments at indices 1,2,0 for the three formats. Each time a format description uses a word address clause the argument pointer is set to point at that argument and will advance to the next argument (unless the `!' flag is specified) for the next format. An argument address can also take the form (0,3:1) or (3:1). With three numbers we specify the word and the bit range within that word, and with just two numbers the bit range of the current argument word. If a word beyond the end of the argument array is indexed an error is raised.
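
The following Python sketch (an illustration, not LARD code) shows word addressing and bit range extraction as described above. The helper names bit_field and format_addressed are hypothetical, the format string is assumed to be already parsed into a list of word addresses (None meaning "no address clause"), and bit 0 is assumed to be the least significant bit with both ends of the range included.

```python
def bit_field(word, left, right):
    """Bits left..right of word (inclusive, left >= right), as an unsigned value."""
    width = left - right + 1
    return (word >> right) & ((1 << width) - 1)


def format_addressed(addresses, args):
    """Apply a %d format per entry; an integer entry acts like a %(n)d address clause."""
    out = []
    ptr = 0
    for addr in addresses:
        if addr is not None:
            if addr >= len(args):
                raise IndexError("word address beyond end of arguments")
            ptr = addr                   # an address clause moves the argument pointer
        if ptr >= len(args):
            break
        out.append(str(args[ptr]))
        ptr += 1                         # then the pointer advances as usual
    return " ".join(out)


print(format_addressed([1, None, 0], [10, 20, 30]))   # mirrors "%(1)d %d %(0)d"
print(bit_field(0b1010, 3, 1))                        # mirrors an address of (3:1)
```

The first call uses the arguments at indices 1, 2 and 0, giving "20 30 10" as in the example above.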

The Flags

The `!' flag affects addressing as previously described. The `-' flag is used to suggest that a bit field extracted from an argument word should be sign extended before printing and the `0' flag results in zero padding of the printed result to a minimum width just like a C format.

Minimum Width

The minimum width clause gives fixed width output, padded with zeros or spaces, for arguments short enough to fit in the field. Formats which are subject to a minimum width are always right justified in a field of that width or longer.

The `m' Qualifier and the Format Characters

Bit fields and the `!' flag are particularly useful for models where bit-packed records are held in single mpint objects. The format "{%!(7:4)md,%!-(3:0)md}" will print two bit fields from a single argument word, in curly brackets, separated by a comma. The brackets and comma are specified by their verbatim inclusion between the formats "%!(7:4)md" and "%!-(3:0)md". The two printed values are bit ranges 7 to 4 and 3 to 0 (sign extended from a sign bit in bit 3) and are extracted from an mpint rather than a raw word (== an int) because of the `m' qualifier on the format character `d'. The `=' flag can be useful when parsing bit fields. The flag sets the target word to zero (as an int or mpint) before copying in the bit field value. Giving the `=' flag with the first format description for each target word removes the need for the user to provide a zeroed target word for bit field insertion.
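
The arithmetic behind the `-' and `=' flags can be sketched in Python (an illustration only, not LARD code). The helper names extract and insert are hypothetical, Python integers stand in for mpint objects, bit 0 is assumed to be the least significant bit, and insertion is assumed to OR the field into the target word, which is why an unzeroed target would need the `=' flag.

```python
def extract(word, left, right, signed=False):
    """Bits left..right of word; with signed=True the top bit of the field is a sign bit."""
    width = left - right + 1
    value = (word >> right) & ((1 << width) - 1)
    if signed and value & (1 << (width - 1)):
        value -= 1 << width              # sign extend, as requested by the `-' flag
    return value


def insert(target, left, right, value, clear_first=False):
    """OR value into bits left..right of target; clear_first models the `=' flag."""
    if clear_first:
        target = 0
    width = left - right + 1
    return target | ((value & ((1 << width) - 1)) << right)


packed = 0xAF
# Mirrors "{%!(7:4)md,%!-(3:0)md}" applied to one packed word.
print("{%d,%d}" % (extract(packed, 7, 4), extract(packed, 3, 0, signed=True)))

# Parsing direction: the first field clears the target, the second is added to it.
word = insert(0xFFFF, 7, 4, 0xA, clear_first=True)
word = insert(word, 3, 0, 0x5)
print(hex(word))
```

For the packed value 0xAF the upper field is 10 and the lower field, 0xF with its sign bit set, is -1.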

The format characters have very similar meanings to those of C/tcl format strings. `d' prints a decimal (possibly signed) integer, `o' prints unsigned octal, `x' hexadecimal, `X' hexadecimal in upper case, `b' binary. The `u' format prints a decimal integer as an unsigned value but may produce an undefined result for a sign extended mpint. The format characters `s', `c' and `%' print strings represented by LARD string objects, characters using the least significant 8 bits of the addressed word and the verbatim character % respectively. Use of the `-' flag will not affect `d' format operations on int values and the use of bit fields will raise an error from the interpreter for `s' formats. No attempt is made to verify the type of object pointed to by an argument word so the `m' qualifier, `s' format character and bit extractions must be used carefully.

The `i' format when used with formatdata prints in decimal but with parsedata allows reading of a LARD constant with or without the radix prefix using the current default radix (e.g. 16#45, 69 decimal expressed in base 16). The default `i' radix is set by using the `r' format which reads an integer indicating the default radix from the current result word. A negative value for the radix indicates that the value may be signed; giving a radix of 0 indicates that numbers are expected to be decimal but radix prefixes are recognised.

As of version 2.0.13, `normal' C style radix prefixes are recognised by parsedata as well as the LARD style # prefix. In addition to the hex `0x' prefix a `0b' prefix can be used to parse binary numbers. Note that due to the inconvenience of the 0 prefixed octal number format and its incompatibility with LARD's number notation, this form of prefix is only recognised when using the special 0 radix. Radix 0 is the default for `%i' parses. The function s2i with a given index is implemented by a combination of %r and %i.
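
The prefix rules for `%i' parsing can be summarised by a short Python sketch. This is an interpretation of the description above rather than the parsedata implementation: the function name parse_lard_int is hypothetical, sign handling is omitted, a negative default radix is simply treated as its absolute value, and prefixes are checked before the default radix is applied.

```python
def parse_lard_int(text, default_radix=0):
    """Parse an unsigned integer constant in the style described for %i (sketch only)."""
    s = text.strip()
    if "#" in s:
        # LARD style prefix: <radix>#<digits>, e.g. 16#45
        radix_text, digits = s.split("#", 1)
        return int(digits, int(radix_text))
    if s[:2].lower() == "0x":
        return int(s[2:], 16)            # C style hexadecimal prefix
    if s[:2].lower() == "0b":
        return int(s[2:], 2)             # binary prefix
    if default_radix == 0 and len(s) > 1 and s[0] == "0":
        return int(s[1:], 8)             # 0 prefixed octal, only under radix 0
    # No prefix: use the default radix, with radix 0 meaning decimal.
    return int(s, abs(default_radix) or 10)


for sample in ("16#45", "0x45", "0b101", "017", "69"):
    print(sample, parse_lard_int(sample))
```

With the default radix of 0, "16#45" and "0x45" both yield 69, while "017" is read as octal; with a default radix of 10 the same "017" would be read as decimal 17.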

The `e' Enumeration Format

The final formatting character is `e' which is used for printing symbolic names for enumeration type values. In LARD this kind of type can be defined in the following way:

	myEnum : type = scalar . /* Make a word sized type myEnum */
	elemA : val(myEnum) .    /* elemA is an element of myEnum with raw value 0 */
	elemB : val(myEnum) .    /* elemB is an element of myEnum with raw value 1 */

The built-in bool type is defined in a similar way. A suitable `e' format string for the myEnum type would be "%e(elemA,elemB)", where the `e' format takes a round bracketed argument list of element names taking indices starting at zero and advancing by one for each element name. The element name is the string to be returned for this format description on the current value and may contain the usual identifier characters. An optional element value can be given, which is a signed integer value with which the element name is associated. Giving an explicit element value resets the implicit element value counter, so that subsequent implicitly numbered elements take contiguous values. Using the letter `e' as the value of the last element in the list identifies that element as the `else' element, the identifier to print when no other match is found. `Else' elements return a value of 0 when matched in parsing using the `%e' format. If more than one name has the same value then only the first of these names will ever be returned for that value.

Invalid enumeration values are printed as decimal integers preceded by "10#" by formatdata and can be parsed by parsedata if given as a string starting with a digit. Consequently only identifiers which do not begin with a digit can be used as element names. Matching for enumeration strings is done by comparing each string in turn with the first characters of the argument string (to the length of the enumeration element identifier) and selecting the element which matches the longest string.
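
The numbering and matching rules for `e' formats can be emulated as follows. This Python sketch is an interpretation of the text, not the LARD implementation: the helper names build_enum, format_enum and parse_enum are hypothetical, the element list is passed without its surrounding brackets, and the `else' element is assumed to be written as name=e.

```python
def build_enum(spec):
    """Turn "a,b=5,c,other=e" into ([(name, value), ...], else_name)."""
    names, else_name, counter = [], None, 0
    for item in spec.split(","):
        name, _, value = item.partition("=")
        if value == "e":
            else_name = name             # the `else' element
            continue
        if value:
            counter = int(value)         # an explicit value resets the counter
        names.append((name, counter))
        counter += 1
    return names, else_name


def format_enum(names, else_name, value):
    for name, v in names:
        if v == value:
            return name                  # the first name with this value wins
    return else_name if else_name is not None else "10#%d" % value


def parse_enum(names, else_name, text):
    if text[:1].isdigit():               # e.g. "10#7", as printed for invalid values
        radix, sep, digits = text.partition("#")
        return int(digits, int(radix)) if sep else int(radix)
    candidates = list(names)
    if else_name is not None:
        candidates.append((else_name, 0))    # `else' elements parse as 0
    matches = [(n, v) for n, v in candidates if text.startswith(n)]
    if not matches:
        raise ValueError("no enumeration element matches %r" % text)
    return max(matches, key=lambda nv: len(nv[0]))[1]   # longest match is selected


names, other = build_enum("false,true")
print(format_enum(names, other, 1), format_enum(names, other, 7))
print(parse_enum(names, other, "true"), parse_enum(names, other, "10#7"))
```

Here the value 7 has no name and no `else' element exists, so it is printed as "10#7" and parses back to 7.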

The Repeat Clause

The default display format in the LARD channel viewer is `%* d' which means repeat the format "%d" for each of the given arguments and separate those formatted arguments with " ". A format description with a repeat clause can be given at the end of any format string and will apply to each of the arguments in turn from the current argument to the end of the argument array. The separator between arguments can either be described by the single (non `(') character after the `*' or by a string inside round brackets following the `*'. The separator string is returned verbatim but probably should include a mechanism for displaying the argument name/index. Any text left after the end of the first repeated format description in the format string (which is presumably the last format description) will be returned verbatim. The example formatdata ("B {%*(, )2X} E", [7,8,9,10,11,12]) will return "B {07, 08, 09, 0A, 0B, 0C} E".
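
The effect of a repeat clause can be emulated in Python as a join over the remaining arguments. This is a sketch and not LARD code: the helper name format_repeat is hypothetical, the format is passed in already split into its parts, and zero padding is requested explicitly here so that the result matches the output quoted above.

```python
def format_repeat(args, start, sep, width, conv):
    """Format args[start:] with one conversion each, joined by the separator string."""
    spec = "0%d%s" % (width, conv)       # e.g. "02X": zero padded, width 2, upper case hex
    return sep.join(format(value, spec) for value in args[start:])


words = [7, 8, 9, 10, 11, 12]
print("B {" + format_repeat(words, 0, ", ", 2, "X") + "} E")

# The channel viewer default `%* d' corresponds to a plain space separated join.
print(" ".join(str(value) for value in words))
```

The verbatim text before and after the repeated format ("B {" and "} E") is simply concatenated around the joined fields.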