The Art of Programming
A specification language: Value representation

By Frans

Introduction

This page describes a way of presenting values (with all their structure) in a character-based, readable manner, much like XML. However, the value representation given here avoids many of the drawbacks of XML. (Read XML sucks. See also XML alternatives.) It has the following advantages over XML:

Explicit distinction between different kinds of values, e.g. booleans, numbers and strings.
Explicit distinction between sets and lists.
Explicit distinction between identifying and non-identifying properties.
Explicit mechanism for defining internal references.
Allows complete typing of the values within the representation itself, instead of through an externally defined schema.
Is more restrictive in the number of ways data can be represented. (Does not have the attribute versus content choice.)
Value representation is more compact, without sacrificing human and automatic readability.

It should be noted that some ad-hoc decisions have been made with respect to the representation of string values. Also, the representation is based on the ASCII character set.

This page is part of the specification language presented on my "The Art of Programming" pages. The values described here match the type definitions given in the description of the static aspects of the language. Another page deals with the dynamic aspects of the language.

Elementary values

For the representation of the elementary types we use the following conventions:

For the void type, the void value is represented by the string "void".
For the boolean type, the true and false values are represented by the strings "true" and "false".
The values of the various numeric types (like Num, NatNum, and PosNum) are represented by strings consisting of the digits "0" to "9", prefixed with the minus ("-") character for negative numbers.
Rational values (of the RatNum type) are represented by two numbers separated by the slash ("/") character, where the second number cannot have a prefixing minus character.
As an alternative, they can also be represented by the usual base-10 fraction representation with exponent, which consists of an optional minus character, a sequence of digits, a period (".") character, a sequence of digits, and an optional exponent expression, which starts with the "e" character, followed by an optional minus character and a sequence of digits.
Real numbers cannot, in general, be given a finite representation; a type whose values all had finite representations would not be the real numbers. This means that real values cannot be represented explicitly.
Values of the enumerated types are simply represented by an identifier consisting of the usual alphabetic and numerical characters and the underscore ("_") character, which may not start with a numerical character.
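
The conventions above can be sketched as a small recognizer. This is only an illustration in Python, not part of the specification language: the function name parse_elementary and the result tags are made up here, and it assumes that the digit sequences of the fraction form each contain at least one digit, which the text does not state.

```python
import re
from fractions import Fraction

# Patterns follow the conventions listed above; all names are illustrative.
INTEGER = re.compile(r"-?[0-9]+")
RATIONAL = re.compile(r"(-?[0-9]+)/([0-9]+)")
DECIMAL = re.compile(r"(-?[0-9]+\.[0-9]+)(?:e(-?[0-9]+))?")
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def parse_elementary(text):
    """Return a (kind, value) pair for one elementary value representation."""
    if text == "void":
        return ("void", None)
    if text in ("true", "false"):
        return ("boolean", text == "true")
    if INTEGER.fullmatch(text):
        return ("number", int(text))
    match = RATIONAL.fullmatch(text)
    if match:
        # Fraction raises ZeroDivisionError for a zero denominator.
        return ("rational", Fraction(int(match.group(1)), int(match.group(2))))
    match = DECIMAL.fullmatch(text)
    if match:
        # Keep the value exact instead of converting it to a float.
        mantissa = Fraction(match.group(1))
        exponent = int(match.group(2) or 0)
        return ("rational", mantissa * Fraction(10) ** exponent)
    if IDENTIFIER.fullmatch(text):
        return ("enumerated", text)
    raise ValueError(f"not an elementary value: {text!r}")
```

Note that "void", "true" and "false" are tested before the identifier pattern, since they would otherwise be read as enumerated values.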

String values

String values represent arbitrary-length sequences of ASCII characters. We use the string representation of the C language, with a simple extension. In the C programming language, a string consists of one or more parts, where each part consists of ASCII characters and escape sequences surrounded by double quotes. An arbitrary amount of white-space (spaces, tab characters, newline characters) may separate the parts. The extension we suggest is an alternative part form, in which each ASCII character is represented by two hexadecimal digits and the whole part is surrounded by hash ('#') characters. The two forms can be arbitrarily intermixed.
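
As an illustration, a decoder for such strings might look as follows in Python. It is a sketch under assumptions: the name decode_string is made up, only a handful of C escape sequences are handled, and a hex part is taken to be zero or more pairs of hexadecimal digits between two hash characters.

```python
import re

# One part is either a C-style quoted part or a hex part between hashes.
PART = re.compile(r'\s*(?:"((?:[^"\\]|\\.)*)"|#((?:[0-9A-Fa-f]{2})*)#)')

# Only a few common C escapes are supported in this sketch.
ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"', "0": "\0"}


def unescape(match):
    char = match.group(1)
    if char not in ESCAPES:
        raise ValueError(f"unsupported escape: \\{char}")
    return ESCAPES[char]


def decode_string(text):
    """Decode a string value made of quoted and hexadecimal parts."""
    result = []
    position = 0
    while position < len(text):
        match = PART.match(text, position)
        if not match:
            if text[position:].strip() == "":
                break  # only trailing white-space is left
            raise ValueError(f"bad string part at offset {position}")
        quoted, hexadecimal = match.group(1), match.group(2)
        if quoted is not None:
            result.append(re.sub(r"\\(.)", unescape, quoted))
        else:
            # Decoding as ASCII rejects byte values above 0x7F.
            result.append(bytes.fromhex(hexadecimal).decode("ascii"))
        position = match.end()
    if not result:
        raise ValueError("a string needs at least one part")
    return "".join(result)
```

For example, the three parts "Hello, " #776F726C64# "!" would decode to the single string Hello, world!.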

Set and list values

The various set and list values are represented by a sequence of comma separated values surrounded by brackets and preceded by one of the keywords "set", "orderedset", and "list".

Examples are:

  set(3, 4, 5)
  list(set(3), set(5), set(5))
  orderedset(list(2), list(3, 4))
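
To make the difference between the three keywords concrete, the examples above can be mirrored with Python stand-ins. This is a sketch only: the helper names are made up, and the reading of "orderedset" as "order is kept, repeated elements are dropped" is an assumption, since this page does not define it.

```python
def make_set(*items):
    # Order and duplicates are irrelevant in a set.
    return frozenset(items)


def make_list(*items):
    # A list keeps both order and duplicates.
    return tuple(items)


def make_orderedset(*items):
    # Assumed semantics: order is kept, repeated elements are dropped.
    seen = []
    for item in items:
        if item not in seen:
            seen.append(item)
    return tuple(seen)


a = make_set(3, 4, 5)
b = make_list(make_set(3), make_set(5), make_set(5))
c = make_orderedset(make_list(2), make_list(3, 4))
```

Here b has three elements even though two of them are equal, whereas a set built from the same three elements would contain only two.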

Record values

A record value is represented by a sequence of comma-separated identifier-value pairs surrounded by brackets and preceded by the keyword "record". Each identifier-value pair consists of an identifier and a value, separated by a colon (":") character.

Examples are:

  record(name:"Tom", age:34)
  record(a:4, b:3, c:record(v:1, w:3))
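
The bracketed forms introduced so far lend themselves to a simple recursive-descent parser. The Python sketch below covers only a subset (integers, quoted strings without escape sequences, and the keywords "set", "list", "orderedset" and "record"); the function names and the tagged-tuple result format are choices made for this illustration, and it assumes that an empty pair of brackets is allowed.

```python
import re

TOKEN = re.compile(r'\s*(-?[0-9]+|"[^"\\]*"|[A-Za-z_][A-Za-z0-9_]*|[(),:])')
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokenize(text):
    tokens, position = [], 0
    text = text.rstrip()
    while position < len(text):
        match = TOKEN.match(text, position)
        if not match:
            raise ValueError(f"unexpected character at offset {position}")
        tokens.append(match.group(1))
        position = match.end()
    return tokens


def expect(tokens, index, token):
    if index >= len(tokens) or tokens[index] != token:
        raise ValueError(f"expected {token!r}")
    return index + 1


def parse_items(tokens, index, parse_item):
    # Parses "(" item {"," item} ")" and returns the items and the next index.
    index = expect(tokens, index, "(")
    items = []
    if index < len(tokens) and tokens[index] == ")":
        return items, index + 1
    while True:
        item, index = parse_item(tokens, index)
        items.append(item)
        if index < len(tokens) and tokens[index] == ",":
            index += 1
            continue
        return items, expect(tokens, index, ")")


def parse_field(tokens, index):
    if index >= len(tokens) or not NAME.fullmatch(tokens[index]):
        raise ValueError("expected a field name")
    name = tokens[index]
    index = expect(tokens, index + 1, ":")
    value, index = parse_value(tokens, index)
    return (name, value), index


def parse_value(tokens, index):
    if index >= len(tokens):
        raise ValueError("unexpected end of input")
    token = tokens[index]
    if token in ("set", "list", "orderedset"):
        items, index = parse_items(tokens, index + 1, parse_value)
        return (token, items), index
    if token == "record":
        fields, index = parse_items(tokens, index + 1, parse_field)
        return ("record", fields), index
    if token[0] == '"':
        return token[1:-1], index + 1
    if token.lstrip("-").isdigit():
        return int(token), index + 1
    raise ValueError(f"unexpected token {token!r}")


def parse(text):
    tokens = tokenize(text)
    value, index = parse_value(tokens, 0)
    if index != len(tokens):
        raise ValueError("trailing input")
    return value
```

The result keeps the keyword as a tag, so that a set, a list and an ordered set with the same elements remain distinguishable after parsing.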

Function and map values

One cannot really make a distinction between function and map types without knowing the type of the domain. For this reason function values are represented as map values. A map value is represented by a sequence of comma-separated individual mappings surrounded by brackets and preceded by the keyword "map". Each individual mapping consists of a pair of values from the domain and codomain, separated by "->" and surrounded by brackets. The same short-hand (with respect to record types) that was introduced for the function and map types also applies to map values.

Examples are:

  map((name:"Tom" -> age:34),
      (name:"John" -> age:27))
  map((4 -> 5),(6 -> 7))
  map((list(3,4) -> a:set(5,6)),
      (list(3,5) -> a:set(7,8)))
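
Seen from a programming language, a map value behaves like a lookup table from domain values to codomain values. The Python sketch below mirrors the three examples; the helper names record and apply are made up, and encoding a record as a frozenset of (name, value) pairs is an assumption made so that records can be used as dictionary keys.

```python
def record(**fields):
    # Hashable stand-in for a record value (an assumption of this sketch).
    return frozenset(fields.items())


# The short-hand name:"Tom" stands for a record with the single field "name".
ages = {
    record(name="Tom"): record(age=34),
    record(name="John"): record(age=27),
}

successor = {4: 5, 6: 7}

by_list = {
    (3, 4): record(a=frozenset({5, 6})),
    (3, 5): record(a=frozenset({7, 8})),
}


def apply(mapping, argument):
    # Applying the function amounts to looking up the domain value.
    return mapping[argument]
```

A domain value that has no mapping raises KeyError here; how the specification language treats that case is not described on this page.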

Relation values

Relation values can simply be seen as a set of records. The keyword "relation" will be used.

An example is:

  relation((a:4,b:5),(a:7,b:8))

Named values

Named values are represented by a pair of an identifier and a value, separated by a comma, surrounded by brackets, and preceded by the keyword "named". For named records the keyword "namedrecord" may be used, where the first element within the brackets represents the name.

Examples are:

  named(a,3)
  namedrecord(person,name:"John",surname:"Smith")

References

There are two ways in which references can be defined in a value. One is by means of reference labels. The other is by means of path expressions.

Referencing through labels

Referencing through labels is simply done by labeling a certain value with a unique label, and using this label at the place where the value is referenced. Any value can be labeled by postfixing it with the "@" symbol and a unique identifier or number. A labeled value is referenced by the "#" symbol followed by its label. In case a referenced value is itself a reference, that reference should be followed recursively.

It should be noted that the labels are not part of the value itself; they are only used for referencing. One can change any label into another unique identifier (together with all references to it) without changing the value being represented by the value representation.

A disadvantage of the use of unique identifiers as labels is that it introduces a semantic correctness criterion on the representation of values. If a referenced identifier does not exist in the value representation, or if it is not unique, the reference is rendered as void. A self-reference (such as, for example, #a@a) is also considered equal to void.
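
The resolution rules for labels can be sketched as follows. This Python illustration uses made-up classes Labeled and Ref for "value@label" and "#label", uses None as a stand-in for void, and additionally treats any longer reference cycle as void, which is an assumption that goes beyond the self-reference case mentioned above.

```python
from dataclasses import dataclass
from typing import Any

VOID = None  # stand-in for the void value


@dataclass
class Labeled:
    label: str
    value: Any


@dataclass
class Ref:
    label: str


def collect(value, found):
    """Record every labeled value under its label, keeping all occurrences."""
    if isinstance(value, Labeled):
        found.setdefault(value.label, []).append(value.value)
        collect(value.value, found)
    elif isinstance(value, (list, tuple)):
        for item in value:
            collect(item, found)
    elif isinstance(value, dict):
        for item in value.values():
            collect(item, found)


def resolve(value, found):
    """Follow a reference recursively; missing, duplicate or cyclic labels give void."""
    seen = set()
    while isinstance(value, Ref):
        targets = found.get(value.label, [])
        if len(targets) != 1 or value.label in seen:
            return VOID
        seen.add(value.label)
        value = targets[0]
        while isinstance(value, Labeled):
            value = value.value  # labels are not part of the value
    return value


# Stand-in for: record(boss: record(name:"Ann")@p1, deputy: #p1)
document = {"boss": Labeled("p1", {"name": "Ann"}), "deputy": Ref("p1")}
labels = {}
collect(document, labels)
deputy = resolve(document["deputy"], labels)
```

In this sketch deputy ends up as the record that carries the label p1. Only the top-level reference is resolved; references nested inside the result are left as they are.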

Referencing through path expressions

Path expressions remove the need for uniquely identifying labels. Each path expression starts with a number of "^" symbols that indicate the number of "levels" to go up, possibly followed by an expression selecting a part of the value at that level. The levels are simply defined by the surrounding brackets, meaning that each "^" symbol stands for one open bracket. Depending on the type of the value, different selection mechanisms are used. If the value is a record, field selection can be used, which consists of a period followed by the field name. If the value is a list type, or some other ordered type, index-based selection can be used, which consists of a positive number surrounded by square brackets. If the value is a map type, selection based on a domain value between round brackets can be used. Again, the following short-hand notation can be used in case the domain is a record type:

    (id_1:v_1, ..., id_n:v_n)

In this, "id_i" stands for an identifier and "v_i" stands for a value expression. The value selected in this manner is the codomain part of the selected element in the map value, except when the reference expression occurs directly inside a map value expression.

An example is:

  record(
    X : list(map((c:3 -> d:7),
                 (c:4 -> d:5)),
             map((c:4 -> d:2))),
    Y : set(^^.X[1](c:3).d),
    Z : map(^^.X[2](c:4)))

In this example "^^.X[1](c:3).d" stands for the value 7. In this case, it would have been shorter to just write 7, which would have represented exactly the same value. In this example "^^.X[2](c:4)" stands for the value "(c:4 -> d:2)".
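
How the two path expressions in this example are evaluated can be traced with a small Python sketch. The representation is an assumption made for this illustration: records are tuples of (name, value) pairs, lists are Python lists, maps are dictionaries, and a path is given as a count of "^" symbols plus a list of selection steps. The names rec, select and follow are made up.

```python
def rec(**fields):
    # Hashable stand-in for a record, so records can be map domain values.
    return tuple(sorted(fields.items()))


def select(value, step):
    kind, argument = step
    if kind == "field":      # .name on a record
        return dict(value)[argument]
    if kind == "index":      # [n] on a list, counted from 1
        return value[argument - 1]
    if kind == "key":        # (domain) on a map: the codomain part
        return value[argument]
    if kind == "mapping":    # (domain) directly inside a map value expression
        return (argument, value[argument])
    raise ValueError(f"unknown step kind: {kind}")


def follow(enclosing, ups, steps):
    """Go up `ups` bracket levels, then apply the selection steps in order.

    `enclosing` lists the values whose brackets surround the reference,
    outermost first.
    """
    if not 1 <= ups <= len(enclosing):
        raise ValueError("cannot go up that many levels")
    value = enclosing[-ups]
    for step in steps:
        value = select(value, step)
    return value


x = [
    {rec(c=3): rec(d=7), rec(c=4): rec(d=5)},
    {rec(c=4): rec(d=2)},
]
top = rec(X=x)

# Both references sit one bracket below the record; None is a placeholder
# for the set or map that is still being built at that level.
y = follow([top, None], 2,
           [("field", "X"), ("index", 1), ("key", rec(c=3)), ("field", "d")])
z = follow([top, None], 2,
           [("field", "X"), ("index", 2), ("mapping", rec(c=4))])
```

The first "^" leaves the enclosing set or map, the second reaches the record, and the remaining steps walk back down through field X.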

Complete typing

Each value expression can be followed by a type definition as described in the static aspects of the specification language. This is done by following the value with a colon and the type specification.

An example is:

  set(3,4,9) : Set(PosNum)

Although it is obvious that the above set contains only numbers, the explicit typing states that it may only contain positive numbers.

The problem with complete typing is that it introduces a semantic correctness criterion on the represented value. This requires an interpretation of what the value represents in case it does not match the given type expression. There is no simple way out. Consider the following examples:

  set(-1,4,5) : Set(PosNum)
  -1 : PosNum

For the first example, one could introduce a rule stating that the represented value is equal to set(4,5), or that it is equal to void. However, such a rule cannot be devised for the second example, because void is not in the type PosNum.
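
The dilemma can be made concrete with a small conformance check. This Python sketch is not the author's type system: type expressions are written as plain strings and tuples, sets are frozensets, and it assumes that PosNum means integers greater than zero and NatNum means integers from zero upwards. The names conforms and drop_mismatches are made up.

```python
def conforms(value, type_):
    """Report whether a Python stand-in value matches a type expression."""
    if type_ == "Num":
        return isinstance(value, int) and not isinstance(value, bool)
    if type_ == "NatNum":
        return conforms(value, "Num") and value >= 0
    if type_ == "PosNum":
        return conforms(value, "Num") and value > 0
    if isinstance(type_, tuple) and type_[0] == "Set":
        return isinstance(value, frozenset) and all(
            conforms(item, type_[1]) for item in value
        )
    raise ValueError(f"unknown type expression: {type_!r}")


def drop_mismatches(value, element_type):
    # One candidate repair rule for sets: keep only the matching elements.
    return frozenset(item for item in value if conforms(item, element_type))


ok = conforms(frozenset({3, 4, 9}), ("Set", "PosNum"))
bad = conforms(frozenset({-1, 4, 5}), ("Set", "PosNum"))
repaired = drop_mismatches(frozenset({-1, 4, 5}), "PosNum")
```

A repair such as drop_mismatches exists for the set example, but no comparable function can be written for "-1 : PosNum", since every value it could return would have to be a positive number.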
