Specification and Implementation of
an Attributed Abstract Program Tree
Evaluator Generator

University of Twente, Department of Computer Science,
P.O. Box 217, 7500 AE Enschede, The Netherlands.

Abstract. Abstract program trees are a formalism to describe the internal representation of programs during compilation. Attributes can be attached to abstract program trees.

The syntax and the semantics of attributes abstract program tree grammars is presented. A generator that constructs an attribute evaluator from these grammars using hte Simple Multi-Pass evaluation strategy is described.

Introduction

In this paper we describe the syntax and the semantics of grammars to specify the evaluation of attributes of abstract program trees. We also indicate how such a description can be used to generate an attribute evaluator which uses the Simple Multi-Pass evaluation strategy [Alblas81,85,Eng84]. This generator is written in Pascal, and it generates an attribute evaluator written in Pascal.

In the first chapter the syntax of the grammars and an informal description of their semantics is given. This chapter also contains a complete example.

In the second chapter we give a more formal definition of the semantics, by constructing sets, functions and relations from a given grammar, and we also describe the semantic restrictions in terms of these sets, functions and relations.

In the third chapter we explain how the Simple Multi-Pass evaluation strategy can be applied to attributed abstract program trees.

In the fourth chapter we describe the implementation of a program that generates an attribute evaluator for a given attributed abstract program tree grammar.

The complete documentation of the generator and its source in VAX-Pascal is described in [Faase89], which can be viewed as a collection of appendices to this paper.

Chapter 1

The syntax of
attributed abstract program tree grammars

1.1 Introduction

In this chapter we introduce step by step the concept of Attributed Abstract Program Trees (AAPTs). Each section deals with one aspect of their grammars. At each step we give an incomplete meta grammar defining the AAPT grammars and an example. The complete meta grammar is given in the last section of this chapter.

1.2 Tree rules and class definitions

The abstract program trees that we want to describe consist of typed nodes. The type of each node determines the number of son nodes, and restricts the types of the sons.

The easiest way to restrict the types of the nodes is to define a tree rule for each node type. The tree rule specifies the number of sons, and the subset of node types which is legal at each son. These subsets can be defined by introducing classes of node types, and together with each class, a definition of its subset of the node types.

The elements of the AAPT grammars are the node types and the classes.

One can imagine a number of extensions to this simple concept we have so far described. The first extension is to allow classes in the subsets of the class definitions, thus allowing hierarchical classes. The second possible extension is that, if the same tree rule is associated with all the elements of a class then the tree rule can be defined for the class. That is, a tree rule defined for a class is seen as defined for all the elements of the class. This continues recursively for a hierarchy of classes. A third extension is that instead of allowing only classes in the right-hand side of a tree rule, node types could also be allowed.

The way in which node types, classes, tree rules and class definitions are specified is shown in the following grammar:

element    : class ; node type .

rule       : ( class rule ; tree rule ) , . .

class rule : class , = , { , element LIST , } .

tree rule  : element , => , (part name , : , element)LIST .

class      : identifier .

node type  : identifier .

part name  : identifier .

This meta grammar defines the syntax of the class definitions and the tree rules. An AAPT grammar consists of an enumeration of the classes and the node types (for the sake of clarity), the tree rules, the class definitions and the definition of the root element. This is expressed by the following rule of the meta grammar:

AAPT grammar
         : ( CLASSES , class LIST , . )OPTION
         ,  NODE TYPES , node type LIST , .
         ,  RULES , rule SEQ
         ,  ROOT , element
         .

The following example defines a tree language in which logical operators, boolean variables and constants, and if-statements and while-statements can be applied. Notice the definition of bool_operator.

CLASSES  stat, bool_decl, bool_expression, bool_operator,
         bool_const.

NODE TYPES  program, seq_bool_decl, empty_bool_ decl,
            assignment, if_stat, while_stat, seq_stat,
            bool_ident, not_operator, true, false,
            and, or, nand, nor, exor, equiv.

RULES

program        =>  decl      : bool_decl ,
                   body      : stat .

bool_decl      =  { seq_bool_decl , empty_bool_decl }.

seq_bool_decl  =>  first     : bool_ident ,
                   rest      : bool_decl .

stat           =  { assignment , if_stat , while_stat
                  , seq_stat }.

assignment     =>  dest      : bool_ident ,
                   source    : bool_expression .

if_stat        =>  condition : bool_expression ,
                   then_part : stat ,
                   else_part : stat .

while_stat     =>  condition : bool_expression ,
                   do_clause : stat .

seq_stat       =>  left      : stat ,
                   right     : stat .

bool_expression = { bool_const , bool_operator
                  , not_operator , bool_ident }.

bool_const     =  { true , false }.

bool_operator  =>  left      : bool_expression ,
                   right     : bool_expression .

bool_operator  =  { and , or , nand , nor
                  , exor , equiv }.

not_operator   =>  operand   : bool_expression .

ROOT program

1.3 Attributes

We now add attributes to the abstract program tree grammars that we have defined so far. The attributes are attached to the nodes of the trees, just as they are associated with the non-terminals in classical attribute grammars. It is possible to associate attributes with classes, indicating that all node types in the class (recursively for all class hierarchies) have the attributes associated with the class.

Before applying attributes in an abstract program tree grammar we enumerate their specifications. Each attribute has a type definition, an indication whether it is synthesized or inherited, and a set of all node types and classes with which it is associated.

The way in which attributes are specified is described in the following part of the meta grammar:

attribute definition
         : attribute name , : , type identifier
         , (synthesized symbol ; inherited symbol)
         , OF , element LIST , .
         .

attribute name : identifier .

synthesized symbol : SYNTHESIZED ; SYN .

inherited symbol   : INHERITED ; INH .

type identifier
         : INTEGER ; REAL ; BOOLEAN ; CHAR
         ; user defined type identifier
         .

user defined type identifier
         :  ...(will be explained)...
         .

Now the meta grammar rule for an AAPT grammar becomes:

AAPT grammar
         : ( CLASSES , class LIST , . )OPTION
         ,  NODE TYPES , node type LIST , .
         ,  ATTRIBUTES , attribute definitions SEQ
         ,  RULES , rule SEQ
         ,  ROOT , element
         .

We can extend the example of section 1.2 with a number of attribute definitions in the following way:

CLASSES     ...(as above)...

NODE TYPES  ...(as above)...

ATTRIBUTES

ident_nr    : INTEGER  SYNTHESIZED  OF bool_ident .

const_prop  : BOOLEAN  SYNTHESIZED
              OF bool_operator, not_operator .

RULES       ...(as above)...

ROOT  program

1.4 Input and output attributes

We need to specify which attributes are initially available in the abstract program tree, and which attributes have to stay in the tree after all attributes have been evaluated. These two kinds of attributes can be specified by means of input and output rules. In the AAPT grammars we make use of interface rules to specify which attributes are input or output attributes. The meta grammar rule for the interface rule is:

interface rule
         : attribute name
         , ( AT , element LIST)OPTION , .
         .

The meta grammar rule for AAPT grammars now becomes:

AAPT grammar
         : ( CLASSES , class LIST , . )OPTION
         ,  NODE TYPES , node type LIST , .
         ,  ATTRIBUTES , attribute definitions SEQ
         , (  INPUT , interface rule SEQ)OPTION
         , ( OUTPUT , interface rule SEQ)OPTION
         ,  RULES , rule SEQ
         ,  ROOT , element
         .

The example could be extended with:

INPUT ident_nr .

1.5 Types

In the above examples the attributes were defined by using standard types. We also allow the use of user defined type identifiers. The user-defined type identifiers are defined by the following rule of the meta grammar:

user defined type identifier : identifier .

After the enumeration of the node types and before the enumeration of the attributes we require an enumeration of the user defined types according to the following extension of the meta grammar rule for AAPT grammars:

         , ( TYPES
           , user defined type identifier LIST , .
           )OPTION

An example of a type definition is:

TYPES newtype , othertype .

1.6 Semantic functions

We need to construct expressions composed of attributes and semantic functions. We require the expressions to be strongly-typed, and therefore we also require the functions used in the expressions to be typed. This typing is specified in the definition of the semantic functions. The form of the semantic function definitions is defined by the following rule of the meta grammar:

semantic function definition
         : semantic function name
         , type identifier LIST PACK
         , : , type identifier , . 
         .

semantic function name : identifier .

After the enumeration of the user defined types and before the enumeration of the attributes we require an enumeration of the semantic functions according to the following extension of the meta grammar rule for AAPT grammars:

         , ( FUNCTIONS , semantic function definition SEQ
           )OPTION

We extend the example of section 1.2 with the following semantic function definitions:

FUNCTIONS
  andf   ( boolean , boolean ) : boolean.
  truef  () : boolean.
  falsef () : boolean.

1.7 Attribute assignments

Using the previous definitions we are now able to define evaluation rules for attributes. Just as in classical attribute grammars the evaluation rules are associated with the tree rules. We extend the tree rules with attribute assignments. The meta grammar rule defining the tree rules is changed into:

tree rule 
         : element , => , (part name , : , element)LIST
         , attribute assignments OPTION
         .

We recall that the elements in the tree rule definition could be node types or classes. If we associate an attribute with a class, then the attribute is associated with all the node types of the class (according to the class hierarchy). It is possible to associate an attribute with just a certain element of a class. Such an attribute is not always present if the class occurs in a tree rule. It is present only in the case that a certain element is applied at that position in the tree. In the case that such a class is used in a tree rule, selection constructions are needed both for getting values from and assigning values to attributes. In subsection 1.7.1 we introduce case expressions and in subsection 1.7.2 we define selective assignments. The meta grammar rule for attribute assignments is:

attribute assignments
         : [ , attribute assignment LIST , ] .

attribute assignment
         : attribute occurrence , = , expression 
         ; selective assignment
         .

1.7.1 Expressions

In its most simple form an expression is an applied occurrence of an attribute. Such attribute occurrences consist of two parts: an attribute name, and a part name indicating the element of the tree rule it belongs to. The left-hand side element (the element for which the tree rule is defined) is represented by the symbol "#". The meta grammar rule defining attribute occurrences is:

attribute occurrence : attribute name , OF , node name .

node name : part name ; # .

Semantic functions can be used to form more complex expressions, as shown in the meta grammar rule:

semantic function application
         : semantic function name
         , expression LIST PACK
         .

Finally, we introduce case expressions, with which it will be possible to select attributes that are associated with a certain element of a class. We first need to indicate to which element of the tree rule the selection is applied. We do this by using the part name. Next, we must give a number of alternatives which consist of subsets of the elements in the class under consideration. Of course, these subsets have to form a complete partition of the elements in the class. This is necessary to make the case expression consistent, in the sense that it will always return a value. The last alternative can also be abbreviated with the symbol "OTHERS". The meta grammar rule for case expressions is:

case expression
         : CASE , part name , OF
         , (selector , : , expression )CHAIN ;
         , ( ; , OTHERS , : , expression  )OPTION
         , ESAC
         .

selector : element LIST .

Only the meta grammar rule for expressions is still required. It combines the previous three forms:

expression
         : attribute occurrence
         ; semantic function application
         ; case expression
         .

From this it follows that the three forms can be used recursively. For example, it is possible to have a case expression as an argument of a semantic function.

1.7.2 Selective assignments

Selective assignments resemble case expressions. The difference is that there is a restriction on part names used within an alternative of a selective assignment. The part names on which this restriction is imposed are the part names on which a selection is performed by a selective assignment, and that are used in the left-hand side of a simple attribute assignment. All these part names in all the alternatives of the selective assignment have to be equal to the part name that is used in the selection of the selective assignment. Furthermore, it is not necessary (as it was with case expressions) that each element of the class can be found in one of the alternatives of the selective assignment. In general, this will be impossible, because selection constructions have only been introduced for attributes that are not associated with all the elements of a class. The meta grammar rule for selective assignment is:

selective assignment
         : CASE , part name , OF
         , ( selector , : 
           , attribute assignment LIST)CHAIN ;
         , ( ; , OTHERS , : , attribute assignment LIST
           )OPTION
         , ESAC
         .

We can now extend the example with the semantic rules for the attribute const prop. This attribute indicates that a certain expression will always result in a constant value. The extended example is:

CLASSES     ...(as above)...

NODE TYPES  ...(as above)...

FUNCTIONS   ...(as above)...

ATTRIBUTES

ident_nr    : INTEGER  SYNTHESIZED  OF bool_ident .

const_prop  : BOOLEAN  SYNTHESIZED
            OF bool_operator, not_operator .

RULES       ...(as above, except for :)

bool_operator  =>  left      : bool_expression ,
                   right     : bool_expression
[ const_prop OF #
  = andf( CASE left OF
            bool_const    : truef();
            bool_operator ,
            not_operator  : const_prop OF left;
            bool_ident    : falsef()
          ESAC
        , CASE right OF
            bool_const    : truef();
            bool_operator ,
            not_operator  : const_prop OF right;
            bool_ident    : falsef()
          ESAC
        )
].

not_operator   =>  operand   : bool_expression
[ const_prop OF #
  = CASE operand OF
      bool_const    : truef();
      bool_operator ,
      not_operator  : const_prop OF operand
      bool_ident    : falsef()
    ESAC
].

ROOT  program

In the example above we did not use selective assignments. We also mention that in this example nothing is done with the value of the attribute const prop.

1.7.3 Remarks on the selection constructions

In this subsection we discuss the differences between the selective assignment and the case expression. As we said before, these selection constructions have been introduced to make attributes associated with elements in a class visible. A side effect of this is that it is possible to make an assignment whose value depends on which element of a certain class is used. This is a second purpose for which selection constructions can be used. However, we restrict this use to the expressions in the attribute assignments only. For this reason we have put the restriction on the part names used within the alternatives of the selective assignment, as described in the previous section. That is, these part names have to be equal to the part name that is used in the selection of a selective assignment. Later on we shall see that it is not allowed to assign an attribute associated with a class, if within a selective assignment an element of that class has been selected.

These restrictions have no influence on the formal power of the selection constructions. All assignments that do not fit these restrictions can be rewritten into legal attribute assignments. This is possible because these restrictions do not apply to case expressions.

1.8 The complete syntax of aapt grammars

We can now define the complete meta grammar that describes AAPT grammars. The only thing we need to add is the possibility of options that influence the behaviour of the compiler that reads the AAPT grammar. These options are not as yet specified. Making use of all the previous definitions, the complete meta grammar is:

AAPT grammar
         : ( OPTIONS , option LIST , . )OPTION
         : ( CLASSES , class LIST , . )OPTION
         ,  NODE TYPES , node type LIST , .
         , ( TYPES
           , user defined type identifier LIST , .
           )OPTION
         , ( FUNCTIONS , semantic function definition SEQ
           )OPTION
         ,  ATTRIBUTES , attribute definitions SEQ
         , (  INPUT , interface rule SEQ)OPTION
         , ( OUTPUT , interface rule SEQ)OPTION
         ,  RULES , rule SEQ
         ,  ROOT , element
         .

option  :  (determined by implementation) .

class      : identifier .

node type  : identifier .

element    : class ; node type .

user defined type identifier : identifier .

type identifier
         : INTEGER ; REAL ; BOOLEAN ; CHAR
         ; user defined type identifier
         .

semantic function definition
         : semantic function name
         , type identifier LIST PACK
         , : , type identifier , . 
         .

semantic function name : identifier .

attribute definition
         : attribute name , : , type identifier
         , (synthesized symbol ; inherited symbol)
         , OF , element LIST , .
         .

attribute name : identifier .

synthesized symbol : SYNTHESIZED ; SYN .

inherited symbol   : INHERITED ; INH .

interface rule
         : attribute name
         , ( AT , element LIST)OPTION , .
         .

rule       : ( class rule ; tree rule ) , . .

class rule : class , = , { , element LIST , } .

tree rule 
         : element , => , (part name , : , element)LIST
         , attribute assignments OPTION
         .

part name  : identifier .

attribute assignments
         : [ , attribute assignment LIST , ] .

attribute assignment
         : attribute occurrence , = , expression 
         ; selective assignment
         .

selective assignment
         : CASE , part name , OF
         , ( selector , : 
           , attribute assignment LIST)CHAIN ;
         , ( ; , OTHERS , : , attribute assignment LIST
           )OPTION
         , ESAC
         .

selector : element LIST .

expression
         : attribute occurrence
         ; semantic function application
         ; case expression
         .

attribute occurrence : attribute name , OF , node name .

node name : part name ; # .

semantic function application
         : semantic function name
         , expression LIST PACK
         .

case expression
         : CASE , part name , OF
         , (selector , : , expression )CHAIN ;
         , ( ; , OTHERS , : , expression  )OPTION
         , ESAC
         .

Chapter 2

Restrictions on aapt grammars

In this chapter we present a formal description of the semantical restrictions that are imposed on the AAPT grammars. We do this by constructing certain sets from parts of the AAPT grammar under consideration, and putting restrictions on these sets. From these sets relations can be derived to further specify the semantics.

2.1 Classes and nodes

The first part of the AAPT grammar contains the enumeration of the classes and node types. This part is given by:

CLASSES  C₁ , ... , C_n  .
NODE TYPES  N₁ , ... , N_m  .

From this we define the two sets classes and nodes:

df  classes = {C₁,..,C_n}
df  nodes   = {N₁,..,N_m}

These sets must be disjoint. An identifier cannot be used for both a class and a node type. This is expressed by:

classes  disjoint  nodes

We join the classes and the node types in the set elements:

df elements = classes union nodes

2.2 Type definitions

The next part of the AAPT grammar describes the user-defined types. We define the standard type identifiers using:

df  standard type identifiers
    = {INTEGER, REAL, BOOLEAN, CHAR}

The enumeration of the types in the AAPT grammar is given by:

TYPES  T₁ , ... , T_n  .

The set user defined type identifiers is defined by:

df  user defined type identifiers = {T₁,..,T_n}

Because the standard type identifiers are predefined, we are not allowed to redefine them. This leads to the restriction:

    user defined type identifier
    disjoint  standard type identifiers

Finally, the set type identifiers is defined by:

df  type identifiers
    =   standard type identifiers
      union user defined type identifiers

2.3 Semantic function definitions

The definition of the semantic functions forms the next part of the AAPT grammar. The enumeration of the function definitions is given by:

FUNCTIONS  funcdef₁ .  ....  funcdef_n .

The set semantic function definitions is defined by:

df  semantic function definitions
    = {funcdef₁,..,funcdef_n}

Each semantic function definition expresses the typing of this function. The type identifiers that are used must be defined. This results in the following restriction:

for all  "F (T₁,..,T_n):T" in semantic function definitions
  ( {T₁, .. ,T_n,T} subset type identifiers)

We define the set semantic functions as the set of all semantic function identifiers:

df  semantic functions
    = {  F
      |  "F (T₁,..,T_n):T" in semantic function definitions
      }

The semantic function identifiers should have a unique definition, which is expressed by:

for all  F  in  semantic functions
 ( exists!  "F (T₁,..,T_n):T" in semantic function definitions)

If this restriction is satisfied for each semantic function, then the functions type of, number of args of and type arg can be defined making use of the given set of function definitions. These functions allow us to reason on a higher abstraction level. The functions are defined by:

for all  "F (T₁,..,T_n):T" in semantic function definitions
  df  type of(F) = T
  df  number of args of(F) = n
  for all  i in {1,..,n}
    df  type arg(i,F) = T_i

2.4 Attribute definitions

The next part of the AAPT grammar contains all the attribute definitions. These are represented by:

ATTRIBUTES  attrdef₁ .  ...  attrdef_n .

The set attribute definitions is defined by:

df  attribute definitions  =  {attrdef₁,..,attrdef_n}

The definition of an attribute consists of three parts: a type identifier, an indication whether it is a synthesized or an inherited attribute and the set of elements with which this attribute is associated. The restrictions of these parts are described by:

for all  "A : T S/I OF E₁ .. E_n" in attribute definitions
  (  T in type identifiers
   and S/I in {"synthesized","inherited"}
   and {E₁,..,E_n} subset elements
  )

The set attributes is defined as all attribute identifiers:

df  attributes
    = {  A
      |  "A : T S/I OF E₁ .. E_n" in attribute definitions
      }

The restriction that attributes have to be defined consistently is:

for all  A  in  attributes
  ( exists!  "A : T S/I OF E₁ .. E_n" in attribute definitions)

When this restriction is satisfied we can derive a number of abstractions. The function type of can also be defined for attributes:

for all  "A : T S/I OF E₁ .. E_n" in attribute definitions
  df  type of(A) = T

We also need to define the set of synthesized and inherited attributes. These are defined by:

df  synthesized attributes
    = {  A
      |  "A : T synthesized .." in attribute definitions
      }

df  inherited attributes
    = {  A
      |  "A : T inherited .." in attribute definitions
      }

Finally, we define the function attr of that returns the set of attributes that have been associated with each element:

for all  E  in  elements
  df  attr of(E)
      = {  A
        |  "A : T S/I OF .. E .." in attribute definitions
        }

2.5 Input and output definitions

The attributes that are initially available in the tree before the other attributes are evaluated are called input attributes. Attributes that have to remain in the tree after the evaluation process is finished are called output attributes. Because the restrictions on the two kinds of attributes are alike, we deal with them together. In the AAPT grammar the input and the output attribute definitions are given by:

INPUT   inputdef₁ .  ...   inputdef_n .
OUTPUT outputdef₁ .  ...  outputdef_m .

We can now define three sets:

df  input interface rules
    = {inputdef₁,..,inputdef_n}
df  output interface rules
    = {outputdef₁,..,outputdef_n}
df  interface rules
    = input interface rules union output interface rules

We impose the following restrictions on the set interface rules:

for all  "A .." in interface rules
  (A in attributes)

for all  "A AT E₁ .. E_n" in interface rules
  (for all  1<=i<=n
    ( E_i in elements  cand  A in attr of(E_i)
    )
  )

The sets input attributes and output attributes are defined by:

df  input attributes 
    = {  A in attributes
      |  "A" in input interface rules
       or "A AT .." in input interface rules
      }

df  output attributes 
    = {  A in attributes
      |  "A" in output interface rules
       or "A AT .." in output interface rules
      }

The interface rules must be unambiguous. This results in the restrictions:

for all  A in input attributes
  (  exists!  "A" in input interface rules
   exor exists!  "A AT .." in input interface rules
  )

for all  A in output attributes
  (  exists!  "A" in output interface rules
   exor exists!  "A AT .." in output interface rules
  )

(In this description exor represents the exclusive-or boolean operator.)

We define functions that express what the input, output and interface attributes of each element are:

for all  E in elements
  df  input attr of(E)
      = {  A in attr of(E)
        |  "A" in input interface rules
         or "A AT .. E .." in input interface rules
        }
  df  output attr of(E)
      = {  A in attr of(E)
        |  "A" in output interface rules
         or "A AT .. E .." in output interface rules
        }
  df  interface attr of(E)
      = input attr of(E) or  output attr of(E)

2.6 Rules

The rules that describe the AAPT grammar form the largest part of the input. These rules can be divided into the tree rules and the class definitions. The tree rules and the class definitions may be written in any order in the AAPT grammar. Each semantic assignment is written after its associated tree rule, but for the moment we will neglect them. The rules are given by:

RULES  rule₁ .  ...  rule_n .

The following sets can now be constructed:

df  rules = {rule₁,..,rule_n}
df  class definitions
    = {  "C = {E₁,..,E_n}" in rules  }
df  tree rules
    = {  "E => P₁:E₁,..,P_n:E_n"
      |  "E => P₁:E₁,..,P_n:E_n attrass " in rules
      }

In the following subsections we deal with tree rules and class definitions separately.

2.6.1 Class definitions

Class definitions are only allowed for classes. The set of elements for each class has to be a subset of the set element. This can be described as follows:

for all  "C = {E₁,..,E_n}" in class rules
  (  C in classes  and  {E₁,..,E_n} subset elements)

For each class there has to be precisely one class definition. This leads to the restriction:

for all  C in classes
  ( exists! "C = {E₁,..,E_n}" in class rules)

If this restriction is satisfied, then we can define the function class that returns the set of elements in the class:

for all  "C = {E₁,..,E_n}" in class rules
  df  class(C) = {E₁,..,E_n}

We can also define the subset of the elements that are used as elements in a class definition:

df  all elements in class
    = {  E  |  E in class(C)  and  C in classes  }

We can now define a function clos class that represents the transitive closure on the element-of-class relation. The function returns all the elements that can be found directly or indirectly in the class, according to the class hierarchy. The function is defined by:

for all  C in classes
  df  clos class(C)
      = class(C)
        union  union{  clos class(C')
                |  C' in (class(C) intersection classes)
                }

Using this function we can define the restriction that recursive class definitions are not allowed, that is:

for all  C in classes  (C not in clos class(C))

We can also define the inverse relation and its closure, in the following way:

for all  E in elements
  df  parents of(E) = {  C in classes  |  E in class(C)  }

for all  E in elements
  df  clos parents of(E)
      =  {  C in classes  |  E in clos class(C) }

Another function that can be derived from the above-mentioned function is the function that represents the reflexive transitive closure of the parents-of relation. This function is defined by:

for all  E in elements
  df  clos parents of and(E) = clos parents of(E) union {E}

2.6.2 Tree rules

The part names for each tree rule have to be unique within the tree rule, and its right-hand side should only contain elements. These restrictions are described by:

for all  "E => P₁:E₁,..,P_n:E_n" in tree rules
  (  E in elements
   and {E₁,..,E_n} subset elements)
   and for all 1<=i<j<=n (P_i != P_j)
  )

We define the set of elements for which a tree rule is given:

df  elements with tree rule
    = {  E in elements
      |  "E => P₁:E₁,..,P_n:E_n" in tree rules
      }

For each class and node type there cannot be more than one tree rule in the set of tree rules. This is expressed by:

for all  E  in  elements with tree rule
  (exists!  "E => P₁:E₁,..,P_n:E_n" in tree rules )

If these restrictions have been satisfied, then we can again define an abstraction function. The function tree returns for each element for which a tree rule is given, an ordered set of pairs. Each pair consists of a part name and its associated element. This function is defined by:

for all  E in elements
  df  tree(E)
      =  if    exists  "E => P₁:E₁,..,P_n:E_n" in tree rules
         then  ((P₁,E₁),..,(P_n,E_n))
         else  empty
         fi

In the same way we can define the function attr assignments of that returns the attribute assignments associated with each element:

for all  E in elements
  df  attr assignments of(E)
      =  if    exists  "E => P₁:E₁,..,P_n:E_n attrass" in rules
         then  "attrass"
         else  ""
         fi

2.6.3 Consistent definition of tree productions

A tree production is the tree rule that is applied at an element making use of the class definitions and the restriction that a tree rule associated with a class is inherited by all the elements in the class. We first define the set of elements that have an indirectly associated tree rule, making use of the class definitions:

df  elements with indirect tree rule
    = union{  clos class(E)
           |  E in elements with tree rule
           }

With this set we define the set of all elements that have a tree rule:

df  elements with tree production
    =    elements with tree rule
      union  elements with indirect tree rule

Each of the elements can have only one tree production defined . This restriction is described by:

for all  E in elements with tree production
  (exists!  E' in clos parents of and(E)
     (E' in elements with tree rule
     )
  )

If this restriction is satisfied, then we can define the function tree production of that returns the unique tree rule for each element that has a tree production:

for all  E in elements
  df  tree production of(E)
      = if    E in elements with tree production
        then  tree(E')
              where  {E'} = clos parents of and(E)
                           intersection elements with tree rule
              endwhere
        else  empty
        fi

2.7 Root element

The root element of the grammar is given in the last part of the AAPT grammar by:

ROOT  E

We define the value of root by:

df  root = E

The only restriction on the root is that it should be an element:

root in elements

2.8 Well-formed grammars

A context-free grammar is called well-formed if each nonterminal can occur in a sentential form derived from the start symbol, and this sentential form derives at least one finite string of terminals. An abstract program tree grammar is called well-formed if the reachability property and the termination property are satisfied. These properties are described in the following subsections.

2.8.1 The reachability property

The reachability property is satisfied if for each node type there exists a tree in which a node with this node type occurs. If this is the case, then the node type is reachable from the root element. The concrete reachability property states that all node types have to be reachable from the root element. A more general complete reachability property states that all elements should be reachable from the root element.

We start with the function tree parents of, that can be viewed as the inverse of the tree function:

for all  E in elements
  df  tree parents of(E)
      = {  E' in elements
        |  (P,E) in tree production of(E')
        }

We now explain the reason why we use the function tree production instead of the function tree. Let there be a class C with the class definition "C = {E,..}" and a tree rule "C => p1:E1,..,pn:En.". The element E1 is now reachable from the element E through the class C. If we had used the function tree this would not have been the case.

Using the last definition, we can define the function elements that reach to, which returns all the elements that can reach a certain element through a tree rule and a number of class definitions:

for all  E in elements
 df  elements that reach to(E)
      = union{  tree parents of(E')
             |  E' in   clos parents of and(E)
             }

Finally, we define the function reachable from root, which makes it possible to test whether a certain element can be reached from the root element. The first argument of this function is the element under consideration. The second argument represents the set of elements that have been tried to reach the root. If for the second time we reach an element that we reached before, then it is a road that need not be tried again. The definition of the function is:

df  reachable from root(E,V)
    = if    E in V
      then  false
      else      root in clos parents of and(E)
            cor exists  E' in elements that reach to(E)
                   (reachable from root(E',V union {E}))
      fi

With this function it is possible to formulate the two reachability properties, in the following way:

df  concrete reachability
    = for all  N in nodes
         (reach from root(N,empty))
df  complete reachability
    = for all  E in elements
         (reach from root(E,empty))

2.8.2 The termination property

The termination property says that for each element it will always be possible to make at least one finite tree that starts with this element at the root. The formal description of the termination property is:

for all  E in elements
  (termination(E,empty))
  where  termination(E,V)
          = if    E in V
            then  false
            elif  E in elements with tree production
            then  for all  (P,E') in tree production of(E)
                     (termination(E',V union {E}))
            elif  E in classes
            then  exists  E' in class(E)
                     (termination(E',V union {E}))
            else  true
            fi

2.9 Flat implementability

The implementation technique we have used to make an evaluator is called 'flat', because it tries to stay as close as possible to the original description of the AAPT grammar. This, however, puts restrictions on the tree rules and class definitions as a whole.

We start with the definition of the set right hand side elements, which consists of the elements on the right-hand side of all tree rules together with the root:

df  right hand side elements
    =   {root}
      union {  E
        |  (P,E) in tree(E')
         and E' in elements with tree rule
        }

We can also define left hand side elements as the set of all elements with a tree rule (the branches) together with all the node types that do not have a direct or an indirect tree rule, and thus are considered to have an empty tree rule (the leaves). This set is defined by:

df  left hand side elements
    =   elements with tree rule
      union (nodes without elements with tree production)

The extra restriction imposed by the implementation is that, if we start with a right-hand side element then we will never first meet an element with an indirect tree rule when we go down in the class hierarchy. This undesirable situation occurs if a class exists with a tree rule, from which one of its elements is in the right-hand side of another tree rule. This restriction is not a strong restriction. Recognize that the given example is a very non-orthogonal construction, in the sense that a class is constructed in which one of the elements is used in another context. Of course, it is possible to rewrite the AAPT grammar in such a way so that the problem can be avoided. We describe this restriction by:

for all  E in right hand side elements
   (end always at(E))
   where  end always at(E)
          = (  E in left hand side elements
             or (  E not in elements with indirect tree rule
                and for all  E' in class(E)
                     (end always at(E'))
               )
            )
   endwhere

2.10 Assignment of attributes using class rules

In this section we describe a number of restrictions on the attributes, taking into account the class definitions. First of all we define the function all attr of, that indicates which attributes are associated with a given element when the class definitions are observed:

for all  E in elements
  df  all attr of(E)
      = union{  attr of(E')
             |  E' in clos parents of and(E)
             }

An attribute is not allowed to be associated with both a class and an element in the class hierarchy of that class:

for all  E in elements, C in parents of(E)
  (attr of(E)  disjoint  all attr of(C))

If an element belongs to more than one class then the sets of attributes from those classes should be the same. This can be formulated by:

for all  E in elements, C₁,C₂ in parents of(E)
  (all attr of(C₁) = all attr of(C₂))

We also have to impose restrictions on the input and output attributes. For this we define the functions all input attr of, all output attr of and all interface attr of, that return the set of input, output and interface attributes, respectively, that are associated with an element using the class definitions:

for all  E in elements
  df  all input attr of(E)
      = union{  input attr of(E')
             |  E' in clos parents of and(E)
             }

  df  all output attr of(E)
      = union{  output attr of(E')
             |  E' in clos parents of and(E)
             }
  df  all interface attr of(E)
      = all input attr of(E) union all output attr of(E)

Using these definitions we define a restriction that is very similar to the previous restriction. This restriction states that if an element belongs to two classes then the input/output properties of the attributes of these classes should be the same:

for all  E in elements, C₁,C₂ in parents of(E)
  (   all input attr of(C₁) =  all input attr of(C₂)
   and all output attr of(C₁) = all output attr of(C₂)
  )

2.11 Restrictions on the semantic rules

In the AAPT grammar the semantic rules are specified in the form of attribute assignments associated with the tree rules. The set of attribute assignments associated with a tree rule may be empty, but in most cases there will be some attribute assignments associated with each tree rule. The function attr assignment of has been defined as a function that returns the (possibly empty) set of attributes for each element.

Using this set we can define the set of elements that have attribute assignments in the following way:

df  elements with attr assignments
    = {  E in elements
      |  attr assignments of(E)  != ""
      }

Because attribute assignments are associated with tree rules, the following is true:

elements with attr assignments subset elements with tree rule

2.11.1 Test functions

In this subsection we define a number of test functions that will be applied to both assignments and expressions. We start with the selectors, which enumerate the elements of an alternative in a selective assignment or a case expression. The function elements of selector returns the elements that are enumerated in a selector. The function test selectors checks whether a given set of selectors for a class is correct. This means that the elements used in the selectors should belong to the class and that the different selectors should be disjoint. The function takes into account the possibility that the last selector equals OTHERS, which represents the set of remaining elements. These two functions are defined in the following way:

df  elements of selector("E₁,..,E_n")  =  {E₁,..,E_n}

df  test selectors((SEL₁,..,SEL_n),C)
    = for all  1<=i<=m
        (  elements of selector(SEL_i) subset class(C)
         and for all  1<=j<i
             (           elements of selector(SEL_j) 
               disjoint  elements of selector(SEL_i)
             )
        )
      where  m = if SEL_n = "OTHERS" then n-1 else n fi
      endwhere

We define the function selector that returns the set of elements represented by a selector, taking into account that the last selector may be equal OTHERS. This function is defined by:

df  selector(i,(SEL₁,..,SEL_n),C)
    = if    i = n  and  SEL_n = "OTHERS"
      then  class(C)
            without  {  E
                     |  E in element of selector(SEL_j)
                      and 1<=j<=n-1
                     }
      else  elements of selector(SEL_i)
      fi

We define two functions that test whether a given attribute and part name are an assigned or an applied attribute occurrence:

df  assigned attributes(A,P)
    = if    P = #
      then  A in synthesized attributes
      else  A in inherited attributes
      fi

df  applied attributes(A,P)
    = if    P = #
      then  A in inherited attributes
      else  A in synthesized attributes
      fi

2.11.2 Visibility of attributes

In this subsection we make some remarks concerning the visibility of attributes. We make use of the fact that for different elements of a class, different sets of attributes may be visible. Because of this, we introduce selective assignments and case expressions. We have to check that these selection constructions are used correctly. If an attribute is associated with an element of a class we then have to use a selection construction to address this attribute.

To perform this check we introduce the concept of a part list. With each component of an attribute assignment a part list is associated. The part list associated with a component of an attribute assignment defines for each part name which element is associated with it. On the highest level of the attribute assignments this list equals the tree rule extended with the left-hand side element that is associated with the main part name. Within each selection construction the part list is updated such that the elements of the selection are associated with the part name involved in the selection. The part list can be used to determine whether the used attributes and defined attribute of an assignment are visible.

We define a function that can update a part list with a new element for a part name:

df  update(partlist,partname,new)
    = concat{  (P,if P = partname then new else E fi)
            |  (P,E) in partlist
            }

2.11.3 Restrictions on the expressions

We now describe the restrictions that are put on the expressions within the assignments. For this purpose we use a function that checks a given expression against a given part list and a type. The following aspects are checked by this function:

The type correctness. Each expression has a certain type which is determined by the attributes and semantic functions within the expression.
Whether the part names that are used are in the part list.
The visibility of the attributes that are used with a certain part name. Only the attributes that are associated with the current element of a part name and with all its parents in the class hierarchy are visible.
Whether only applied attributes are used within the expression.
The correct usage of the functions. The functions that are used in the expression must be defined and have to be applied according to their definition, which means that they should contain the correct number of arguments, and these arguments should be of the correct type.
Whether the case expressions are applied to classes only.
Whether the selectors are correct as described in the previous subsection.
Whether all the elements of a class are used in the selectors of a case expression. Expressions must always return a value. When an element is not used in one of the alternatives, the value of the expression is undefined for this element.

We now present the test function. The numbers between the curly braces refer to the above restrictions:

df  test EXPR(EXPR,partlist,TYPE)
    = case  EXPR  of
      "A OF P"
       :       (P,E) in partlist    {2}
         cand  A in all attr of(E)    {3}
         cand  applied attribute(A,P)    {4}
            and  type of(A) = TYPE    {1}
      "F (EXPR₁,..,EXPR_n)"
       :       F in semantic functions    {5}
         cand  type of(F) = TYPE    {1}
            and  number of args of(F) = n    {5}
         cand  for all  1<=i<=n
                 (test EXPR(EXPR_i,partlist
                           ,type arg(F,n)));
      "CASE P OF SEL₁:EXPR₁;..;SEL_n:EXPR_n ESAC"
       :       (P,E) in partlist    {2}
            and  E in classes    {6}
         cand  test selectors({SEL₁,..,SEL_n),E)    {7}
         cand    union{  selector(i,{SEL₁,..,SEL_n),E)
                      |  1<=i<=n }
               = class(E)    {8}
         cand  for all  1<=i<=n,
                  E' in selector(i,{SEL₁,..,SEL_n),E)
                 (test EXPR(EXPR_i
                           ,update(partlist,P,E'),TYPE)
                 )
      esac

2.11.4 Restrictions on the attribute assignments

To describe the restrictions on the assignments we must use two similar descriptions, because within a selective assignment other rules are applied than at the highest level.

We begin by defining two functions that return those attributes that need to be defined. (Remember that the input attributes need not be defined.)

for all  E in elements
  df  non input attr of(E)
      = attr of(E) without input attr of(E)
  df  all non input attr of(E)
      = all attr of(E) without all input attr of(E)

In the assignments, the following aspects need to be checked:

Whether the part names are correctly used. This check can be divided into the following checks:

Whether on the top level the part names that are used are part names of the tree rule with which these assignments are associated.
Whether within the selective assignments the same part name is used as the part name of the selective assignment on the top level.

The visibility of the attributes. This check can be divided into the following checks:

On the top level those attributes are visible that are associated with the element of the part name or with its parents in the class hierarchy.
Within the selective assignments only those attributes are visible that are associated with the current element of the part name in the part list.

Whether assignments are only made to assigned attributes.
Whether the selective assignments are applied to classes.
Whether the selectors are correct, as described earlier.

We first give the test function that checks all these requirements for the assignments within a selective assignment. The numbers between the curly braces refer to the above restrictions:

df  test AAS("AA₁;..;AA_n",partlist,P)
    = (for all  1<=i<=n
         (case  AA_i  of
          "A OF P' = EXPR"
           :       P' = P    {1b}
                and  (P,E) in partlist
                and  A in non input attr of(E)    {2b}
             cand  assigned attribute(A,P)    {3}
                and  test EXPR(EXPR,partlist,type of(A));
          "CASE P' OF SEL₁:AAS₁;..;SEL_m:AAS_m ESAC"
           :       P' = P    {1b}
                and  (P,E) in partlist
                and  E in classes    {4}
             cand  test selectors((SEL₁,..,SEL_m),E)    {5}
             cand  for all  1<=j<=m
                    , E' in selector(j,(SEL₁,..,SEL_m),E)
                      (test AAS(AAS_j
                               ,update(partlist,P,E'),P)
                      )
          esac
      )

The final test on the attribute assignments is given by:

for all  E  in  elements with attr assignments
   (for all  1<=i<=n
       (case  AA_i  of
        "A OF P = EXPR"
         :       (P,E) in partlist    {1a}
              and  A in all non input attr of(E)    {2a}
           cand  assigned attribute(A,P)    {3}
               and  test EXPR(EXPR,partlist,type of(A));
         "CASE P OF SEL₁:AAS₁;..;SEL_m:AAS_m ESAC"
          :       (P,E) in partlist    {1a}
               and  E in classes    {4}
            cand  test selectors((SEL₁,..,SEL_m),E)    {5}
            cand  for all  1<=j<=m
                   , E' in selector(j,(SEL₁,..,SEL_m),E)
                     (test AAS(AAS_j
                              ,update(partlist,P,E'),P)
                     )
         esac
        )
   )where  "AA₁;..;AA_n" in attr assignments of(E)
           partlist in ((#,E)) union tree(E)

2.11.5 Consistency of the semantic rules

If all the attributes associated with a tree rule that have to be assigned are unambiguously assigned then the attribute assignments are consistent.

We define the function must assigned that determines the attributes which must be assigned in a given tree rule:

df  must assigned(E)
    = union{  (P,(E₁,..,E_n),A)
           |  E' = E₁
            and for all  1<=i<n (E_i in class(E_i-1))
            and if   n = 1
              then A in all non input attr of(E_n)
              else A in non input attr of(E_n)
              fi
            and (P,E') in ((#,E)) union tree(E)
           }

Notice that in this definition each attribute is defined by a part name, a vector of elements and the attribute name. We use a vector of elements to indicate that the specific derivation using the class definitions is significant to determine the attribute. This implies that whenever an attribute, associated with a node type, can be reached from a class through more than one derivation (using the class definitions), this attribute represents more than one attribute instance, as viewed from this specific class.

We also define the function assigned in that returns all the attributes that are assigned by a set of attribute assignments (we make use of the concat operation so that double assignments to attributes are preserved):

df  assigned in("AA₁;..;AA_n",(E₁,..,E_m),partlist)
    = concat{  case  AA_i  of
               "A OF P = EXPR"
                : (P,(E₁,..,E_m,E),A)
                  where (P,E) in partlist;
               "CASE P OF SEL₁:AAS₁;..;SEL_k:AAS_k ESAC"
                : concat
                     {  assigned in(AASi,(E₁,..,E_m,E)
                                   ,update(partlist,P,E')
                                   )
                     |  1<=j<=k
                      and (P,E) in partlist
                      and E' in selector(j,(SEL₁,..,SEL_k),E)
                     }
               esac
            |  1<=i<=n
            }

We can now define the restriction in a simple way by:

for all  E in elements with attr assignments
  (for all  PEA in must assigned(E)
      (exists!  PEA in assigned in(attr assignments of(E), (),
                             ((#,E)) union tree(E)
                            )
      )
  )

Chapter 3

Dependencies for the simple multi-pass method

The algorithm that tests the Simple Multi-Pass (SMP) property and calculates the partition of the attributes over the passes needs a dependency graph. This graph is constructed from the attribute assignments. We first describe how the dependency graph is constructed, and then we present the SMP-algorithm.

3.1 Construction of the dependency graph

The dependency graph consists of vertices with labelled directed edges. We first construct the sets of vertices from the attribute definitions and then we construct the labelled edges from the attribute assignments. The labelling will be represented using two subsets of the directed edges.

3.1.1 Sets of points

The sets of vertices are defined as a specific subset of all (attribute,element) pairs. We have chosen the subset of pairs, which is the union of two subsets. The first subset contains (attribute,element) pairs, where the element is a left-hand side element and the attributes above the element are projected onto it according to the class hierarchy. The second subset contains (attribute,element) pairs, where the element has an indirect tree rule and the attributes are associated with this element. This subset is defined by:

df  all attr elem pairs
    =   union{  (A,E)
             |  E in left hand side elements
              and A in all attr of(E)
             }
      union union{  (A,E')
             |  E' in elements with indirect tree rule
              and A in attr of(E')
             }

We define a function that determines on which of these pairs a given attribute occurrence in the attribute assignments is projected:

df  attr elem pairs of(A,E)
    = {  (A,E') in all attr elem pairs
      |  E' in clos parents of and(E) union clos class(E)
      }

3.1.2 Dependencies in the semantic rules

In this section we construct the set of dependencies that are found in all the attribute assignments. With this set we define the dependency graph.

We start with the definition of a function that, given two part names, returns all the pass directions in which the attribute occurrences with the second part name can be computed using values of the attribute occurrences with the first part name, according to a given part list. We use left for a left-to-right pass and right for a right-to-left pass. The definition is:

df  directions(part from,part to,partlist)
    = if    part from = #  or  part to = #
      then  {left,right}
      elif  part from = part to
      then  empty
      elif  (..,(part from,E1),..,(part to,E2),..)
            = partlist
      then  {left}
      else  {right}
      fi

We define a function dep in EXPR that returns, with a given expression and a part list, all the dependencies between the (attribute,element) pairs found in the expression and the (attribute,element) pairs from the set pairs to, where part to is the part name associated with element:

df  dep in EXPR(EXPR,partlist,pairs to,part to)
    = case  EXPR  of
      "A OF P"
       : { (AEPf,AEPt,directions(P,part to,partlist)
         |  AEPf in attr elem pairs of(A,E)
          and AEPt in pairs to
          and (P,E) in partlist
         };
      "F(EXPR₁,..,EXPR_m)"
       : union{  dep in EXPR(EXPR_j,partlist
                            ,pairs to,part to)
              |  1<=j<=m
              };
       "CASE P OF SEL₁:EXPR₁;..;SEL_m:EXPR_m ESAC"
       : union{  dep in EXPR
                   (EXPR_j,update(partlist,P,E)
                   ,pairs to,part to
                   )
              |  1<=j<=m
               and E in selector(j,(SEL₁,..,SEL_m),C)
               and (P,C) in partlist
              }
      esac

We now define the function dep in AAS which returns all the dependencies that are found in the attribute assignments with the given part list:

df  dep in AAS("AA₁;..;AA_n",partlist)
    = union{  case  AA_i  of
              "A OF P = EXPR"
               : dep in EXPR(EXPR,partlist
                            ,attr elem pairs of(A,E),P)
                 where  (P,E) in partlist
                 endwhere;
              "CASE P OF SEL₁:AAS₁;..;SEL_m:AAS_m ESAC"
               : union{  dep in AAS(AAS_j
                                    ,update(partlist,P,E)
                                    )
                      |  1<=j<=m
                       and E in selector(j,(SEL₁,..,SEL_m),C)
                       and (P,C) in partlist
                      };
              esac
           |  1<=i<=n
           }

Finally, we define the set of all dependencies that are found in all the attribute assignments:

df  dependencies
    = union{  dep in AAS(attr assignments of(E)
                        ,(#,E) concat tree(E))
           |  E in element with attr assignments
           }

3.1.3 Definition of the dependency graph

The dependency graph is represented by a four-tuple, which consists of a set of vertices, a set of directed edges and two subsets of this set of edges which indicate the edges that are labelled left or right, respectively. The definition we present here is in terms of all attr elem pairs and dependencies:

df  dependency graph
    = <A,P,L,R>
      where  A = all attr elem pairs
          ,  P
             = {  (AEPf,AEPt)
               |  (AEPf,AEPt,D) in dependencies
               }
          ,  L
             = {  (AEPf,AEPt,D) in dependencies
               |  for all  (AEPf,AEPt,D) in dependencies
                     (left in D)
               }
          ,  R
             = {  (AEPf,AEPt,D) in dependencies
               |  for all  (AEPf,AEPt,D) in dependencies
                     (right in D)
               }
       endwhere

3.2 The smp algorithm

In this section we present the SMP-algorithm. We do not prove its correctness. The algorithm presented here is optimal, in the sense that it finds the solution with the minimum possible number of passes. We also do not prove this. We present the algorithm in a number of steps.

We use the notation R+ for the transitive closure of the relation R. Because relations are represented as sets of pairs we shall also use + as a postfix operator on these kinds of sets.

The pass directions are represented by identifiers from the set {l,r,b}, representing a left-to-right pass, a right-to-left pass, and a pass that can be evaluated in both directions, respectively.

3.2.1 Formal definition of the smp problem

Before we present the algorithm we give a formal description of the SMP-problem. Given a dependency graph we have to find an optimal solution, according to which all the attributes in each possible tree can be correctly evaluated. A solution is represented by a vector of pass directions and a complete ordered partition of the (attribute,element) pairs, which indicates the number of passes, the evaluation direction of each pass, and the attributes that have to be calculated during each of these passes. A solution is optimal if no solution exists with fewer passes.

We give the definition of a function that returns all correct ordered partitions for a given vector of pass directions and a dependency graph. There are two restrictions that make a complete partition correct. Firstly, there may be no (attribute,element) pairs that depend on a pair that is evaluated in a following pass. Secondly, all the dependencies can be evaluated in the direction of this pass. If there is no solution, this function will return an empty set of solutions. The definition of this function is:

df  SMP sol(<A,P,L,R>,<d₁,..,d_n>)
    =  {  <A₁,..,A_n>
       |  {A₁,..,A_n} complete partition of A
        and for all  1<=i<j<=n, b in A_i, a in A_j
             ( (a,b) not in P+ )
        and for all  1<=i<=n, a,b in A_i
             ( (a,b) not in P/D(d_i) )
       }
       where  D(l) = L
           ,  D(r) = R
       endwhere

Using the above function we can define a function that represents the formal description of the SMP-problem. This function describes all solutions in the form of a vector of pass directions and a correct complete partition, in the following way:

df  all SMP sol(DG)
    =  {  (dvec,Apart)
       |  dvec in {l,r}n and n > 0
        and Apart in SMP sol(DG,dvec)
       }

We could also define a function that takes the best solution from this set, but we shall ignore this problem.

3.2.2 Formal definition of the smp algorithm

The minimization of the number of passes is an NP-complete problem [RäUk80,81]. The number of passes can never be larger than half the number of (attribute,element) pairs. For most practical problems the number of passes is rarely larger than five. Because of this, we can use an exponential algorithm that tries the possible directions in a clever way.

We present an algorithm that works with partial solutions. Assume we are given a partial solution with a vector of directions. If this solution contains all (attribute,element) pairs, we have found a correct solution. If not, we have to find a clever extension. We do this by calculating the two sets of (attribute,element) pairs that could be evaluated by extending the vector of pass directions with the two possible directions.

The nature of the SMP-problem is such that an extension does not change the partition that has been found for the preceding passes. In other words, a change in the direction of a certain pass does not have an effect on the maximum sets than can be found for all previous passes.

So we look at the two sets of (attribute,element) pairs that can be evaluated in the following pass for both directions. The following cases can occur:

If they turn out to be empty, there is no use in searching further, and we have to conclude that there is no possible solution beginning with this partial solution. It can be proven that once this happens, there is no solution at all.
If it turns out that the sets are equal, then it does not matter which direction is taken, and we proceed with the new partial solution that we have found.
If it turns out that one of the two sets is a subset of the other, we choose the direction which gives the largest subset. It is not useful to try the direction which gave a smaller extension because it will never result in a better solution than the one we found using the larger one. This can also be proven. Again, we have found a larger partial solution.
Only if it turns out that neither set is a subset of each other, do we have to try both new partial solutions that we found. In most practical problems this does not happen very often, which again makes the exponential character of the SMP-problem even less of a problem.

If we have a function that returns the largest possible set of (attribute,element) pairs for a given direction, then we can define a function that finds all "optimal" solutions. The solutions are represented as a two-tuple consisting of a vector of pass directions and complete ordered partition. If the dependency graph does not have the SMP-property then the result will be equal to the empty set. We give the formal description of the function that performs the searching process, and returns the solutions:

df  find all SMP sol((A,P,L,R))
    =  find(<>,<>)
    where
     find(<d₁,..,d_n>,<A₁,..,A_n>)
     = if   {A₁,..,A_n} is a complete partition of A
       then { (<d₁,..,d_n>,<A₁,..,A_n>) }
       elif Next R union Next L = empty
       then empty
       elif Next R = Next L
       then find(<d₁,..,d_n,b>,<A₁,..,A_n,Next L>)
       else    if   Next L not subset Next R
               then find(<d₁,..,d_n,l>,<A₁,..,A_n,Next L>)
               else empty
               fi
            union  if   Next R not subset Next L
               then find(<d₁,..,d_n,r>,<A₁,..,A_n,Next R>)
               else empty
               fi
       fi
       where  Next R
              = find next SMP((A,P,L,R),A/union{A₁,..,A_n},r)
           ,  Next L
              = find next SMP((A,P,L,R),A/union{A₁,..,A_n},l)
       endwhere
     endwhere

In this definition the sets Next L and Next R represent the two possible extensions for a left-to-right and right-to-left pass, respectively. We do not prove that the function find all SMP sol really does find all solutions.

3.2.3 How to find a maximum extension

In this subsection we deal with the function that finds the largest possible extension (subset of the remaining (attribute,element) pairs) for a partial solution and a pass direction. This function is defined by:

df  find next SMP(<A,P,L,R>,A',d)
    =  A' / {  c in A'
            |  a,b in A'
             and (a,b) in P/D(d)
             and (b,c) in P*
            }
       where  D(l) = L
           ,  D(r) = R
       endwhere

This function searches for problem dependencies that cannot be evaluated in the given pass direction, and remove all pairs that depend on it. It is obvious that the solution found in this way is correct and is also the maximum solution.

3.2.4 The final solution

Finally, we give the definition of the SMP solution (if it exists). In the following definition we make use of the function best that chooses the best solution from a set of solutions. (For example, the best solution may have the fewest passes, the most left-to-right passes, or begin with a left-to-right pass.) The definition is:

df  SMP solution
    =  best(find all SMP sol(dependency graph))

3.3 Processing the solution

We can define a function pass that returns the pass number for each (attribute,element) pair in the solution:

for all  A in all attr elem pairs
  df  pass(A) = i
      where  A in A_i
          ,  <dvec,<A1,..,An>> = SMP solution
      endwhere

We can also define the one-pass property for (attribute,element) pairs. An (attribute,element) pair has this property when all other pairs that make use of this pair are evaluated in the same pass as the given pair. This means that the pair does not have to be stored in the tree to be transferred to a later pass. Pairs satisfying the one-pass property can be implemented by a stack that is controlled by the evaluation mechanism. We define a function that determines this property, and the two sets in which all pairs can be divided using this property:

for all  AE in all attr elem pairs
  df  one pass(AE)
      =  for all  (AE,AE') in P
            (pass(AE) = pass(AE')
         where  (A,P,L,R)
                =  dependency graph
         endwhere

df  one pass attr elem pairs
    = {  AE in all attr elem pairs
      |  one pass(AE)
      }

df  multi pass attr elem pairs
    = all attr elem pairs / one pass attr elem pairs

Chapter 4

Implementation

In this chapter we describe how the sets, relations, functions, and restrictions defined in the previous chapters are implemented. We also describe the extra restrictions that the implementation imposes on the input. As far as possible we keep the same order as the description in the previous chapters.

It is impossible to give an exhaustive description of the entire implementation. The complete documentation of the source and the source in VAX-Pascal itself can be found in [Faase89]. All references in the rest of this chapter refer to this report, and consist of a module name, optionally followed by a section number.

4.1 Identifiers

The implementation uses a binary identifier tree, in which all identifiers are stored. There are different kinds of identifiers. For each identifier kind, specific information, derived from the input, is attached to the identifiers of this kind. Each identifier can only belong to one kind. This imposes the extra restriction that the sets classes, nodes, type identifiers, semantic functions and attributes are pairwise disjoint.

4.2 Classes and nodes

The sets classes and nodes are represented as the subset of all identifiers with the identifier kinds n class and n nodes [bintree 1.10], respectively. Since all kinds of identifiers are disjoint, these sets are disjoint.

The set elements is represented as the subset of all identifiers with the identifier kinds n class, n node and n elem. The identifier kind n elem is used for identifiers that are not enumerated in the class and node type definitions in the input, but are used in the rest of the input as an element. Whenever a class definition is parsed with an identifier of the kind n elem as its left-hand side, then the kind of this identifier is changed into n class [parser 13.1].

4.3 Types

The set type identifiers is represented as the subset of all identifiers with the identifier kind n type [bintree 1.10]. The standard type identifiers are initially stored in the binary tree [bintree 1.10]. They are dealt with in the same way as the user defined type identifiers. To see how the user defined type identifiers are parsed, refer to [parser 5]. For a further description of the internal representation of these (formal) types, see [bintree 1.9], and for a description of how they are manipulated, see [errors 4].

4.4 Semantic functions

The set semantic functions is represented as the subset of all identifiers with the identifier kind n func. The function type of is represented by the record field type of func in the variant part of each identifier of this kind [bintree 1.10.4]. The record field nr of args represents the function number of args of. The value of type arg(i,F) is implemented as the ith element of the linked list of type pltype [bintree 1.9]. The record field args points to the head of the list.

The parser tests the restriction that each semantic function identifier has a unique definition when the semantic function definitions are read [parser 6].

4.5 Attributes

The set attributes is represented as the subset of all identifiers with the identifier kind n attr.

The function type of is represented by the record field type of attr in the variant part of each identifier of this kind [bintree 1.10.3]. The sets synthesized attributes and inherited attributes are stored in the global variables g syn attr and g inh attr [bintree 2.3] of type attributes [bintree 1.3]. The parser tests whether an attribute is inherited or synthesized by using the procedures inherited attr and synthesized attr [parser 15.1], which operate on these global variables.

The function attr of is represented by the element nor gen of the array attr in the variant part of each element identifier [bintree 1.8].

The restriction that attributes have to be defined consistently is tested when the attribute definitions are read [parser 7].

4.6 Input and output attributes

During the parsing pass, the variables g input attr and g output attr [bintree 2.3] of type attributes [bintree 1.3] are used to represent all attributes that are defined as input and output attributes, respectively, for all elements (without using the "AT"-construction, as described in section 2.5). The information about which attributes are defined as input and output attributes with a specific element is stored in the elements nor in and nor out of the array attr in the variant part of this element [bintree 1.8,1.10.2].

The functions input attr of, output attr of and interface attr of are determined by first testing whether the variables g input attr and g output attr have the required property for all the elements, and if not, by testing the elements nor in and nor out of the array attr.

After the second pass, the field attr is updated so that these functions are fully represented by this field. This is done by the procedure gen all attributes [pass2 5].

The parser tests whether the input and output attribute definitions satisfy the restrictions when they are read [parser 8].

4.7 Rules

The rules are stored with the element identifiers in the binary tree. The sets rules, class definitions and tree rules are not actually represented in our implementation.

4.7.1 Class definitions

The restrictions on the class definitions are checked when a definition is read [parser 9.1]. The function class is represented by the record field class rule in the variant part of the element identifiers [bintree 1.10.2]. It is filled when the class definition is read [parser 9.1]. The record field clos class represents the function clos class, and is constructed by the procedure test deterministic in the second pass [pass2 2].

The function parents of is represented by the record field parents, which is updated when class definitions are read [parser 9.1].

The set all elements in class is not directly represented in the implementation. The restriction that recursive class definitions are not allowed, is tested by the procedure c in clos c or and, when class definitions are processed. In the implementation, an approach is chosen which differs from the solution suggested by the formulation of the restriction.

The procedure test all classes defined [pass2 1] checks in the second pass whether there is a class definition for each class.

4.7.1.1 Deterministic class definitions

A restriction that the embedding of classes is unambiguous (or deterministic) can be formulated. This restriction says that there is always a unique way of reaching a node type by successive applications of class definitions. This restriction can be described by:

for all  C in classes
   ( for all  C₁,C₂ in (class(C) intersection classes)
        (C₁ != C₂ => clos class(C₁) disjoint clos class(C₂))
   )

We do not impose this restriction on the input. The procedure test deterministic [pass2 2] tests whether this restriction is fulfilled or not. A message whether the restriction is satisfied or not is given to the user.

This restriction is important with respect to a remark about attributes in subsection 2.11.5, saying that whenever there is class, and a node type that has an ambiguous derivation using the definitions from this class, then the attributes with this node type are distinguished depending on the derivation that has been chosen.

4.7.2 Tree rules

The restrictions on the tree rules are checked when they are read. The set elements with tree rule is the subset of all elements that have the value r dire for the record field kind of rule in the variant part of the element identifiers [bintree 1.10.2].

The function tree is represented in two different ways in the implementation. The first way is as a linked list of type ptree rule. The record field tree rule points to the head of the list. Each element of this linked list represents a part of the right-hand side of the rule, consisting of a part name and an element. In the second way the function tree is represented by the record field tree list, which points to an array of type t tree list. Each element of this array represents a part with its part name, element and (transformed [bintree 1.6]) assignments to attributes of this part. The first element, with index 0, is used for the main part and main element.

The function attr assignments of is represented in the record field attr of type plattr ass [bintree 1.5] in the variant part of element identifiers [bintree 1.10.2]. In the second pass the attribute assignments are transformed and also stored in a different structure of type ttr ass [bintree 1.6]. See also the subsection 4.12.3 "Consistency restrictions" of this chapter.

4.7.2.1 Consistent definition of tree productions

The set elements with indirect tree rule is represented as the subset of all element identifiers with the value r indi for the record field kind of rule. The set elements with tree production is represented by the subset of all element identifiers for which this field has the value r dire or r indi.

The test whether the function tree production of can be defined correctly is performed when the rules are read. Every time a class definition or a tree rule has been read the procedure test consistency [parser 9.3] is called. This procedure updates the record fields tree rule and kind of rule in such a way that after all rules have been read the function tree production of is defined correctly for all elements in the set elements with tree production. The function tree production of will be represented by the record field tree rule after all the rules are read. Whenever an inconsistency is detected, such that this function cannot be defined correctly, errors are reported by the procedure test consistency.

4.8 Root

The root is represented in the global variable root of type pnode [bintree 2.2]. The root is read by the procedure read root [parser 16].

4.9 Well-formed grammars

An abstract program tree grammar is well formed if the reachability property and the termination property are satisfied. The reachability property is checked by the procedure test reachability [pass2 3]. This procedure has one argument of enumeration type tkind reach indicating the kind of reachability that is tested. The complete reachability is tested when the value r comp is given, and the concrete reachability is tested when the value r conc is given. If the reachability property is not satisfied, error messages are generated. Both kinds of reachability are tested in the implementation by calling the procedure twice with the two different values as argument [main 4.1].

4.10 Flat implementability

The sets left hand side elements and right hand side elements are represented by the variables left side elements and right side elements of type elements [bintree 1.2,2.2], respectively. The sets are calculated by the function flatimplementable [depend].

The flat implementability restriction is implemented in the function flatimplementable, which returns TRUE if the restriction is satisfied, otherwise FALSE.

4.11 Assignment of attributes using class rules

The restriction that an attribute cannot be associated both with a class and an element in the class hierarchy of that class, and the restrictions that, if an element belongs to more than one class, then the attributes from these classes should be the same and have the same input/output properties, are tested by the procedure gen all attributes. This procedure also updates the record field attr of the element identifiers, so that the elements of this array represent the functions attr of, input attr of, output attr of, all attr of, all input attr of and all output attr of as described in [bintree 1.8]

4.12 Restrictions on the semantic rules

Some of the restrictions described in section 2.11 are tested when the semantic rules are parsed during the first pass. The remaining restrictions are tested during the second pass when all the class definitions are known. During the second pass the semantic rules are also transformed when the consistency restriction described in subsection 2.11.5 is checked [trans]. The transformed semantic rules are used in the generation pass.

The first pass tests the restrictions that do not depend on the entire set of class definitions. We therefore call them typing restrictions. The rest of the restrictions can be divided into the restrictions that are concerned with the visibility of the attributes in the selection constructions (visibility restrictions), and into the restrictions that determine whether all the semantic rules associated with a tree rule assign all the attributes in precisely one way (consistency restrictions).

Refer to Faase [Faase86] for a description of a set of restrictions which is subdivided into typing and visibility restrictions. This is equivalent to the set of restrictions as described in section 2.11

4.12.1 Type restrictions

The following restrictions (aspects) are tested in the first pass:

The type correctness. Each expression has a certain type which is determined by the attributes and semantic functions within the expression. The type of the expression in each simple attribute assignment should be the same as the type of the assigned attribute. (Aspect 1 in section 2.11.3)
Whether the part names that are used are in the part list. (Aspect 2 in section 2.11.3 and aspect 1a in section 2.11.4)
Whether applied and assigned attributes are used only in the expressions and simple assignments, respectively. (Aspect 4 in section 2.11.3 and 3 in section 2.11.4)
The correct usage of the functions. The functions that are used in the expressions must be defined and have to be applied according to their definition, which means that they contain the correct number of arguments, and these arguments should be of the correct type. (Aspect 5 in section 2.11.4)
Whether only correct elements are used within selectors, and whether the selectors of a case construction are disjoint. (Part of aspect 7 in section 2.11.3)
Whether the part name of a selective assignment is the only part name used within the assignments of this selective assignment. (Aspect 1b in section 2.11.4)

These restrictions are checked by the procedures that read the expressions, the selection constructions and the attribute assignments. For a full description see [parser 10,11].

4.12.2 Visibility restrictions

The following restrictions (aspects) are checked in the second pass:

The visibility of the attributes that are used with a certain part name. This check can be divided into the following checks:

On the top level of the attribute assignments those attributes can be assigned that are associated with the element of the part name or with its parents in the class hierarchy. (Aspect 2a in section 2.11.4)
Within the selective assignments only those attributes can be assigned that are associated with the current element of the part name in the part list. (Aspect 2b in section 2.11.4)
Only the attributes that are associated with the current element of a part name or its parents in the class hierarchy can be used in expressions. (Aspect 3 in section 2.11.3)

Whether the selective constructions are applied to classes only. (Aspect 6 in section 2.11.3 and aspect 4 in section 2.11.4)
Whether the elements used in a selector are elements of the class with the selection constructions. And also for case expressions, whether all the elements of the class are used in the selectors of the case expressions. (Part of aspects 7 and 8 in section 2.11.3 and aspect 5 in section 2.11.4)

These restrictions are tested during the transformation of the attribute assignments in the second pass. For a full description see [trans]

4.12.3 Consistency restrictions

The set returned by the function must assigned can also be seen as a tree structure for each part name, in which the edges are labelled with the elements, and the nodes contain different subsets of attributes. In our implementation we represent these trees with a pointer structure, to which for each attribute the associated attribute assignment is attached, according to the original semantic rules.

Each node in this structure is represented by the type ttr ass, that also contains a pointer to a list of edges, which are represented by the type tltr ass. The labelling of the edges with elements is not represented explicitly but can be derived uniquely using the class definitions and an ordering of the elements. For a further description see [bintree 1.6]

The function assigned in is not explicitly represented in the implementation. The restriction described using this function is implemented by a three step algorithm. In the first step the empty structure, as described above, is generated by the procedure gen empty tree [trans]. In the second step the attribute assignments are scanned, during which the restrictions described in the previous section are tested, and each time when a simple assignment is found, it is inserted into the tree. An error message is generated if this place was already filled, indicating that an attribute is assigned in more than one way. This second step is performed by the procedure tr lattr ass [trans]. In the third step, error messages are generated for all empty spots in the tree, indicating that a certain attribute was not assigned. This is done by the procedure test complete [trans].

4.13 Dependency graph

The set all attr elem pairs is represented in the local variable node of type nodes in the Simple Multi-Pass module [allsmp]. This variable is an array in which each element represents one (attribute,element) pair. The set is represented as an ordered set, in which the edges are represented using the index of each pair. This variable represents the element A in the 4-tuple that represents the dependency graph, see subsection 3.1.3

The set all attr elem pairs is constructed by the procedure make numbering for pairs [depend], which uses the procedure add a node [allsmp]. This procedure assigns a number to each (attribute,element) pair in the set. The function number of pair [depend] returns the number of each (attribute,element) pair in this set, as assigned by the procedure make numbering for pairs.

The function attr elem pairs of is implemented in the function projection [depend], which returns a set of elements instead of a set of pairs (as in the formal description), because the attribute name of all pairs is the same. The formal definition of the result of the function projection is described by:

df  projection(A,E)
    =  if   E in elements with indirect tree rule
       then if   A in attr of(E)
            then { E }
            else projection(A,E')
                 where E' in parents of(E)
                 endwhere
            fi
       else clos class or and(E) intersection left hand side elements
       fi

Notice that if an element has an indirect tree rule, then it is in precisely one class, otherwise the function tree productions could not have been calculated.

4.13.1 Dependencies in the semantic rules

The function directions is implemented by the function direction [depend]. The set dependencies is indirectly represented in the variables succ, Lpred and Rpred of type trel node [allsmp]. These variables represent the elements P, L and R in the 4-tuple that represents the dependency graph (subsection 3.1.3). These variables are filled by calling the procedure add a dependency [allsmp].

The set dependencies is constructed by the procedure generate all dependencies [depend]. The function dep in AAS is implemented in the procedure dep lattr ass, and the function dep in EXPR is implemented in the procedure dep expr, which makes use of the procedure generate dependencies [depend]. This last procedure calculates the set of dependencies for the attribute-occurrence alternative in the case expression of the definition of the function dep in EXPR, see subsection 3.1.2

4.14 The smp algorithm

The SMP algorithm is implemented within the boolean function smp algorithm [allsmp], which returns TRUE if a solution exists, otherwise FALSE. If a solution exists, this procedure returns the best solution for the arguments with which it was called.

The solution consists of the pass directions and in which pass the attributes have to be evaluated. The global variable passes contains the number of passes. The global variables part dir and partition represent the vector of pass directions and the complete ordered partition of the attributes, respectively.

The set one pass attr elem pairs is represented by the global variable one pass. This set is also calculated by the function smp algorithm, if a solution is found.

4.15 Generation

The program can be generated if all the previous restrictions are fulfilled. Code generation is implemented in the procedure gen program [genera].

References

[Alblas81]	H.Alblas : "A characterization of attribute evaluation in passes", Acta Informatica 16 (1981), pp. 427-464.
[Alblas85]	H.Alblas : "Finding minimal pass sequences for attribute grammars", SIAM Journal on Computing 14, (1985), pp. 889-914.
[Eng84]	J.Engelfriet : "Attribute grammars: Attribute evaluation methods", In: Methods and Tools for Compiler Construction, Cambridge University Press (1984), pp. 103-138.
[Faase86]	F.J. Faase : "Een attribuut evaluator generator", Masters Thesis, Dept. of Computer Science, Univ. of Twente, The Netherlands.
[Faase89]	F.J. Faase : "Documentation and source of the AAPT-evaluator generator", (to be published as an internal technical report), Dept. of Computer Science, Univ. of Twente, The Netherlands.
[RäUk80]	K.-J.Räihä & E.Ukkonen : "On the optimal assignment of attributes to passes in multi-pass attribute evaluators", Proc. 7th Int. Colloquium on Automata, Languages and Programming, Lecture Notes in Computer Science 85, Springer-Verlag, (1980), pp.500-511.
[RäUk81]	K.-J.Räihä & E.Ukkonen: "Minimizing the number of evaluation passes for attribute grammars", SIAM Journal on Computing 10, (1981), pp. 123-130.

My life as a hacker | My home page

Specification and Implementation of an Attributed Abstract Program Tree Evaluator Generator