1. Propertied abstract trees

1.1 Introduction

In this document we will describe Propertied Abstract Trees (PAT), as objects which can be used for representing form and contents of abstract trees. We want to use PAT to describe internal representation of program trees in the different phases of the compilation process. The abstract trees are not arbitrary but do contain allot of structure.

1.2 Nodes of the trees

Trees are made out of nodes that are hierarchical connected with each other. With each node we thus can associate a (ordered,empty) set of nodes. With each node we will also associate a node type. It is possible to describe the extra structure in the trees we want to consider by putting restrictions on the form of a node that has a certain node type.

We also want to store information in the tree separate from its form. We do this by associating properties with the nodes. The properties with a node are represented as a set of (name,value)-pairs. All the properties together describe the contents of a tree in the same way as the form of the tree is described by the hierarchical ordering of the different nodes of the tree. Of course it is also possible to structure the sets of properties with a node depending on its node type.

1.3 Representation of trees

In this section we will describe a universal representation for the trees we want to consider. In the following section we will explain we can filter out those trees that has the desired extra structure.

We can describe trees as a root node with a number (possible zero) of subtrees. With the above we come to the conclusion that we can describe a tree as a tree-tuple of a node type, the properties with the node, and a (possible empty) set of subtrees. Given a set of node types, properties and property values we can give a formal description of the universal set of trees:

df  all trees
    = {  (N,{(P₁,V₁),..,(P_m,V_m)},(t₁,..,t_n))
      |  N in node types
       and 0<=m
       and for all  1<=i<=m
            (  P_iin properties
             and V_iin property values
            )
       and 0<=n
       and for all  1<=i<=n  (t_iin all trees)
      }

1.4 How to add structure

We will use one of the simplest ways of restricting the number of trees that we allow. For each node in the tree its node type shall restrict the node types of its sons and define the set of properties. Each node with a node type has a fixed number of sons, and the node type of each sons has to be in a fixed subset of the node types.

We can describe these restrictions by giving tree rules for each node type that specifies the number of son nodes and a so-called class. With class definitions we define which node types are in a certain class.

In the following chapter we will present a way to describe these restrictions.

2. SYNTAX OF PAT

2.1 Introduction

In this chapter we describe the syntax of the grammar with which Propertied Abstract Trees can be described.

2.2 Tree rules and class definitions

The grammar that specifies the tree rules with the node types and the class definitions with the classes is the following:

element   : class ; node type .

rule      : ( class def ; tree rule ) , . .

class def : class , = , { , node type LIST , } .

tree rule : node type , => , (part name , : , element)LIST .

class     : identifier .

node type : identifier .

part name : identifier .

2.3 Properties

The properties are defined by the following grammar:

property definition
         : property name , : , type identifier
         , OF , element LIST , .
         .

property name : identifier .

type identifier
         : INTEGER ; REAL ; Boolean ; CHAR
         ; user defined type identifier
         .

user defined type identifier : identifier .

2.4 Extensions

In the most simple form only classes are allowed in the right hand side of the tree rules, is there no overlap between the node types in the different class definitions and are properties only allowed at node types. In this section we describe a number of extensions on this simple form. Because these extensions are considered to be useful, but can have extensive consequences, we present a mechanism for the user to tell which extensions he wishes. The idea behind this is that it is possible to make separate implementations which do not support some or all of the extensions. Each extensions has implications on the semantic restrictions on the rest of the abstract tree grammar description. We present the grammar of the extensions:

extension
         : node types in tree rule
         ; class overlap
         ; class properties
         .

In each of the following sub-sections we will describe each extension.

2.4.1. Node types in tree rules

This extension allows node types to be valid in the right hand side of tree rules, besides the classes. This extension is useful in those cases where a class exists of a single node type, which does happen often at the leaves of the abstract tree.

2.4.2. Class overlap

This extension states that a node type can be in different class definitions, instead of just one. This extension is useful because it can happen that two almost identical constructs can occur in two different classes that one would like to put together for sake of briefness and clarity.

2.4.3. Class properties

This extension says that instead of properties only being defined at node types, may also defined with classes (implying that they are now defined with all node types in that specific class). This extension is useful because in some cases one wants to define a property that says something of a specific class instead of the separate node types.

2.5 De complete grammar

Finally we present the complete grammar with which Propertied Abstract Trees can be described. This grammar just consists of the elements described before, together with enumerations of the classes, node types and user defined type identifiers. And the so-called root is defined. The complete grammar is:

input grammar
        : ( EXTENSIONS , extension LIST , . )OPTION
        : ( CLASSES , class LIST , . )OPTION
        ,  NODE TYPES , node type LIST , .
        , ( TYPES , user defined type identifier LIST , .
          )OPTION
        ,  PROPERTIES , property definitions SEQ
        ,  RULES , rule SEQ
        ,  ROOT , element
        .

3. SEMANTICS OF PAT

In this chapter we give a formal description of all the semantic restrictions that are imposed on the input grammar. We shall do so by extracting certain sets from parts of the input. With these sets we will describe the restriction. Further more we will derive other sets from these sets that describe certain semantic qualities of the given grammar.

3.1 The extensions

The first thing we come along in the input are the extensions:

EXTENSIONS  ext₁ , ... , ext_n  .

We construct the set of given extensions, and define the Boolean constants that say whether a extension is given or not:

df extensions  = {ext₁,..,ext_n}

df node types in tree rules
   = node types in tree rules in extensions
df class overlap
   = class overlap in extensions
df class properties
   = class properties in extensions

3.2 The classes and nodes

The following thing we find is the enumeration of the classes and the node types. Given this part of the grammar:

CLASSES  C₁ , ... , C_n  .

NODE TYPES  N₁ , ... , N_m  .

We define the sets classes and nodes with:

df  classes = {C₁,..,C_n}
df  nodes   = {N₁,..,N_m}

Because an identifier can not represent a class and a node type the sets have to be disjoint:

classes  disjoint  nodes

We also would like to define the set of elements as the union of these two sets:

df elements = classes union nodes

3.3 The user defined types

The next thing we get in the grammar is the enumeration of the user defined type identifiers. First of all we define the four standard types:

df  standard type identifiers
    = {INTEGER, REAL, Boolean, CHAR}

The next part of the grammar that gives the enumeration looks like:

TYPES  T₁ , ... , T_n  .

The set user defined type identifiers is defined by:

df  user defined type identifiers = {T₁,..,T_n}

Because the standard type identifiers may not be redefined, we get the following restriction:

          user defined type identifiers
disjoint  standard type identifiers

Now we can define the set type identifiers as the union of the standard and user defined type identifiers:

df  type identifiers
    =   standard type identifiers
      union user defined type identifiers

3.4 The property

The following part of the grammar we come along is the definition of the properties, which is given by:

PROPERTY  propdef₁ .  ...  propdef_n .

The set property definitions is derived from the grammar in the following way:

df  property definitions  =  {propdef₁,..,propdef_n}

The definition of a property consists of two parts : a type definition which defines the type of the property and a set of elements at which the property is defined with. This set contains node types only when the class properties-extension is not given. This results in the following restrictions on the property definitions:

for all  "P : T OF E₁ .. E_n" in attribute definitions
  (  T in type identifiers
   and {E₁,..,E_n} subset elements
   and not class properties
     => {E₁,..,E_n} subset nodes
  )

The set properties is defined as the set of all property identifiers that occur in the property definitions:

df  properties
    = {  P
      |  "P : T OF E₁ .. E_n" in property definitions
      }

The restrictions that property definition has to be unambiguous is give by:

for all  P  in  properties
  ( exists!  "P : T OF E₁ .. E_n" in property definitions)

When this restriction is satisfied we can define the function type of by:

for all  "P : T OF E₁ .. E_n" in property definitions
  df  type of(P) = T

Finally we present the definition of the function prop of that returns the properties that are defined with an element:

for all  E  in  elements
  df  prop of(E)
      = {  P
        |  "P : T OF .. E .." in property definitions
        }

3.5 The rules

The greatest part of the grammar description consists of the rules that describe the actual tree grammar. These rules can are divided into the class definitions and the tree rules. In the grammar description the two kinds of rules are intermixed, and the tree rules can be followed by their attribute assignments (which we will neglect for the time being). When the grammar looks like:

RULES  rule₁ .  ...  rule_n .

The following set can be defined:

df  rules = {rule₁,..,rule_n}
df  class definitions
    = {  "C = {N₁,..,N_n}" in rules  }
df  tree rules
    = {  "N => P₁:E₁,..,P_n:E_n" in rules }

First the two kinds of rules will be dealt with separately in two successive sections. In the following sections overall consistency restrictions will be described.

3.6 The class definitions

Class definitions can only be defined with classes and have to be defined as a set of node types. Which is described by:

for all  "C = {N₁,..,N_n}" in class definition
  (  C in classes
  and  {N₁,..,N_n} subset nodes
  )

For each class that has to be exactly one class definition. resulting in the restriction:

for all  C in classes
  ( exists! "C = {N₁,..,N_n}" in class definitions)

When this restriction is fulfilled we can define the function class that returns the set of node types by which the class is defined:

for all  "C = {N₁,..,N_n}" in class definitions
  df  class(C) = {N₁,..,N_n}

We also define the set of all node types that do occur in a class definition:

df  all node types in class
    = {  E  |  E in class(C)  and  C in classes  }

At last we define the function parents of that returns all the classes in which a node type is defined:

for all  N in node types
  df  parents of(N) = {  C in classes  |  N in class(C)  }

The extra restriction which occurs when the extension class overlap is not given is represented by:

not class overlap
=> for all  N in node types in classes
     ( parents of(N) = {C} )

3.7 The tree rules

Now we describe the restrictions on the set of tree rules. In each tree rule the part names have to be unique and the parts have to be elements, even classes when the extension node types in tree rules is not active. This is described by:

for all  "N => P₁:E₁,..,P_n:E_n" in tree rules
  (  E in elements
   and {E₁,..,E_n} subset elements)
   and not node types in tree rule
     => {E₁,..,E_n} subset classes)
   and for all 1<=i<j<=n (P_i != P_j )
  )

For each node type there is precisely one tree rule, which is expressed by:

for all  N  in  nodes
  (exists!  "N => P₁:E₁,..,P_n:E_n" in tree rules )

When this restriction is fulfilled we (again) can define function, called tree, to bring the information on higher abstraction level:

for all  E in nodes
  df  tree(N)
      = ((P₁,E₁),..,(P_n,E_n))
        where
          "N => P₁:E₁,..,P_n:E_n" in tree rules
        endwhere

Remark: This function returns a ordered set, so that the information of ordering of the parts is preserved for further uses.

3.8 The root

The root element is defined in the following way in the grammar description:

ROOT  root

We define root with:

df  root = root

The only restriction on the root is that it has to an element:

root in elements

3.9 Assignment of properties through class definitions

In this section we will discuss some restrictions with respect to the properties and the class definitions. These restrictions are only necessary with the extension class properties, otherwise the restriction we will describe do not put extra restrictions on the grammar description. When we allow properties at classes we have to define how they are interpreted. We start with the definition of the function all prop of, that returns all the properties assign to an element, with respect to the class definitions:

for all  N in nodes
  df  all prop of(N)
      = prop of(N)
        union  union{  prop of(C)  |  N in class(C)  }

It is not allowed to define a property with a class and a node type in this class in the grammar description. We call this a parent-child conflict. The extra restriction is described by:

for all  C in classes, N in class(C)
  (prop of(C)  disjoint  prop of(N))

When a node type belongs to more than one class (this is only possible with the class overlap extension), the sets of properties defined with these classes should in the grammar description should be equal. If this is not the case we have a so-called parent-parent conflict. The extra restriction we get is described by:

for all  N in elements, C₁,C₂ in parents of(N)
  (prop of(C₁) = prop of(C₂))

4. TREES WITH A PAT

In this final chapter we shall give a function that with the grammar description of the previous chapters returns the set of all possible Properties Abstract Trees with this grammar.

First of all we define the function trees from that returns for each element all the trees that start from that element:

for all  C in classes, N in node types
df  trees from(C)
    = {  trees from(N)  |  N in class(C)  }
 ,  trees from(N)
    = {  (N,{(P₁,V₁),..,(P_m,V_m)},(t₁,..,t_n))
      |  {P₁,..,P_m} = all prop of(N)
       and for all  1<=j<=m  (V_j in domain of(P_j))
       and for all  1<=j<=n  (t_i in trees from(E_i))
         where  ((P₁,E₁),..,(P_n,E_n)) = tree(N)
         endwhere
      }

In this definition we make use of the function domain of which we consider as given, and representing the value domain of a certain property.

Now we can define the function that returns all trees starting with the root:

df  all trees  =  trees from(root)

My life as a hacker | My home page