In this document we will describe Propertied Abstract Trees (PAT), as objects which can be used for representing form and contents of abstract trees. We want to use PAT to describe internal representation of program trees in the different phases of the compilation process. The abstract trees are not arbitrary but do contain allot of structure.
Trees are made out of nodes that are hierarchical connected with each other. With each node we thus can associate a (ordered,empty) set of nodes. With each node we will also associate a node type. It is possible to describe the extra structure in the trees we want to consider by putting restrictions on the form of a node that has a certain node type.
We also want to store information in the tree separate from its form. We do this by associating properties with the nodes. The properties with a node are represented as a set of (name,value)-pairs. All the properties together describe the contents of a tree in the same way as the form of the tree is described by the hierarchical ordering of the different nodes of the tree. Of course it is also possible to structure the sets of properties with a node depending on its node type.
In this section we will describe a universal representation for the trees we want to consider. In the following section we will explain we can filter out those trees that has the desired extra structure.
We can describe trees as a root node with a number (possible zero) of subtrees. With the above we come to the conclusion that we can describe a tree as a tree-tuple of a node type, the properties with the node, and a (possible empty) set of subtrees. Given a set of node types, properties and property values we can give a formal description of the universal set of trees:
df all trees = { (N,{(P1,V1),..,(Pm,Vm)},(t1,..,tn)) | N in node types and 0<=m and for all 1<=i<=m ( Pi in properties and Vi in property values ) and 0<=n and for all 1<=i<=n (ti in all trees) }
We will use one of the simplest ways of restricting the number of trees that we allow. For each node in the tree its node type shall restrict the node types of its sons and define the set of properties. Each node with a node type has a fixed number of sons, and the node type of each sons has to be in a fixed subset of the node types.
We can describe these restrictions by giving tree rules for each node type that specifies the number of son nodes and a so-called class. With class definitions we define which node types are in a certain class.
In the following chapter we will present a way to describe these restrictions.
In this chapter we describe the syntax of the grammar with which Propertied Abstract Trees can be described.
The grammar that specifies the tree rules with the node types and the class definitions with the classes is the following:
element : class ; node type . rule : ( class def ; tree rule ) , . . class def : class , = , { , node type LIST , } . tree rule : node type , => , (part name , : , element)LIST . class : identifier . node type : identifier . part name : identifier .
The properties are defined by the following grammar:
property definition : property name , : , type identifier , OF , element LIST , . . property name : identifier . type identifier : INTEGER ; REAL ; Boolean ; CHAR ; user defined type identifier . user defined type identifier : identifier .
In the most simple form only classes are allowed in the right hand side of the tree rules, is there no overlap between the node types in the different class definitions and are properties only allowed at node types. In this section we describe a number of extensions on this simple form. Because these extensions are considered to be useful, but can have extensive consequences, we present a mechanism for the user to tell which extensions he wishes. The idea behind this is that it is possible to make separate implementations which do not support some or all of the extensions. Each extensions has implications on the semantic restrictions on the rest of the abstract tree grammar description. We present the grammar of the extensions:
extension : node types in tree rule ; class overlap ; class properties .
In each of the following sub-sections we will describe each extension.
This extension allows node types to be valid in the right hand side of tree rules, besides the classes. This extension is useful in those cases where a class exists of a single node type, which does happen often at the leaves of the abstract tree.
This extension states that a node type can be in different class definitions, instead of just one. This extension is useful because it can happen that two almost identical constructs can occur in two different classes that one would like to put together for sake of briefness and clarity.
This extension says that instead of properties only being defined at node types, may also defined with classes (implying that they are now defined with all node types in that specific class). This extension is useful because in some cases one wants to define a property that says something of a specific class instead of the separate node types.
Finally we present the complete grammar with which Propertied Abstract Trees can be described. This grammar just consists of the elements described before, together with enumerations of the classes, node types and user defined type identifiers. And the so-called root is defined. The complete grammar is:
input grammar : ( EXTENSIONS , extension LIST , . )OPTION : ( CLASSES , class LIST , . )OPTION , NODE TYPES , node type LIST , . , ( TYPES , user defined type identifier LIST , . )OPTION , PROPERTIES , property definitions SEQ , RULES , rule SEQ , ROOT , element .
In this chapter we give a formal description of all the semantic restrictions that are imposed on the input grammar. We shall do so by extracting certain sets from parts of the input. With these sets we will describe the restriction. Further more we will derive other sets from these sets that describe certain semantic qualities of the given grammar.
The first thing we come along in the input are the extensions:
EXTENSIONS ext1 , ... , extn .
We construct the set of given extensions, and define the Boolean constants that say whether a extension is given or not:
df extensions = {ext1,..,extn} df node types in tree rules = node types in tree rules in extensions df class overlap = class overlap in extensions df class properties = class properties in extensions
The following thing we find is the enumeration of the classes and the node types. Given this part of the grammar:
CLASSES C1 , ... , Cn . NODE TYPES N1 , ... , Nm .
We define the sets classes and nodes with:
df classes = {C1,..,Cn} df nodes = {N1,..,Nm}
Because an identifier can not represent a class and a node type the sets have to be disjoint:
classes disjoint nodes
We also would like to define the set of elements as the union of these two sets:
df elements = classes union nodes
The next thing we get in the grammar is the enumeration of the user defined type identifiers. First of all we define the four standard types:
df standard type identifiers = {INTEGER, REAL, Boolean, CHAR}
The next part of the grammar that gives the enumeration looks like:
TYPES T1 , ... , Tn .
The set user defined type identifiers is defined by:
df user defined type identifiers = {T1,..,Tn}
Because the standard type identifiers may not be redefined, we get the following restriction:
user defined type identifiers disjoint standard type identifiers
Now we can define the set type identifiers as the union of the standard and user defined type identifiers:
df type identifiers = standard type identifiers union user defined type identifiers
The following part of the grammar we come along is the definition of the properties, which is given by:
PROPERTY propdef1 . ... propdefn .
The set property definitions is derived from the grammar in the following way:
df property definitions = {propdef1,..,propdefn}
The definition of a property consists of two parts : a type definition which defines the type of the property and a set of elements at which the property is defined with. This set contains node types only when the class properties-extension is not given. This results in the following restrictions on the property definitions:
for all "P : T OF E1 .. En" in attribute definitions ( T in type identifiers and {E1,..,En} subset elements and not class properties => {E1,..,En} subset nodes )
The set properties is defined as the set of all property identifiers that occur in the property definitions:
df properties = { P | "P : T OF E1 .. En" in property definitions }
The restrictions that property definition has to be unambiguous is give by:
for all P in properties ( exists! "P : T OF E1 .. En" in property definitions)
When this restriction is satisfied we can define the function type of by:
for all "P : T OF E1 .. En" in property definitions df type of(P) = T
Finally we present the definition of the function prop of that returns the properties that are defined with an element:
for all E in elements df prop of(E) = { P | "P : T OF .. E .." in property definitions }
The greatest part of the grammar description consists of the rules that describe the actual tree grammar. These rules can are divided into the class definitions and the tree rules. In the grammar description the two kinds of rules are intermixed, and the tree rules can be followed by their attribute assignments (which we will neglect for the time being). When the grammar looks like:
RULES rule1 . ... rulen .
The following set can be defined:
df rules = {rule1,..,rulen} df class definitions = { "C = {N1,..,Nn}" in rules } df tree rules = { "N => P1:E1,..,Pn:En" in rules }
First the two kinds of rules will be dealt with separately in two successive sections. In the following sections overall consistency restrictions will be described.
Class definitions can only be defined with classes and have to be defined as a set of node types. Which is described by:
for all "C = {N1,..,Nn}" in class definition ( C in classes and {N1,..,Nn} subset nodes )
For each class that has to be exactly one class definition. resulting in the restriction:
for all C in classes ( exists! "C = {N1,..,Nn}" in class definitions)
When this restriction is fulfilled we can define the function class that returns the set of node types by which the class is defined:
for all "C = {N1,..,Nn}" in class definitions df class(C) = {N1,..,Nn}
We also define the set of all node types that do occur in a class definition:
df all node types in class = { E | E in class(C) and C in classes }
At last we define the function parents of that returns all the classes in which a node type is defined:
for all N in node types df parents of(N) = { C in classes | N in class(C) }
The extra restriction which occurs when the extension class overlap is not given is represented by:
not class overlap => for all N in node types in classes ( parents of(N) = {C} )
Now we describe the restrictions on the set of tree rules. In each tree rule the part names have to be unique and the parts have to be elements, even classes when the extension node types in tree rules is not active. This is described by:
for all "N => P1:E1,..,Pn:En" in tree rules ( E in elements and {E1,..,En} subset elements) and not node types in tree rule => {E1,..,En} subset classes) and for all 1<=i<j<=n (Pi != Pj ) )
For each node type there is precisely one tree rule, which is expressed by:
for all N in nodes (exists! "N => P1:E1,..,Pn:En" in tree rules )
When this restriction is fulfilled we (again) can define function, called tree, to bring the information on higher abstraction level:
for all E in nodes df tree(N) = ((P1,E1),..,(Pn,En)) where "N => P1:E1,..,Pn:En" in tree rules endwhere
Remark: This function returns a ordered set, so that the information of ordering of the parts is preserved for further uses.
The root element is defined in the following way in the grammar description:
ROOT root
We define root with:
df root = root
The only restriction on the root is that it has to an element:
root in elements
In this section we will discuss some restrictions with respect to the properties and the class definitions. These restrictions are only necessary with the extension class properties, otherwise the restriction we will describe do not put extra restrictions on the grammar description. When we allow properties at classes we have to define how they are interpreted. We start with the definition of the function all prop of, that returns all the properties assign to an element, with respect to the class definitions:
for all N in nodes df all prop of(N) = prop of(N) union union{ prop of(C) | N in class(C) }
It is not allowed to define a property with a class and a node type in this class in the grammar description. We call this a parent-child conflict. The extra restriction is described by:
for all C in classes, N in class(C) (prop of(C) disjoint prop of(N))
When a node type belongs to more than one class (this is only possible with the class overlap extension), the sets of properties defined with these classes should in the grammar description should be equal. If this is not the case we have a so-called parent-parent conflict. The extra restriction we get is described by:
for all N in elements, C1,C2 in parents of(N) (prop of(C1) = prop of(C2))
In this final chapter we shall give a function that with the grammar description of the previous chapters returns the set of all possible Properties Abstract Trees with this grammar.
First of all we define the function trees from that returns for each element all the trees that start from that element:
for all C in classes, N in node types df trees from(C) = { trees from(N) | N in class(C) } , trees from(N) = { (N,{(P1,V1),..,(Pm,Vm)},(t1,..,tn)) | {P1,..,Pm} = all prop of(N) and for all 1<=j<=m (Vj in domain of(Pj)) and for all 1<=j<=n (ti in trees from(Ei)) where ((P1,E1),..,(Pn,En)) = tree(N) endwhere }
In this definition we make use of the function domain of which we consider as given, and representing the value domain of a certain property.
Now we can define the function that returns all trees starting with the root:
df all trees = trees from(root)
My life as a hacker | My home page