A
ParseTree
is a sophisticated data type that provides an interface
to the Yoix, XML and DTD parsers that were all built by the Yoix
team using the JavaCC parser generator.
Some of what you need to know to made good use of trees is documented here,
but an important piece that is currently missing is documentation
that describes how the different parsers build trees.
We hope to provide the information in a future release,
but even without it a determined programmer can easily use a tree to
discover most of the details, as we demonstrate in our examples.
The fields in a
ParseTree
are:
| addtags |
An
int
that controls whether source and line number information is included
in the tree that is generated the next time the parser is invoked,
which happens whenever the
parse
field is changed.
A value of
1
indicates tagging is to be performed
and a value of
0
indicates otherwise.
| | child(int index) |
A
Builtin
that returns a new tree that represents the child of the current node at
index,
or
NULL
when
index
is out of bounds.
The actual structure of parse tree nodes (i.e., the arrangement of
the children) is parser dependent and has not been documented yet,
but simple experiments using the
parse
and
tree
fields can reveal many details.
For example, it is easy to discover that a Yoix
while
statement has an expression and statement as its only two children
and it is not hard to figure out where they come from.
| | depth |
A read-only
int
that measures the distance (i.e., the number of levels) between the current node
and the root of the tree.
| | errordict |
A
Dictionary
used to record the syntax errors that were encountered when
the parser tried to translate the
parse
field.
The value represents the status of the last translation, with
NULL
meaning there were no errors.
| | length |
A read-only
int
that is the number of children of the current node.
Nodes that represent terminals always have zero length,
while nodes that represent special control information
(end-of-file is currently the only example) have their
length
field set to
-1.
| | parent |
A read-only
ParseTree
that represents the node immediately above the current node, which is
NULL
when the current node is the root.
| | parse |
A
String
of Yoix statements or XML documents that are translated by the requested
parser
and stored in the tree.
Changing this field immediately hands the new string to the
parser
and stores the results of the translation in the tree.
| | parser |
An
int
that selects the parser used to translate the
parse
field.
The value should be
PARSER_YOIX,
PARSER_XML,
or
PARSER_DTD,
which are all defined in
yoix.parser.
Parser dependent constants are also available in the
yoix.parser.YoixConstants
and
yoix.parser.XMLConstants
dictionaries.
Note that the DTD parser is just a subset of the XML parser.
| | position |
A read-only
int
that represents the position of the current node relative to all the
children of its
parent
node.
| | tree |
A read-only
String
that is a readable dump of the tree generated when the
parse
field was successfully translated, or
NULL
if there was an error.
| | type |
A read-only
int
that identifies the type of the current node, which will be one of the
parser dependent constants defined in the
yoix.parser.YoixConstants
or
yoix.parser.XMLConstants
dictionaries.
In addition, numeric values can be translated into strings using the
tokenImage
built-in.
| | value |
A read-only
Object
that represents the value associated with the current node, which is always
NULL
for non-terminal and control nodes.
| | walk(ParseTree node [, Object type [, int mode]]) |
A
Builtin
that finds the next node in the current node's parse tree,
stores a representation of that node in
node,
and returns
1
when there are more nodes to visit and
0
otherwise.
The next node that
walk
visits is controlled by the
type
and
mode
arguments.
The
type,
when supplied, should be an
int,
String,
or
Array
that restricts the type of the returned node.
An array can be used to select a set of acceptable node types by either their
int
or
String
representations.
By default, no type restrictions are enforced, which is also what happens
when a
NULL
value is supplied as the
type.
The optional
mode
argument should be
CHILDREN,
DESCENDANTS,
or
ROOT_TREE,
which are all defined in
yoix.parser.
In
CHILDREN
mode, only the current node and its children are candidates for being returned.
In
DESCENDANTS
mode, only the current node and its descendants are candidates for being returned.
In
ROOT_TREE
mode, which is the default, the tree root and all of its descendants are returned
(i.e., the entire tree structure starting from the root).
|
Several permanent fields have not been documented and should not be
used in Yoix applications.
Yoix programs can compare tree object using relational operators.
Two tree objects are equal when they reference the same node in the
same tree and a tree object is considered less than another tree
object when the latter is in the chain of
parent
field values of the former.
Although our XML parser it is not fully validating and does not
exhaustively seek to include all external references,
it does otherwise conform exactly with the
W3C Recommendation of 6 October 2000 entitled
Extensible Markup Language (XML) 1.0 (Second Edition)
available at http://www.w3.org/TR/2000/REC-xml-20001006
except for two intentional deviations:
keywords are case-insensitive and attribute values need not be quoted
when they consist solely of members of character class:
[A-Za-z0-9_.:,-].
Naturally, the DTD, if present, is only placed into the tree and not
applied to the subsequent document.
Application of the DTD is intended to occur at some later point
by working with the generated parse tree.
| |
| Example: |
The program,
import yoix.stdio.*;
ParseTree t;
t.parse = "x = y + 3;";
puts(t.tree);
prints something like,
<STATEMENT <#2>>
<EXPRESSION <#1>>
<EXPRESSION <#5>>
<LVALUE <#1>>
<NAME x>
<LVALUE <#1>>
<NAME y>
<NUMBER 3>
<+>
<=>
<EOF>
on standard output, which is a dump of the tree that the Yoix parser
produce for our simple example.
The children of a node are all indented by the same amount
(one space more than their parent) and the number of children
(i.e., the
length)
attached to a non-terminal node is the number that follows the
#.
In the dump the
type
of each node has been translated from a number to a string
(e.g.,
EXPRESSION
or
NUMBER)
using mappings from the
yoix.parser.YoixConstants
dictionary.
The next program,
import yoix.parser.*;
import yoix.stdio.*;
Statement(ParseTree node) {
Object child;
int n;
for (n = 0; n < node.length; n++) {
child = node.child(n);
switch (child.type) {
case YoixConstants.EXPRESSION:
Expression(node.child(n));
break;
case YoixConstants.STATEMENT:
Statement(node.child(n));
break;
case YoixConstants.EOF:
break;
default:
printf("Statement: %s\n", tokenImage(child.type));
break;
}
}
}
Expression(ParseTree node) {
Object child;
int n;
for (n = 0; n < node.length; n++) {
child = node.child(n);
switch (child.type) {
case YoixConstants.EXPRESSION:
Expression(child);
break;
case YoixConstants.ASSIGN:
printf("Expression: handle assignment\n");
break;
case YoixConstants.PLUS:
printf("Expression: handle addition\n");
break;
default:
printf("Expression: %s\n", tokenImage(child.type));
break;
}
}
}
Statement(new ParseTree {String parse = "x = y + 3;";});
although not completely correct, shows two trivial functions that use the
fields in a tree to examine nodes in the tree that we dumped in the first
example.
Run it and you get something like,
Expression: LVALUE
Expression: LVALUE
Expression: NUMBER
Expression: handle addition
Expression: handle assignment
which is not particularly useful information, but we think it does start
to illustrate some of the things that can be done using trees.
The last program,
import yoix.parser.*;
import yoix.stdio.*;
Indent(int depth) {
String ind = "";
while (depth-- > 0)
ind += " ";
return(ind);
}
ParseTree tree, node;
String buf, doc, value;
while (buf = stdin.nextbuf)
doc += buf;
tree.parser = PARSER_XML;
tree.parse = doc;
if (tree.errordict) {
fprintf(stderr, "ERROR: %s: %s\n",
argv[0], tree.errordict.message);
exit(1);
}
// essentially the same output as puts(tree.tree);
while (tree.walk(node)) {
if ((value = node.value) != NULL) {
printf("%s<%s %O>\n", Indent(node.depth),
tokenImage(node.type, tree.parser), value);
} else {
printf("%s<%s <#%d>>\n", Indent(node.depth),
tokenImage(node.type, tree.parser), node.length);
}
}
uses
walk
to traverse a tree built by the Yoix XML parser.
Type something like,
<p>This is content in the <font
size="+1"><b>XHTML</b></font> namespace.</p>
on standard input and you get
<FOLDER <#1>>
<XML <#1>>
<BODY <#1>>
<BLOCK <#2>>
<NAME p>
<CONTENT <#3>>
<CHARDATA This is content in the >
<BLOCK <#3>>
<NAME font>
<ATTRIBUTE <#2>>
<NAME size>
<VALUE +1>
<CONTENT <#1>>
<BLOCK <#2>>
<NAME b>
<CONTENT <#1>>
<CHARDATA XHTML>
<CHARDATA namespace.>
on standard output.
| | |
| See Also: |
tokenAssociativity,
tokenImage,
tokenPrecedence,
tokenValue
|
|