AT&T Home | AT&T Labs | Research
AT&T Labs, Inc. - Research

The Yoix® Scripting Language

Home | What's New | Grammar | Documentation | Download | License | YChart | YDAT | YWAIT | Byzgraf | FAQs
ParseTree typedict
 
A ParseTree is a sophisticated data type that provides an interface to the Yoix, XML and DTD parsers that were all built by the Yoix team using the JavaCC parser generator. Some of what you need to know to made good use of trees is documented here, but an important piece that is currently missing is documentation that describes how the different parsers build trees. We hope to provide the information in a future release, but even without it a determined programmer can easily use a tree to discover most of the details, as we demonstrate in our examples. The fields in a ParseTree are:
addtags An int that controls whether source and line number information is included in the tree that is generated the next time the parser is invoked, which happens whenever the parse field is changed. A value of 1 indicates tagging is to be performed and a value of 0 indicates otherwise.
child(int index) A Builtin that returns a new tree that represents the child of the current node at index, or NULL when index is out of bounds. The actual structure of parse tree nodes (i.e., the arrangement of the children) is parser dependent and has not been documented yet, but simple experiments using the parse and tree fields can reveal many details. For example, it is easy to discover that a Yoix while statement has an expression and statement as its only two children and it is not hard to figure out where they come from.
depth A read-only int that measures the distance (i.e., the number of levels) between the current node and the root of the tree.
errordict A Dictionary used to record the syntax errors that were encountered when the parser tried to translate the parse field. The value represents the status of the last translation, with NULL meaning there were no errors.
length A read-only int that is the number of children of the current node. Nodes that represent terminals always have zero length, while nodes that represent special control information (end-of-file is currently the only example) have their length field set to -1.
parent A read-only ParseTree that represents the node immediately above the current node, which is NULL when the current node is the root.
parse A String of Yoix statements or XML documents that are translated by the requested parser and stored in the tree. Changing this field immediately hands the new string to the parser and stores the results of the translation in the tree.
parser An int that selects the parser used to translate the parse field. The value should be PARSER_YOIX, PARSER_XML, or PARSER_DTD, which are all defined in yoix.parser. Parser dependent constants are also available in the yoix.parser.YoixConstants and yoix.parser.XMLConstants dictionaries. Note that the DTD parser is just a subset of the XML parser.
position A read-only int that represents the position of the current node relative to all the children of its parent node.
tree A read-only String that is a readable dump of the tree generated when the parse field was successfully translated, or NULL if there was an error.
type A read-only int that identifies the type of the current node, which will be one of the parser dependent constants defined in the yoix.parser.YoixConstants or yoix.parser.XMLConstants dictionaries. In addition, numeric values can be translated into strings using the tokenImage built-in.
value A read-only Object that represents the value associated with the current node, which is always NULL for non-terminal and control nodes.
walk(ParseTree node [, Object type [, int mode]]) A Builtin that finds the next node in the current node's parse tree, stores a representation of that node in node, and returns 1 when there are more nodes to visit and 0 otherwise. The next node that walk visits is controlled by the type and mode arguments. The type, when supplied, should be an int, String, or Array that restricts the type of the returned node. An array can be used to select a set of acceptable node types by either their int or String representations. By default, no type restrictions are enforced, which is also what happens when a NULL value is supplied as the type.

The optional mode argument should be CHILDREN, DESCENDANTS, or ROOT_TREE, which are all defined in yoix.parser. In CHILDREN mode, only the current node and its children are candidates for being returned. In DESCENDANTS mode, only the current node and its descendants are candidates for being returned. In ROOT_TREE mode, which is the default, the tree root and all of its descendants are returned (i.e., the entire tree structure starting from the root).

Several permanent fields have not been documented and should not be used in Yoix applications. Yoix programs can compare tree object using relational operators. Two tree objects are equal when they reference the same node in the same tree and a tree object is considered less than another tree object when the latter is in the chain of parent field values of the former.

Although our XML parser it is not fully validating and does not exhaustively seek to include all external references, it does otherwise conform exactly with the W3C Recommendation of 6 October 2000 entitled Extensible Markup Language (XML) 1.0 (Second Edition) available at http://www.w3.org/TR/2000/REC-xml-20001006 except for two intentional deviations: keywords are case-insensitive and attribute values need not be quoted when they consist solely of members of character class: [A-Za-z0-9_.:,-]. Naturally, the DTD, if present, is only placed into the tree and not applied to the subsequent document. Application of the DTD is intended to occur at some later point by working with the generated parse tree.
 
 Example:   The program,
import yoix.stdio.*;

ParseTree t;
t.parse = "x = y + 3;";
puts(t.tree);
prints something like,
<STATEMENT <#2>>
 <EXPRESSION <#1>>
  <EXPRESSION <#5>>
   <LVALUE <#1>>
    <NAME x>
   <LVALUE <#1>>
    <NAME y>
   <NUMBER 3>
   <+>
   <=>
 <EOF>
on standard output, which is a dump of the tree that the Yoix parser produce for our simple example. The children of a node are all indented by the same amount (one space more than their parent) and the number of children (i.e., the length) attached to a non-terminal node is the number that follows the #. In the dump the type of each node has been translated from a number to a string (e.g., EXPRESSION or NUMBER) using mappings from the yoix.parser.YoixConstants dictionary.

The next program,

import yoix.parser.*;
import yoix.stdio.*;

Statement(ParseTree node) {
    Object child;
    int    n;

    for (n = 0; n < node.length; n++) {
	child = node.child(n);
	switch (child.type) {
	    case YoixConstants.EXPRESSION:
		Expression(node.child(n));
		break;

	    case YoixConstants.STATEMENT:
		Statement(node.child(n));
		break;

	    case YoixConstants.EOF:
		break;

	    default:
		printf("Statement: %s\n", tokenImage(child.type));
		break;
	}
    }
}

Expression(ParseTree node) {
    Object child;
    int    n;

    for (n = 0; n < node.length; n++) {
	child = node.child(n);
	switch (child.type) {
	    case YoixConstants.EXPRESSION:
		Expression(child);
		break;

	    case YoixConstants.ASSIGN:
		printf("Expression: handle assignment\n");
		break;

	    case YoixConstants.PLUS:
		printf("Expression: handle addition\n");
		break;

	    default:
		printf("Expression: %s\n", tokenImage(child.type));
		break;
	}
    }
}

Statement(new ParseTree {String parse = "x = y + 3;";});
although not completely correct, shows two trivial functions that use the fields in a tree to examine nodes in the tree that we dumped in the first example. Run it and you get something like,
Expression: LVALUE
Expression: LVALUE
Expression: NUMBER
Expression: handle addition
Expression: handle assignment
which is not particularly useful information, but we think it does start to illustrate some of the things that can be done using trees.

The last program,

import yoix.parser.*;
import yoix.stdio.*;

Indent(int depth) {
    String ind = "";

    while (depth-- > 0)
        ind += " ";
    return(ind);
}

ParseTree tree, node;
String buf, doc, value;

while (buf = stdin.nextbuf)
    doc += buf;

tree.parser = PARSER_XML;
tree.parse = doc;

if (tree.errordict) {
    fprintf(stderr, "ERROR: %s: %s\n",
        argv[0], tree.errordict.message);
    exit(1);
}

// essentially the same output as puts(tree.tree);
while (tree.walk(node)) {
    if ((value = node.value) != NULL) {
        printf("%s<%s %O>\n", Indent(node.depth),
            tokenImage(node.type, tree.parser), value);
    } else {
        printf("%s<%s <#%d>>\n", Indent(node.depth),
            tokenImage(node.type, tree.parser), node.length);
    }
}
uses walk to traverse a tree built by the Yoix XML parser. Type something like,
<p>This is content in the <font
size="+1"><b>XHTML</b></font> namespace.</p>
on standard input and you get
<FOLDER <#1>>
 <XML <#1>>
  <BODY <#1>>
   <BLOCK <#2>>
    <NAME p>
    <CONTENT <#3>>
     <CHARDATA This is content in the >
     <BLOCK <#3>>
      <NAME font>
      <ATTRIBUTE <#2>>
       <NAME size>
       <VALUE +1>
      <CONTENT <#1>>
       <BLOCK <#2>>
        <NAME b>
        <CONTENT <#1>>
         <CHARDATA XHTML>
     <CHARDATA  namespace.>
on standard output.
 
 See Also:   tokenAssociativity, tokenImage, tokenPrecedence, tokenValue

 

Yoix is a registered trademark of AT&T Inc.