ParseString method

ASPCompiler

ASP Compiler 1.1 documentation is under development. See also the examples.

The ParseString method runs the parser over the string passed as its parameter and returns the resulting object model.

Syntax

Set model = parserobject.ParseString( HTML_content_string )

Parameters

HTML_content_string - an HTML page or other text that follows HTML syntax and its general rules.

The resulting model is a VarDictionary collection containing sub-collections that represent the elements found. These may in turn contain further sub-collections (if the corresponding tag contains other tags). The result depends on how the parser was configured before the call to ParseString - see AddTag.

More about the object model

The parser finds only the elements defined through prior calls to the AddTag method. All other content is treated as plain text, whether or not it contains other HTML tags. This is convenient when an HTML file is used as a template, because the script does not need to deal with the rest of the content: it defines the HTML tags that matter for its work with a few calls to AddTag and then calls ParseString to obtain the object structure of the file.

Deliberately treating some HTML elements "incorrectly" can also simplify the returned structure. For example, if you are interested only in some attributes of <BODY>, you may tell the parser that it has no closing tag (</BODY>). This causes no problems when the document is generated again, because the </BODY> tag is reproduced from the "plain text" that corresponds to the non-parsed parts of the document, and the document tree becomes noticeably simpler. It is still recommended to define elements according to their usual HTML role, but only to keep the source code easier to understand; if the benefit is considerable, feel free to treat an HTML element in whatever way is most convenient.

How are the found tags represented?

The returned collection is the document root. Every tag found is represented as an item in it, and each such item is itself a VarDictionary collection. Every HTML attribute specified for the tag appears in that collection as a named item: the name is the attribute name and the value is the attribute value.

If the tag contains other tags (for example, a tag containing several <A> tags), they are represented as items in the same collection. To distinguish sub-tags from HTML attributes, the script can use the VBScript IsObject function (or typeof in JScript): attributes are string values, while sub-tags are sub-collections, that is, VarDictionary objects.

Values whose names begin with "__" (two underscores) are special values that describe the tag itself. HTML does not define any attributes with such names, so they do not interfere with regular HTML attributes. One special value is present for every tag: "__class", which holds the tag name (e.g. "P" for the <P> tag or "TITLE" for the <TITLE> tag). The ASP page that inspects the model returned by the parser uses this item to determine what kind of tag the current "node" represents. In most cases the script does not need to walk the entire tree recursively to find a particular tag: using the VarDictionary methods FindByValue and FindByName, it can request a sub-node (at any depth) from the current node by passing a simple criterion.
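
The distinctions described above (sub-tags are objects, attributes are strings, names starting with "__" are special values) can be sketched as a small VBScript routine that lists the items of one node. This is an illustration only: DumpNode is a hypothetical helper name, and the Count property is assumed to exist on VarDictionary, since it is not described on this page. Indexed access and the Key method are used the same way as in the example further down.

```vbscript
' Sketch only: lists the items of a single node returned by ParseString.
' Assumption: VarDictionary exposes a Count property (not covered on this page).
Sub DumpNode(node)
    Dim i, itemName
    For i = 1 To node.Count
        ' Prepending "" turns a missing name into an empty string.
        itemName = "" & node.Key(i)
        If IsObject(node(i)) Then
            ' Sub-tags and text/plain nodes are VarDictionary objects.
            Response.Write "sub-node: " & _
                Server.HTMLEncode(node(i)("__class")) & "<br>"
        ElseIf Left(itemName, 2) = "__" Then
            ' Names starting with two underscores describe the tag itself.
            Response.Write "special: " & _
                Server.HTMLEncode(itemName & " = " & node(i)) & "<br>"
        Else
            ' Everything else is a regular HTML attribute (a string value).
            Response.Write "attribute: " & _
                Server.HTMLEncode(itemName & " = " & node(i)) & "<br>"
        End If
    Next
End Sub
```

To try it, obtain a node first (for instance Set node = model(2)) and then call DumpNode node from the ASP page.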

Example. Consider the following text:

<HTML><BODY>
<P ID="A1">A Paragraph</P>
</BODY></HTML>

Assume it is already in the src variable:

Set parser = Server.CreateObject("newObjects.utilctls.TextEmbedParser")
parser.AddTag "P",True,False,False,False,"",""
Set model = parser.ParseString(src)

We requested only the <P> elements (see AddTag for details on its parameters). We specified that the tag has a closing tag (</P>) and that it may not contain other tags as sub-elements.

The resulting collection (model) will have the following items:

model(1) - VarDictionary collection
model(1)("__class") = "text/plain"

model(2) - VarDictionary collection
model(2)("__class") = "P"

model(3) - VarDictionary collection
model(3)("__class") = "text/plain"

The FindByName method can be used very effectively together with the HTML ID attribute. By default the parser checks whether a tag has an ID attribute and, if it is present, names the node (the VarDictionary collection that represents the tag in the resulting tree) with the value of that attribute. This follows the usual way HTML elements are named, so one naming convention can serve both server-side processing and client-side DHTML scripts: the same ID gives access to a tag on the server (through the parser) and on the client (from a script in the page). In the above example, therefore, model.Key(2) will return "A1".
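
A minimal sketch of how the top-level nodes and their names could be listed is shown here. It assumes the sample text contains a <P ID="A1"> paragraph, which is what the listing above and the model.Key(2) remark imply, and it assumes that VarDictionary exposes a Count property. The CreateObject, AddTag and ParseString calls are copied from the example above.

```vbscript
' Sketch only; runs inside an ASP page with the newObjects components installed.
Dim src, parser, model, i

' Sample text (the ID="A1" attribute is inferred from the description above).
src = "<HTML><BODY>" & vbCrLf & _
      "<P ID=""A1"">A Paragraph</P>" & vbCrLf & _
      "</BODY></HTML>"

Set parser = Server.CreateObject("newObjects.utilctls.TextEmbedParser")
parser.AddTag "P", True, False, False, False, "", ""
Set model = parser.ParseString(src)

' Assumption: VarDictionary exposes a Count property.
For i = 1 To model.Count
    ' Key(i) is the node name (taken from the ID attribute when present);
    ' "__class" tells which kind of node this is.
    Response.Write i & ": name=" & Server.HTMLEncode("" & model.Key(i)) & _
                   ", class=" & Server.HTMLEncode(model(i)("__class")) & "<br>"
Next
```

If the parser behaves as described above, the second item should be the one that carries the name A1 and the class P.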

What is the representation of the non-parsed content?

There is one kind of node that has no HTML equivalent: its __class is "text/plain". It may in fact contain HTML markup, but the parser treats that markup as plain text, and since the page did not request it through AddTag, the page is not interested in it either. Likewise, the content of an HTML element that has opening and closing tags appears as a sub-node of the element's node in the tree, and the __class of that sub-node is "text/plain".

While modifying the tree, the script can change the role of a particular node by changing its __class. For example, the script may want to find a <DIV> element (by ID, say), replace its content and then strip the enclosing DIV tags from the regenerated document. To do so it only needs to set node("__class") = "text/plain". When the tree is then passed to the GenerateDoc method, only the content of the node is placed in the generated document.

Where is the content of the text/plain nodes? They have an item named __content, whose (string) value is the content. Pay attention to the specific characteristics of an element before changing its __class to text/plain: if it has sub-nodes, the change instructs the parser to skip them when the document is regenerated! Changing the __class to text/plain is therefore very effective when a self-closing HTML element is replaced with some text (for example a tag you define for your own use only, such as the CUSTOMFIELD tag used by the ASP Compiler), but it is rarely convenient if the element being modified is expected to contain sub-tags.
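
The replacement technique described above might look roughly like the following sketch. Several details are assumptions rather than documented behaviour: that model(2) is the node created for the P tag of the earlier sample, that assigning to __content on a node that does not yet have this item adds it, and that GenerateDoc accepts the tree and returns the document as a string. Check the GenerateDoc and VarDictionary pages for the actual signatures before relying on it.

```vbscript
' Sketch only; runs inside an ASP page with the newObjects components installed.
Dim src, parser, model, node, html

src = "<HTML><BODY>" & vbCrLf & _
      "<P ID=""A1"">A Paragraph</P>" & vbCrLf & _
      "</BODY></HTML>"

Set parser = Server.CreateObject("newObjects.utilctls.TextEmbedParser")
parser.AddTag "P", True, False, False, False, "", ""
Set model = parser.ParseString(src)

' Assumption: the second top-level item is the node for the P tag.
Set node = model(2)

' Give the node its own content first, because its sub-nodes are skipped
' once __class is "text/plain".
' Assumption: assigning to a name that does not exist yet adds the item.
node("__content") = "Replacement text"
node("__class") = "text/plain"

' Assumption: GenerateDoc takes the tree and returns the document text.
html = parser.GenerateDoc(model)
Response.Write Server.HTMLEncode(html)
```

Setting __content before changing __class keeps the intent explicit: the node is meant to become plain text, and whatever it contained before is deliberately dropped.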