This is a helper object of HTML Parser Light. It provides text encoding and decoding using HTML/XML entities. The entities are configurable, so anything missing from the default set can be added. The default configuration is close to common HTML practice: only a small number of special characters are encoded using named HTML entities (for example &quot;), while all the rest are encoded by the standard numeric rule &#<charcode>;
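The encoding rule described above can be modelled in a few lines of Python. This is only a sketch of the documented behavior, not the component's own code: the names `encode`, `use_entities` and `use_hex` are hypothetical stand-ins for the Encode method and the useEntities and useHex properties, and treating only characters above code 127 as "the rest" is an assumption.

```python
# Illustrative model only; not the actual HTML Parser Light implementation.
# Named entities keyed by character code, following the default set
# documented for LoadDefaultEntities.
DEFAULT_ENTITIES = {
    34: "quot", 38: "amp", 60: "lt", 62: "gt",
    160: "nbsp", 169: "copy", 174: "reg",
}


def numeric_reference(code, use_hex=False):
    """Build the &#nnn; form, or the &#xnnn; form when use_hex is set."""
    return f"&#x{code:X};" if use_hex else f"&#{code};"


def encode(text, entities=DEFAULT_ENTITIES, use_entities=True, use_hex=False):
    """Encode configured characters by name and other non-ASCII ones by number."""
    out = []
    for ch in text:
        code = ord(ch)
        if code in entities:
            if use_entities:
                out.append("&" + entities[code] + ";")
            else:
                # Configured entities are bypassed; fall back to the numeric form.
                out.append(numeric_reference(code, use_hex))
        elif code > 127:
            # Assumption: "all the rest" means characters outside ASCII.
            out.append(numeric_reference(code, use_hex))
        else:
            out.append(ch)
    return "".join(out)


print(encode('a < b & "c"'))            # a &lt; b &amp; &quot;c&quot;
print(encode("&", use_entities=False))  # &#38;
```

The Python default `use_entities=True` is chosen for the demonstration only and says nothing about the component's own default.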
free threaded version
|Encode||str = obj.Encode(string)||HTML Encodes a string|
|Decode||str = obj.Decode(string)||HTML Decodes a string|
|LoadDefaultEntities||obj.LoadDefaultEntities||Loads the default set of entities
recognized/used by the decoder/encoder. They are only:
&quot; - "
&amp; - &
&lt; - <
&gt; - >
&nbsp; - (non-breaking space)
&copy; - ©
&reg; - ®
Any other entity you want to understand/use must be configured using the Entity property.
|codePage||obj.CodePage = x
x = obj.CodePage
|The code page used for conversions such as from UNICODE (the strings you pass to Encode when forceANSI=True) or to UNICODE (the numerically encoded characters in decoded strings if forceANSI = True). See the remark section for more details.|
|entity||obj.entity(charcode) = string
x = obj.entity(charcode)
|Get/put the entity name for a particular
character code. Through this property, named representations of any
character can be configured. For instance, to do this for the & character:
obj.entity(38) = "amp"
You set the entity by giving only its name; in the HTML encoded text the entity appears as &<entity_name>; Thus the amp from the sample above will appear as &amp;
Named entities are defined by the HTML/XML standards for special characters only (including some accented letters). Using non-standard entities for encoding should be avoided, but for decoding you can configure any entities in order to cope with the input string even if it violates the standards.
When forceANSI is True the character_code is assumed to be a character code from the specified code page. Otherwise it is assumed to be a UNICODE character code. Still, when decoding, if the character_code is greater than 256, the UNICODE character with that code is substituted and no error occurs.
|useEntities||obj.useEntities = boolvalue
x = obj.useEntities
|Affects the Encode method only. When set to False the configured entities are not used and the numerical representation is used instead (for example, &#38; will be generated instead of &amp;). (default is False)|
|forceANSI||obj.forceANSI = boolvalue
x = obj.forceANSI
|When set to True it is assumed that the
HTML content is to be encoded to/decoded from a string where the
numerically encoded characters will represent ANSI character codes from
the specified codePage. When it is False it is assumed that the
numerically encoded characters are UNICODE characters. The non-encoded
characters (Decode method) are always converted through the specified
codePage.
(default is False)
|useHex||obj.useHex = boolvalue
x = obj.useHex
|Specifies how to encode characters
numerically. If True, the &#xnnn; form is used; if False, the &#nnn; form.
(default is False)
|ignoreUnknownEntities||obj.ignoreUnknownEntities = boolvalue
x = obj.ignoreUnknownEntities
|Specifies how to deal with unknown
entities on Decode. If set to False, the Decode method will fail if an
unrecognized entity is found; otherwise the entity will be put "as
is" in the output.
(default is True)
|maxEntityLen||obj.maxEntityLen = n
x = obj.maxEntityLen
|The maximum size of a named entity. It is
doubtful that this will need changing in any kind of application.
(default is 32)
|encodeSpecial||obj.encodeSpecial = boolvalue
x = obj.encodeSpecial
|If set to True, the Encode method will
also encode <CR><LF> and space.
(default is False)
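The decoding options in the table above can be illustrated with a small Python model. It is a sketch under stated assumptions rather than the component's code: `decode`, `ignore_unknown` and `max_entity_len` are hypothetical stand-ins for Decode, ignoreUnknownEntities and maxEntityLen; the length limit is assumed to count only the characters of the name (without `&` and `;`); and the forceANSI/codePage conversion is left out.

```python
import re

# Illustrative model only; not the actual HTML Parser Light implementation.
# Entity names mapped to character codes: the reverse view of
# obj.entity(charcode) = name for the documented default set.
ENTITY_CODES = {
    "quot": 34, "amp": 38, "lt": 60, "gt": 62,
    "nbsp": 160, "copy": 169, "reg": 174,
}


def decode(text, entity_codes=ENTITY_CODES, ignore_unknown=True, max_entity_len=32):
    """Replace &name;, &#nnn; and &#xnnn; references with their characters."""
    # Assumption: max_entity_len limits the entity name itself.
    pattern = re.compile(
        r"&(#[xX][0-9A-Fa-f]+|#[0-9]+|[A-Za-z][A-Za-z0-9]{0,%d});"
        % (max_entity_len - 1)
    )

    def replace(match):
        body = match.group(1)
        if body[:2] in ("#x", "#X"):
            return chr(int(body[2:], 16))   # hexadecimal numeric reference
        if body.startswith("#"):
            return chr(int(body[1:]))       # decimal numeric reference
        if body in entity_codes:
            return chr(entity_codes[body])  # configured named entity
        if ignore_unknown:
            return match.group(0)           # leave the entity "as is"
        raise ValueError("unknown entity: " + match.group(0))

    return pattern.sub(replace, text)


print(decode("&lt;b&gt; &amp; &bogus;"))  # <b> & &bogus;
```

Extra names can be passed in for decoding non-default input, for example `decode("&euro;", entity_codes=dict(ENTITY_CODES, euro=8364))`, which mirrors configuring an additional entity through the entity property.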
HTML encoding can sometimes be a problem. Some HTML editors and functions produce incorrect encoding, and occasionally the HTML encode/decode facilities at hand cannot cope with some data. This object can be configured in many ways, enabling you to deal with such problems.
Aside from problem solving, this object is needed when working with HTML Parser Light in programming environments that do not offer HTML encoding facilities of their own. For example, in ASP/ALP you have Server.HTMLEncode, but in VB or NSBasic, which are desktop oriented, you will not have such a function at hand.