Format specifications: Binary Universal Data Structures (UDS)
The binary format for Universal Data Structures (UDS) is used to write UDS to, and read them from, streams or sometimes memory blocks (most applications prefer to treat memory blocks as streams, so you can assume binary streams). newObjects currently provides two C++ libraries and a COM library that support this format (and others). They are available for Win32. Supported platforms are Windows 95/98/ME, Windows NT 3.51/4.0/2000/XP/2003 and later, and Windows CE 3.0 and later (in all variations - Pocket PC, Smartphone, etc.).

This binary format is common to all platforms but supports custom extensions, which allow implementation-specific features. Avoid these extensions in any application that needs to share data in this format with applications from other vendors. In other words, this is a universal format understood by every application that implements it, while still allowing customization for specific purposes (for example, applications for private use). UDS has many parallels with XML, including the extensions just mentioned: in both cases custom additions will not crash an application, but the application will not be able to "understand" the extended data. However, UDS is much more programming-oriented than XML, and its base format is better suited to programming.

 

UDS Binary format (UDS-BF)

UDS-BF is a sequence of bytes containing control fields, specific information fields and data fields. The data must be enclosed in a block marked by start/end entries, and only an entire block can be read safely. Some exceptions to these requirements are possible, but they are permitted only for internal use within an application (communication between internal components, for example).

The stream consists of entries. Each entry contains a unit of control information or actual data. An entry may also mark the boundary (begin/end) of a logical section, record, etc.

Each entry begins with a two-byte header (the entry header):

byte 0        byte 1
bits 0-7      bits 4-7               bits 0-3
entry type    number of sub-parts    entry flags
  • entry type - a code defining the type of the entry (see below).
  • number of sub-parts - an entry can contain 0 - 15 data fields (sub-parts); see the description after this section.
  • entry flags - some entries need flags that define what they contain or mean. Often a given flag refers to a sub-part.
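The header layout above can be sketched in Python as follows. This is a minimal illustration, not part of the newObjects libraries: the function names are hypothetical, and it assumes that bit 0 denotes the least significant bit of a byte.

```python
def pack_entry_header(entry_type: int, num_subparts: int, flags: int) -> bytes:
    """Build the two-byte entry header (hypothetical helper)."""
    if not 0 <= entry_type <= 0xFF:
        raise ValueError("entry type must fit in one byte")
    if not 0 <= num_subparts <= 15:
        raise ValueError("an entry holds 0 - 15 sub-parts")
    if not 0 <= flags <= 0x0F:
        raise ValueError("entry flags must fit in four bits")
    # byte 0: entry type
    # byte 1: sub-part count in bits 4-7, flags in bits 0-3
    # (assumption: bit 0 is the least significant bit)
    return bytes([entry_type, (num_subparts << 4) | flags])


def unpack_entry_header(header: bytes) -> tuple:
    """Split a two-byte header into (entry_type, num_subparts, flags)."""
    if len(header) != 2:
        raise ValueError("the entry header is exactly two bytes")
    return header[0], header[1] >> 4, header[1] & 0x0F
```

For example, an entry of type 0x01 with two sub-parts and flags 0x3 would get the header bytes 0x01 0x23 under these assumptions.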

Each sub-part consists of:

4 bytes                          n bytes
size (n) of the data in bytes    actual data
  • size - an unsigned integer stored in little-endian format (see the byte order notes at the end of the document).
  • data - the data as a sequence of bytes. In all implementations the data is processed by an encoder component, which allows different encodings. However, for better compatibility (to avoid installing multiple encoders) there is a default encoder, which should be used in all public (or potentially public) implementations. The default encoder writes all numbers in little-endian form.
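A sub-part can be written and read with a few lines of Python. The sketch below uses the standard `struct` module; the helper names are hypothetical and the data is treated as already encoded bytes.

```python
import struct


def pack_subpart(data: bytes) -> bytes:
    """Prefix the data with its size as a 4-byte unsigned little-endian integer."""
    return struct.pack("<I", len(data)) + data


def unpack_subpart(buf: bytes, offset: int = 0) -> tuple:
    """Return (data, next_offset) for the sub-part starting at offset."""
    if offset + 4 > len(buf):
        raise ValueError("truncated sub-part size")
    (size,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    end = start + size
    if end > len(buf):
        raise ValueError("truncated sub-part data")
    return buf[start:end], end
```

Returning the next offset lets a caller read the sub-parts of an entry one after another.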

Entries supported

BSTSFIELD_SKIP entry type: 0x00. These entries can be used by specific implementations; they are skipped by all public implementations.

BSTSFIELD_STREAMBEGIN entry type: 0xFE. Stream begin. Should be the first entry in the saved data - defines the start of the UDS-BF package.

BSTSFIELD_STREAMEND entry type: 0xFF. Stream end. Should be the last entry in the data sequence. After it another UDS-BF block may follow. Together, BSTSFIELD_STREAMBEGIN and BSTSFIELD_STREAMEND define the bounds of the block containing the data sequence, which should be read in turn.

BSTSFIELD_ENCODER entry type: 0xFD. Encoder. Should come immediately after BSTSFIELD_STREAMBEGIN. Contains the encoder signature and its data (if any).
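The following Python sketch shows one way a reader might walk a block from BSTSFIELD_STREAMBEGIN to BSTSFIELD_STREAMEND, dropping BSTSFIELD_SKIP entries on the way. The function names and the tuple layout of the result are hypothetical, and the sketch assumes that bit 0 is the least significant bit of the second header byte.

```python
import struct

BSTSFIELD_SKIP = 0x00
BSTSFIELD_STREAMBEGIN = 0xFE
BSTSFIELD_STREAMEND = 0xFF


def read_entry(buf: bytes, offset: int) -> tuple:
    """Return (entry_type, flags, subparts, next_offset) for one entry."""
    if offset + 2 > len(buf):
        raise ValueError("truncated entry header")
    entry_type, second = buf[offset], buf[offset + 1]
    count, flags = second >> 4, second & 0x0F
    offset += 2
    subparts = []
    for _ in range(count):
        if offset + 4 > len(buf):
            raise ValueError("truncated sub-part size")
        (size,) = struct.unpack_from("<I", buf, offset)
        offset += 4
        if offset + size > len(buf):
            raise ValueError("truncated sub-part data")
        subparts.append(buf[offset:offset + size])
        offset += size
    return entry_type, flags, subparts, offset


def read_block(buf: bytes, offset: int = 0) -> tuple:
    """Read one whole block; return (entries, next_offset)."""
    entry_type, _, _, offset = read_entry(buf, offset)
    if entry_type != BSTSFIELD_STREAMBEGIN:
        raise ValueError("block does not start with BSTSFIELD_STREAMBEGIN")
    entries = []
    while True:
        # A missing stream end surfaces as a "truncated" error from read_entry.
        entry_type, flags, subparts, offset = read_entry(buf, offset)
        if entry_type == BSTSFIELD_STREAMEND:
            return entries, offset
        if entry_type != BSTSFIELD_SKIP:
            entries.append((entry_type, flags, subparts))
```

Because `read_block` returns the offset just past the stream end entry, calling it again with that offset reads the next block, if one follows.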

Flags: 
BSTSFIELD_ENCCUSTOM_SIGNATURE 0x2. (required) the first sub-part contains the encoder signature.

BSTSFIELD_ENCCUSTOM_SETTINGS 0x4. (optional) the second sub-part contains the encoder settings.

Comments: The encoder signature is 4 bytes. Usually these contain the ASCII codes of the encoder signature (not a requirement). The default encoder has the signature "NULL".
The encoder settings are saved/read by the encoder and may vary.
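As an illustration, the smallest block that declares the default encoder could be produced as sketched below. The helper names are hypothetical, and the sketch assumes that the stream begin and stream end entries carry no flags or sub-parts, which this document does not state explicitly.

```python
import struct

BSTSFIELD_STREAMBEGIN = 0xFE
BSTSFIELD_ENCODER = 0xFD
BSTSFIELD_STREAMEND = 0xFF
BSTSFIELD_ENCCUSTOM_SIGNATURE = 0x2
BSTSFIELD_ENCCUSTOM_SETTINGS = 0x4


def entry(entry_type, flags=0, subparts=()):
    """Serialize one entry: two-byte header, then size-prefixed sub-parts."""
    header = bytes([entry_type, (len(subparts) << 4) | flags])
    body = b"".join(struct.pack("<I", len(p)) + p for p in subparts)
    return header + body


def encoder_entry(signature=b"NULL", settings=None):
    """Build the encoder entry; settings are opaque bytes owned by the encoder."""
    if len(signature) != 4:
        raise ValueError("the encoder signature is 4 bytes")
    flags = BSTSFIELD_ENCCUSTOM_SIGNATURE   # required: first sub-part
    parts = [signature]
    if settings is not None:
        flags |= BSTSFIELD_ENCCUSTOM_SETTINGS  # optional: second sub-part
        parts.append(settings)
    return entry(BSTSFIELD_ENCODER, flags, parts)


def empty_block():
    """Stream begin, default encoder, stream end (assumed: no flags/sub-parts)."""
    return (entry(BSTSFIELD_STREAMBEGIN)
            + encoder_entry()
            + entry(BSTSFIELD_STREAMEND))
```

Under these assumptions `empty_block()` yields 14 bytes: FE 00, FD 12, the size 04 00 00 00, the four ASCII bytes of "NULL", and FF 00.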

BSTSFIELD_SECTIONBEGIN entry type: 0x01. Defines a section start. Must be paired with a BSTSFIELD_SECTIONEND entry. Sections may nest.

Flags:
BSTSFF_HASNAME 0x1. If specified the first sub-part contains the name of the section.

BSTSFIELD_SBCUSTOM_CLASSINFO_NAME 0x2. If specified, the sub-part after the name (or the first sub-part, if the section has no name) contains the class name attached to the section. The class name is an optional feature that allows objects to be persisted in sections.

BSTSFIELD_SBCUSTOM_CLASSINFO_ID 0x04. Like the previous flag, but a 4-byte class ID is specified instead. This flag and the previous one are mutually exclusive. The class ID is not currently used.

Comments: Sections define tree branches. Each section may contain an unlimited number of records and nested sections. Class names are often used by applications which need to save "live data" - i.e. the objects and the data together. The Jacked-Objects C++ library contains tools allowing the entire application to be saved/restored. However, class names can safely be used in a custom manner if desired.
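A section begin entry combining these flags might be built as in the sketch below. The function name is hypothetical. Two points are assumptions rather than statements from this document: names are written as UTF-8 bytes without a terminator, and the class ID is stored as a little-endian unsigned integer.

```python
import struct

BSTSFIELD_SECTIONBEGIN = 0x01
BSTSFF_HASNAME = 0x1
BSTSFIELD_SBCUSTOM_CLASSINFO_NAME = 0x2
BSTSFIELD_SBCUSTOM_CLASSINFO_ID = 0x4


def section_begin(name=None, class_name=None, class_id=None):
    """Serialize a section begin entry (hypothetical helper)."""
    if class_name is not None and class_id is not None:
        raise ValueError("class name and class ID are mutually exclusive")
    flags = 0
    parts = []
    if name is not None:
        flags |= BSTSFF_HASNAME
        parts.append(name.encode("utf-8"))        # assumed string encoding
    if class_name is not None:
        flags |= BSTSFIELD_SBCUSTOM_CLASSINFO_NAME
        parts.append(class_name.encode("utf-8"))  # follows the name, if any
    if class_id is not None:
        flags |= BSTSFIELD_SBCUSTOM_CLASSINFO_ID
        parts.append(struct.pack("<I", class_id))  # assumed 4-byte LE integer
    header = bytes([BSTSFIELD_SECTIONBEGIN, (len(parts) << 4) | flags])
    return header + b"".join(struct.pack("<I", len(p)) + p for p in parts)
```

The order in which sub-parts are appended mirrors the rule that the class information comes after the name when a name is present, and first otherwise.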

Sections may contain nested sections and records. The in-memory representation should follow this pattern:
The entries in a section should be accessible by name and by index. Nested sections and records must share the same index/name space. There should be a way to check the type of the element at a given position. Also, when an element is removed its place should remain unused to preserve the indices (the same behavior as the values in a record - see below).
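One possible in-memory shape for a section that satisfies these requirements is sketched below. The class and method names are hypothetical and are not taken from the newObjects libraries; for brevity this sketch always appends new elements.

```python
class Section:
    """Sketch of a section whose children share one index/name space."""

    def __init__(self):
        # Each slot is None (unused) or a (name, kind, payload) tuple.
        self._slots = []

    def add(self, kind, payload, name=None):
        """Append a nested section or a record; return its index."""
        if kind not in ("section", "record"):
            raise ValueError("kind must be 'section' or 'record'")
        if name is not None and self.index_of(name) is not None:
            raise KeyError("name already used in this section: " + name)
        self._slots.append((name, kind, payload))
        return len(self._slots) - 1

    def index_of(self, name):
        """Return the index of the named element, or None if absent."""
        for i, slot in enumerate(self._slots):
            if slot is not None and slot[0] == name:
                return i
        return None

    def _resolve(self, key):
        index = key if isinstance(key, int) else self.index_of(key)
        if (index is None or not 0 <= index < len(self._slots)
                or self._slots[index] is None):
            raise KeyError(key)
        return index

    def kind_at(self, key):
        """Type check: 'section' or 'record' for the element at key."""
        return self._slots[self._resolve(key)][1]

    def get(self, key):
        return self._slots[self._resolve(key)][2]

    def remove(self, key):
        # Leave the slot unused so the other indices stay valid.
        self._slots[self._resolve(key)] = None
```

After `remove`, the remaining elements keep their original indices, which is the property the pattern above asks for.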

BSTSFIELD_SECTIONEND entry type: 0x02. Section end. Defines the end of the current section (the last opened section - it corresponds to the closest unpaired section start entry). No flags or sub-parts are supported.

BSTSFIELD_RECORDBEGIN entry type: 0x03. Record begin. Defines the beginning of a record. Must be inside a section. May contain values (see below).

Flags: 
BSTSFF_HASNAME 0x1. If specified the first sub-part contains the name of the record.

Comments: Records are collections/arrays of values. Usually they are accessed in one of two ways - as associative arrays (if the values, or at least most of them, are named) or by index. The two methods are often combined. The in-memory representation should follow this pattern:
When a value is removed its place should remain empty. There must be a function/method that lets the application determine whether a value is used or unused (often referred to as null). When a new value is added it can then reuse an unused position. This keeps indices static, which in turn allows speed optimizations.

BSTSFIELD_RECORDEND entry type: 0x04. Record end. Specifies the end of the record started with BSTSFIELD_RECORDBEGIN.
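The slot behavior described here can be sketched as a small Python class. The names are hypothetical; a private sentinel marks unused positions so that a stored `None` is still a legitimate value.

```python
_UNUSED = object()  # marks an empty position; distinct from a stored None


class Record:
    """Sketch of a record with stable indices and reusable empty slots."""

    def __init__(self):
        self._slots = []  # each slot is _UNUSED or a (name, value) tuple

    def add(self, value, name=None):
        """Store a value in the first unused position, else append."""
        slot = (name, value)
        for i, existing in enumerate(self._slots):
            if existing is _UNUSED:
                self._slots[i] = slot
                return i
        self._slots.append(slot)
        return len(self._slots) - 1

    def remove(self, index):
        # The position stays in place, so later indices do not shift.
        self._slots[index] = _UNUSED

    def is_used(self, index):
        """Tell the application whether a position holds a value."""
        return 0 <= index < len(self._slots) and self._slots[index] is not _UNUSED

    def get(self, key):
        """Look a value up by name (associative) or by index."""
        if isinstance(key, str):
            for slot in self._slots:
                if slot is not _UNUSED and slot[0] == key:
                    return slot[1]
            raise KeyError(key)
        if not self.is_used(key):
            raise KeyError(key)
        return self._slots[key][1]
```

Because indices never shift, code that has cached the index of a value can keep using it after unrelated values are removed.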
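The pairing rules for sections and records can be checked with a simple pass over the entry types of a block. The sketch below uses a hypothetical function name. Beyond what this document states, it assumes that records do not nest and that a section cannot start or end inside an open record.

```python
BSTSFIELD_SECTIONBEGIN = 0x01
BSTSFIELD_SECTIONEND = 0x02
BSTSFIELD_RECORDBEGIN = 0x03
BSTSFIELD_RECORDEND = 0x04


def is_well_formed(entry_types):
    """Return True if section/record begin and end entries pair up correctly."""
    depth = 0          # number of currently open sections
    in_record = False  # assumption: records do not nest
    for entry_type in entry_types:
        if entry_type == BSTSFIELD_SECTIONBEGIN:
            if in_record:
                return False  # assumption: no sections inside records
            depth += 1
        elif entry_type == BSTSFIELD_SECTIONEND:
            if in_record or depth == 0:
                return False
            depth -= 1
        elif entry_type == BSTSFIELD_RECORDBEGIN:
            if in_record or depth == 0:
                return False  # a record must be inside a section
            in_record = True
        elif entry_type == BSTSFIELD_RECORDEND:
            if not in_record:
                return False
            in_record = False
        # other entry types (values, skip entries) do not affect nesting
    return depth == 0 and not in_record
```

A depth counter is enough here because a section end always closes the most recently opened section.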

Values. Each value entry supports the flag BSTSFF_HASNAME 0x1 as an optional feature. If present, it denotes that the first sub-part is a string - the name of the value.

BSTSFIELD_VALUE_RAW entry type: 0x11. Raw value. A sequence of bytes. Apart from the optional name, the entry contains a raw data sub-part.
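Putting the record and raw value entries together, a record of raw values could be serialized as sketched below. The helper names are hypothetical. The sketch assumes that names are written as UTF-8 bytes without a terminator and that the record end entry carries no flags or sub-parts; neither is stated explicitly in this document.

```python
import struct

BSTSFIELD_RECORDBEGIN = 0x03
BSTSFIELD_RECORDEND = 0x04
BSTSFIELD_VALUE_RAW = 0x11
BSTSFF_HASNAME = 0x1


def _entry(entry_type, name=None, parts=()):
    """Serialize one entry; a name, if given, becomes the first sub-part."""
    flags = 0
    subparts = list(parts)
    if name is not None:
        flags |= BSTSFF_HASNAME
        subparts.insert(0, name.encode("utf-8"))  # assumed name encoding
    header = bytes([entry_type, (len(subparts) << 4) | flags])
    return header + b"".join(struct.pack("<I", len(p)) + p for p in subparts)


def raw_value(data, name=None):
    """Raw value entry: optional name sub-part, then the raw data sub-part."""
    return _entry(BSTSFIELD_VALUE_RAW, name, [data])


def record(values, name=None):
    """Serialize a record; values is an iterable of (name_or_None, bytes)."""
    body = b"".join(raw_value(data, value_name) for value_name, data in values)
    return (_entry(BSTSFIELD_RECORDBEGIN, name)
            + body
            + _entry(BSTSFIELD_RECORDEND))
```

An unnamed raw value has exactly one sub-part and no flags, so an empty unnamed value serializes to the header 11 10 followed by a zero size.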

... to be completed ...

Copyright 2001-2006 newObjects