EXI Prototype for C/C++

Achieve unprecedented compactness for your XML data using OSS Nokalva's EXI Tools for C/C++. The OSS EXI Tools for C/C++ are a set of tools that implement the Efficient XML Interchange (EXI) Format 1.0 (Second Edition) W3C Recommendation.
The OSS EXI Tools for C/C++ enable C/C++ applications to read and write data streams conforming to the Efficient XML Interchange W3C Recommendation.
The OSS EXI Tools for C/C++ have been tested with the W3C EXI interoperability test framework.

EXI is a very compact representation of XML specified in the W3C Recommendation Efficient XML Interchange (EXI) Format 1.0 (Second Edition). EXI improves serialization and parsing speed and allows more efficient use of memory and battery life, compared to standard (textual) XML. An EXI stream is typically many times smaller than an equivalent XML document and requires less CPU time to be read or written.
There are two main ways in which EXI can encode an XML document — the schemaless mode and the schema-informed mode. In the schemaless mode, EXI can encode any XML document whether or not a schema is available to the encoder. In the schema-informed mode, EXI has the unique ability to utilize information extracted from an XML schema to increase the efficiency of the encodings without requiring, in general, strict adherence of the data to the schema. However, the EXI encodings can be even more efficient if the user is sure that the data will be valid according to the schema.
The use of schema information makes the EXI encodings more efficient because it allows the EXI processor, at any point within the EXI stream, to make certain predictions about the next item in the stream. For example, if the schema specifies that an element "A" (in a certain context) must always be followed by an element "B", then an occurrence of element "B" when the previous element was "A" gets encoded in zero bits (in the strict mode).
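The zero-bit case follows from how an event code is sized: when m alternatives are possible at a given point in the stream, distinguishing them takes ceil(log2(m)) bits, so a single possible alternative needs no bits at all. A minimal sketch of that calculation in C is shown here; the function name is illustrative and is not part of the OSS EXI Tools API.

```c
#include <stddef.h>

/* Number of bits needed to distinguish `alternatives` possible next items,
 * i.e. ceil(log2(alternatives)). Illustrative helper, not an OSS API call. */
unsigned event_code_bits(size_t alternatives)
{
    unsigned bits = 0;
    /* Grow the width until 2^bits covers every alternative.
     * The first test keeps the shift within the width of size_t. */
    while (bits < sizeof(size_t) * 8 && ((size_t)1 << bits) < alternatives)
        bits++;
    return bits;
}
```

With one alternative (element "B" must follow element "A") the function returns 0; with five alternatives it returns 3.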
In the schemaless mode, during an encoding or decoding operation the EXI processor continually modifies the way to encode each item based on the actual content of the document encountered so far. For example, when the EXI encoder encounters an element "C" in the content of an element "P", it assumes that an element named "C" has a higher probability of occurrence than elements with other names when the current parent is an element named "P", and creates an abbreviated way to encode the occurrence of an element named "C" under an element named "P". The next time an element named "C" is encountered under an element named "P" (either the same or a subsequent element with the same name), the EXI encoder will be able to use the abbreviated encoding for "C" and thus save space.
In summary, a user of EXI can choose between three main options: (a) not using a schema at all (schemaless), (b) using a schema in a manner that only supports valid XML documents (schema-informed, strict), and (c) using a schema in a manner that supports deviations from the schema (schema-informed, non-strict). The schema-informed, strict mode is the most efficient of the three. The schemaless mode is the easiest to use because it doesn't involve a schema.
EXI, like many other XML compression technologies, uses string tables to temporarily store certain kinds of strings that occur in the XML document being encoded, such as namespace URIs, local names, attribute values, and so on, to allow subsequent occurrences of the same string to be encoded using a short string identifier. In the schemaless mode, all the string tables are reset at the beginning of an encoding or decoding operation. In the schema-informed mode, the string tables containing namespace URIs and local names are prepopulated with strings taken from the schema or defined in the XML Schema Recommendation, so that those strings will be already known at the beginning of each encoding or decoding operation.
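The string table idea can be sketched in a few lines of C. This is a simplified illustration, assuming a fixed capacity and a linear search; the type and function names are hypothetical and do not reflect the internal data structures of the EXI/C runtime library.

```c
#include <stddef.h>
#include <string.h>

#define TABLE_CAPACITY 64

/* A tiny append-only string table (illustrative only).
 * It stores pointers, so the caller must keep the strings alive. */
typedef struct {
    const char *entries[TABLE_CAPACITY];
    size_t count;
} StringTable;

/* Reset the table and optionally prepopulate it, as a schema-informed
 * processor does with names taken from the schema. Pass n == 0 for the
 * schemaless case. */
void table_init(StringTable *t, const char *const *initial, size_t n)
{
    t->count = 0;
    for (size_t i = 0; i < n && t->count < TABLE_CAPACITY; i++)
        t->entries[t->count++] = initial[i];
}

/* Returns the identifier of `s` if it is already known (a hit).
 * Otherwise adds `s` when there is room and returns -1 (a miss:
 * the encoder must write the string literally this one time). */
int table_lookup_or_add(StringTable *t, const char *s)
{
    for (size_t i = 0; i < t->count; i++)
        if (strcmp(t->entries[i], s) == 0)
            return (int)i;
    if (t->count < TABLE_CAPACITY)
        t->entries[t->count++] = s;
    return -1;
}
```

In a table that starts empty, the first lookup of "price" is a miss and the second returns identifier 0. In a table prepopulated with names from a schema, the first lookup of a schema-defined name is already a hit.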
There are other options in EXI that affect the content of an EXI stream. Some of those options, called fidelity options, control the EXI processor's ability to include certain types of items in the EXI stream, such as XML comments, processing instructions, and namespace declarations. If the user does not need one of these item types to be preserved in the EXI encoding, they can select an option that makes the EXI encoding more efficient by excluding items of that type. For example, if the user states that namespace declarations and prefixes don't need to be preserved, the EXI stream encoder gives up its ability to encode them, and the resulting EXI stream may be more compact. Another fidelity option controls the preservation of the original string values of attributes and elements with simple types. When this option is not selected, those values are encoded more efficiently (for example, an attribute value of type xsd:integer is encoded as a binary integer rather than as a string), but a reader cannot reconstruct the exact original strings when reading back the EXI stream. In many applications, such loss of information is acceptable, and therefore this option should not be selected.
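To illustrate why a typed encoding is smaller than the original string, the sketch below writes an unsigned magnitude in the variable-length form that the EXI Recommendation defines for unsigned integers: 7 bits per octet, least significant group first, with the high bit of an octet set when more octets follow. The function name is illustrative and is not an OSS API call. A signed EXI integer carries a sign flag in addition to this magnitude, which the sketch leaves out.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes `value` as a sequence of 7-bit groups, least significant group
 * first. The high bit of each octet is set when more octets follow.
 * Returns the number of octets written (at most 10 for a 64-bit value),
 * so `out` must have room for 10 octets. */
size_t encode_unsigned(uint64_t value, uint8_t *out)
{
    size_t n = 0;
    do {
        uint8_t octet = (uint8_t)(value & 0x7F);
        value >>= 7;
        if (value != 0)
            octet |= 0x80;   /* continuation flag */
        out[n++] = octet;
    } while (value != 0);
    return n;
}
```

The value 1000000 occupies three octets in this form (0xC0, 0x84, 0x3D), whereas the string "1000000" needs seven characters plus a length indication. The price is that a lexical variant such as "+1000000" or "01000000" decodes back as "1000000".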
The last major feature of EXI is the support for byte alignment and compression. The user can choose one of four alignment options: (a) the bit-packed alignment, (b) the byte-aligned alignment, (c) precompression, and (d) compression. Bit-packed and compression are the more compact ones (compression is usually, but not always, more compact than bit-packed). Bit-packed and byte-aligned are the faster ones (byte-aligned may be slightly faster than bit-packed). Both precompression and compression arrange the encoded data within the EXI stream into a particular layout, where all the encoded data items that are likely to be similar are close together. This arrangement increases the effectiveness of a compression algorithm applied to the data. Precompression does not perform any compression per se, as its only purpose is to prepare the EXI stream for an external compression step (outside the EXI processor) to be applied to the EXI stream. Compression goes further and applies the standard DEFLATE algorithm to each chunk of similar encoded data items, to produce the final EXI stream.
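The rearrangement performed by precompression and compression can be pictured as grouping values by the name of the element or attribute they belong to. The sketch below is a simplified illustration of that idea only: it assumes values are identified by a plain name string and ignores how EXI divides a stream into blocks and handles the structural part of the stream. All names in it are hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;   /* element or attribute name */
    const char *value;  /* its text value */
} Item;

/* Copies the values from `in` (document order) into `out`, grouped by name.
 * Groups appear in the order of each name's first occurrence, and values
 * keep document order within a group. `out` must have room for n pointers. */
void group_values_by_name(const Item *in, size_t n, const char **out)
{
    size_t written = 0;
    for (size_t i = 0; i < n; i++) {
        /* Skip names whose group was already emitted. */
        int seen = 0;
        for (size_t k = 0; k < i; k++) {
            if (strcmp(in[k].name, in[i].name) == 0) {
                seen = 1;
                break;
            }
        }
        if (seen)
            continue;
        /* Emit every value that shares this name. */
        for (size_t j = i; j < n; j++)
            if (strcmp(in[j].name, in[i].name) == 0)
                out[written++] = in[j].value;
    }
}
```

For a document that alternates "date" and "price" values, the output places all dates first and all prices after them, so a general-purpose compressor sees runs of similar strings instead of an interleaved mix.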


Components
The OSS EXI Tools for C/C++ comprise two major components:
The schema preprocessor utility is a command-line application that reads an XML schema and produces a serialized schema file. The schema passed as input to the schema preprocessor must consist of one or more schema document files conforming to W3C XML Schema 1.0. The output is an XML file (the serialized schema file) conveying information extracted from the schema and represented in a proprietary format, which is understood by the EXI/C runtime library. The schemaless mode of EXI does not require the use of the schema preprocessor.
The schema preprocessor relies on the XML schema parsing capabilities and on the XML Schema Object Model of the .NET Framework (version 4.5 or later). The .NET Framework is required only by the schema preprocessor. It is not required or used by the EXI/C runtime library (see below) and therefore it is not required on the target system.
The EXI/C runtime library is a native Windows DLL that is used for reading and writing EXI streams as well as for converting XML documents and fragments to EXI and EXI streams to XML.
EXI Prototype for C/C++ - API
The API of the EXI/C runtime library is a C-style API and can be used both by C applications and by C++ applications.
The EXI/C runtime library supports two different approaches for creating or processing an EXI stream. In the first approach, the user application reads or writes the EXI document or fragment one node at a time, by calling one or more API functions for each node. In the second approach, the user application makes a single call to an API function that converts an entire document or fragment from XML to EXI or from EXI to XML. For an application that only needs to read or write EXI streams (with no XML involved), the first approach is faster. The second approach may be more convenient when a developer must add EXI to an existing application that reads or writes XML.
The EXI/C runtime library also includes some advanced features that aim to increase EXI encoding/decoding performance. One of them is the ability to pass binary values instead of strings across the API (in both directions) for the values of non-string XSD datatypes (e.g., xsd:integer, xsd:float, xsd:base64Binary) as well as for the values of enumerated datatypes.
The EXI/C runtime library can also be used for reading and writing XML 1.0 documents and fragments. This capability is used internally by the EXI/C runtime library during the conversion of an entire document or fragment from XML to EXI or from EXI to XML, but is also available to the user application through the API. Most of the API functions that read and write the nodes of a document or fragment are common to XML and EXI.
The stream reading capabilities (for both EXI and XML) of the EXI/C runtime library follow a "pull parser" model.
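In a pull parser model, the application asks the reader for the next node whenever it is ready for one, rather than receiving callbacks. The sketch below illustrates the control flow with a toy reader over an in-memory node array. Every type and function name in it is hypothetical; it does not show the actual API of the EXI/C runtime library.

```c
#include <stddef.h>

typedef enum {
    NODE_START_ELEMENT,
    NODE_TEXT,
    NODE_END_ELEMENT,
    NODE_END_OF_STREAM
} NodeKind;

typedef struct {
    NodeKind kind;
    const char *text;   /* element name or character data */
} Node;

/* Hypothetical reader over a pre-decoded node array. */
typedef struct {
    const Node *nodes;
    size_t count;
    size_t pos;
} Reader;

/* Pull the next node; the caller decides when to advance. */
Node reader_next(Reader *r)
{
    if (r->pos < r->count)
        return r->nodes[r->pos++];
    Node end = { NODE_END_OF_STREAM, NULL };
    return end;
}

/* Example consumer: counts start-element nodes by pulling
 * until the reader reports the end of the stream. */
size_t count_elements(Reader *r)
{
    size_t elements = 0;
    for (;;) {
        Node n = reader_next(r);
        if (n.kind == NODE_END_OF_STREAM)
            break;
        if (n.kind == NODE_START_ELEMENT)
            elements++;
    }
    return elements;
}
```

Because the loop belongs to the application, it can stop early, skip nodes, or interleave reading with other work, which is the main practical difference from a callback-driven model.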
A SAX2-like API for C++, similar to the SAX2 API of Apache Xerces-C++ (SAX2XMLReader and handler classes), is also included for convenience. The use of this API is optional.
EXI Prototype for C/C++ - Features
Conformance and Main Features (EXI)
The OSS EXI Tools for C/C++ are a complete implementation of the Efficient XML Interchange (EXI) Format 1.0 (Second Edition) W3C Recommendation. As such, they fully support the following features of EXI:
as well as all the other features required for conformance to the EXI Recommendation, both when encoding and when decoding an EXI stream.
Conformance and Main Features (XML)
The OSS EXI Tools for C/C++ are also a conforming implementation of the Extensible Markup Language (XML) 1.0 (Fifth Edition) W3C Recommendation as a non-validating processor, with the following features: