The DOM specification serves as a good example of the power of using XML: all of the HTML documents, Java bindings, OMG IDL bindings, and ECMA Script bindings are generated from a single set of XML source files. This section outlines how this specification is written in XML, and how the various derived works are created.
This specification was written entirely in XML, using a DTD based heavily on the DTD used by the XML Working Group for the XML specification. The major difference between the DTD used by the XML Working Group and the DTD used for this specification is the addition of a DTD module for interface specifications.
The DTD module for interface specifications is a very loose translation of the Extended Backus-Naur Form (EBNF) specification of the OMG IDL syntax into XML DTD syntax. In addition to the translation, the ability to describe the interfaces was added, thereby creating a limited form of literate programming for interface definitions.
While the DTD module is sufficient for the purposes of the DOM WG, it is very loosely typed: very few constraints are placed on the type specifications (the type information is effectively treated as an opaque string). In a DTD for object-to-object communication, stricter enforcement of data types would probably be beneficial.
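To make the loose typing concrete, here is a minimal Python sketch of the kind of check a DTD cannot express. It assumes the element and attribute names used in the sample object definition later in this section (`interface`, `method`, `param`, `returns`, `type`) and a hypothetical whitelist of OMG IDL base types. The function `check_types` and the misspelled parameter `qux` are illustrative only and are not part of the DOM tooling.

```python
import xml.etree.ElementTree as ET

# Hypothetical whitelist of OMG IDL base types; the DOM DTD itself accepts any
# string in a type attribute.
IDL_BASE_TYPES = {"void", "boolean", "short", "unsigned short", "long",
                  "unsigned long", "float", "double", "wstring"}

def check_types(xml_text):
    """Return (tag, name, type) for every type attribute that is neither an
    IDL base type nor an interface declared in the same source."""
    root = ET.fromstring(xml_text)
    declared = {i.get("name") for i in root.iter("interface")}
    allowed = IDL_BASE_TYPES | declared
    problems = []
    for elem in root.iter():  # visits the root and all of its descendants
        t = elem.get("type")
        if t is not None and t not in allowed:
            problems.append((elem.tag, elem.get("name"), t))
    return problems

src = """<interface name="foo">
  <method name="bar">
    <parameters>
      <param name="baz" type="wstring" attr="in"/>
      <param name="qux" type="wstirng" attr="in"/>
    </parameters>
    <returns type="foo"/>
  </method>
</interface>"""

print(check_types(src))  # [('param', 'qux', 'wstirng')]
```

A DTD can only declare `type` as CDATA, so the misspelled `wstirng` is still valid XML. A check like this has to be run as a separate processing step.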
The DOM specification is written using XML, and all of its documents are valid XML. The HTML versions of the specification, the object indexes, the Java source code, and the OMG IDL and ECMA Script definitions are all produced by converting this XML source.
The tool currently used for conversion is COST by Joe English. COST takes the ESIS output of nsgmls, creates an internal representation, and then allows scripts and event handlers to be run over the internal data structure.
Event handlers allow document patterns and associated processing to be specified: when the pattern is matched during a pre-order traversal of a document subtree, the associated action is executed. This is the heart of the conversion process. Scripts are used to tie the various components together. For example, each of the major derived data sources (Java code etc.) is created by the execution of a script, which in turn executes one or more event handlers. The scripts and event handlers are specified using TCL.
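COST itself is driven by TCL. The following Python sketch only illustrates the pattern/action idea described above. It assumes that a pattern is simply an element name, and the handler table `index_handlers` and its output strings are hypothetical. Each action runs when its pattern matches during a pre-order traversal, so a parent is processed before its children. The "script" here is the call that pairs one handler table with one document to produce one derived output.

```python
import xml.etree.ElementTree as ET

def run_handlers(root, handlers):
    """Pre-order traversal: call handlers[tag](elem, out) whenever a tag matches."""
    out = []
    def visit(elem):
        action = handlers.get(elem.tag)
        if action is not None:
            action(elem, out)      # the parent is handled before its children
        for child in elem:
            visit(child)
    visit(root)
    return out

# Hypothetical handler table for one derived output (a simple object index).
index_handlers = {
    "interface": lambda e, out: out.append("interface " + e.get("name")),
    "method":    lambda e, out: out.append("  method " + e.get("name")),
    "param":     lambda e, out: out.append("    param " + e.get("name")),
}

doc = ET.fromstring(
    '<interface name="foo"><method name="bar">'
    '<parameters><param name="baz" type="wstring" attr="in"/></parameters>'
    '</method></interface>'
)
print("\n".join(run_handlers(doc, index_handlers)))
```

To produce a different derived output, such as Java code, you would pass a different handler table over the same tree. This mirrors how one script per output invokes its own event handlers.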
The current version of COST has been somewhat modified from the publicly available version. In particular, it now runs correctly under 32-bit Windows, uses TCL 8.0, and correctly handles the case sensitivity of XML (though it probably could not correctly handle native language markup).
We could also have used Jade, by James Clark. Like COST, Jade allows patterns and actions to be specified, but Jade is based on DSSSL, an international standard, whereas COST is not. Jade is more powerful than COST in many ways, but the editor's prior experience with COST made it easier to use COST rather than Jade. A future version or Level of the DOM specification may be produced using Jade or an XSL processor.
The complete XML source files are available at: http://www.w3.org/TR/1998/PR-DOM-Level-1-19980818/xml-source.zip
As stated earlier, all object definitions are specified in XML. The Java bindings, OMG IDL bindings, and ECMA Script bindings are all generated automatically from the XML source code.
This is possible because the information specified in XML is a superset of what these other syntaxes need. This is a general observation, and the same kind of technique can be applied to many other areas: given rich structure, rich processing and conversion are possible. For Java and OMG IDL, it is basically just a matter of renaming syntactic keywords; for ECMA Script, the process is somewhat more involved.
A typical object definition in XML looks something like this:
```xml
<interface name="foo">
  <descr><p>Description goes here...</p></descr>
  <method name="bar">
    <descr><p>Description goes here...</p></descr>
    <parameters>
      <param name="baz" type="wstring" attr="in">
        <descr><p>Description goes here...</p></descr>
      </param>
    </parameters>
    <returns type="void">
      <descr><p>Description goes here...</p></descr>
    </returns>
    <raises>
      <!-- Throws no exceptions -->
    </raises>
  </method>
</interface>
```
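To illustrate the point that the OMG IDL and Java bindings are largely a matter of renaming keywords, here is a minimal Python sketch that reads the sample definition above and emits both declarations. It is not the COST/TCL tooling actually used. The `JAVA_TYPES` mapping and the helper names `methods`, `to_idl`, and `to_java` are hypothetical, and the `descr` elements (which the real pipeline uses for the HTML output) are simply ignored.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<interface name="foo">
  <descr><p>Description goes here...</p></descr>
  <method name="bar">
    <descr><p>Description goes here...</p></descr>
    <parameters>
      <param name="baz" type="wstring" attr="in">
        <descr><p>Description goes here...</p></descr>
      </param>
    </parameters>
    <returns type="void">
      <descr><p>Description goes here...</p></descr>
    </returns>
    <raises>
      <!-- Throws no exceptions -->
    </raises>
  </method>
</interface>"""

# Hypothetical IDL-to-Java type map; the real DOM bindings define their own.
JAVA_TYPES = {"wstring": "String", "void": "void"}

def methods(iface):
    """Yield (name, return type, [(attr, type, name), ...]) for each <method>."""
    for m in iface.findall("method"):
        params = [(p.get("attr"), p.get("type"), p.get("name"))
                  for p in m.findall("parameters/param")]
        yield m.get("name"), m.find("returns").get("type"), params

def to_idl(iface):
    lines = ["interface %s {" % iface.get("name")]
    for name, ret, params in methods(iface):
        args = ", ".join("%s %s %s" % p for p in params)  # e.g. "in wstring baz"
        lines.append("  %s %s(%s);" % (ret, name, args))
    lines.append("};")
    return "\n".join(lines)

def to_java(iface):
    lines = ["public interface %s {" % iface.get("name")]
    for name, ret, params in methods(iface):
        # Java has no in/out/inout parameter modes, so attr is dropped.
        args = ", ".join("%s %s" % (JAVA_TYPES.get(t, t), n) for _, t, n in params)
        lines.append("  %s %s(%s);" % (JAVA_TYPES.get(ret, ret), name, args))
    lines.append("}")
    return "\n".join(lines)

iface = ET.fromstring(SAMPLE)  # the XML comment inside <raises> is discarded
print(to_idl(iface))
print(to_java(iface))
```

Both generators share one traversal (`methods`) and differ only in keywords, punctuation, and a type-name lookup. This reflects the "renaming" observation. An ECMA Script generator would need more than a keyword swap.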
As can easily be seen, this is quite verbose, but not unlike OMG IDL. In fact, when the specification was originally converted to use XML, the OMG IDL definitions were automatically converted into the corresponding XML source using common Unix text manipulation tools.
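The original IDL-to-XML conversion used Unix text tools. The Python sketch below shows the same idea with regular expressions. It assumes a deliberately tiny IDL subset: one interface, with methods whose parameters have single-word types and `in`/`out`/`inout` modes. The `descr` elements are left as empty placeholders to be filled in by hand. Real IDL (attributes, constants, multi-word types such as `unsigned long`, `raises` clauses) would need more work. The function `idl_to_xml` and the regex names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Patterns for a deliberately small IDL subset (assumption: single-word types).
IFACE_RE = re.compile(r"interface\s+(\w+)\s*\{(.*?)\};", re.S)
METHOD_RE = re.compile(r"(\w+)\s+(\w+)\s*\(([^)]*)\)\s*;")
PARAM_RE = re.compile(r"\b(inout|in|out)\s+(\w+)\s+(\w+)")

def idl_to_xml(idl):
    """Build an <interface> element with empty <descr> placeholders."""
    match = IFACE_RE.search(idl)
    if match is None:
        raise ValueError("no interface found")
    name, body = match.groups()
    iface = ET.Element("interface", name=name)
    ET.SubElement(iface, "descr")
    for ret, mname, plist in METHOD_RE.findall(body):
        method = ET.SubElement(iface, "method", name=mname)
        ET.SubElement(method, "descr")
        params = ET.SubElement(method, "parameters")
        for mode, ptype, pname in PARAM_RE.findall(plist):
            ET.SubElement(params, "param", name=pname, type=ptype, attr=mode)
        ET.SubElement(method, "returns", type=ret)
        ET.SubElement(method, "raises")
    return iface

idl = """interface foo {
  void bar(in wstring baz);
};"""
print(ET.tostring(idl_to_xml(idl), encoding="unicode"))
```

Text-level matching like this is fragile, which is why it suits a one-time migration better than ongoing maintenance. Once the XML source existed, it became the master copy and the IDL was regenerated from it.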