Zeroes and Ones
Scott Wilson, CETIS staff
December 20, 2004

I've had quite a few conversations along the lines of "XML is great but are all these angle brackets really necessary or just a waste of bandwidth?"

No, really! And I'm not the only one either.

For some types of XML documents - HTML, for example - having a human-readable text encoding is very useful. It also makes it easy to hand-edit certain XML and RDF files, such as RSS or FOAF, or to manipulate the text with scripting languages for a variety of purposes.

However, for many of the kinds of XML flowing around today, especially within the enterprise, the only real reason a human being would eyeball XML is for debugging purposes. So why use text at all and not just make a nice compact binary exchange, like we used to Back In The Day?

Well, this has reached some form of critical mass, and the W3C has published some use cases for its initiative on XML Binary Characterization. I'm not sure this qualifies as "News" as it happened rather quietly back in November, but it was news to me as I went trawling around the W3C website!

Binary-encoded XML could make a big difference in performance for service-oriented environments, as it both reduces the physical size of messages flying around the network and increases the efficiency of deserializing messages into more compact objects when they are received.
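To get a feel for the size argument, here is a minimal sketch in Python. It is emphatically not the W3C's format (the use cases don't define one); it is a toy encoding I've made up for illustration, with hypothetical opcodes (START, END, TEXT) and a table of tag names so that each name is spelled out only once. It ignores attributes and tail text, and assumes fewer than 256 distinct tag names.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical one-byte opcodes for this toy format (not from any standard).
START, END, TEXT = 0, 1, 2


def encode(xml_text: str) -> bytes:
    """Encode element structure and text as a name table plus an opcode stream."""
    names = []          # each distinct tag name is stored once
    body = bytearray()

    def index_of(name: str) -> int:
        if name not in names:
            names.append(name)
        return names.index(name)

    def walk(elem: ET.Element) -> None:
        body.append(START)
        body.append(index_of(elem.tag))       # tag becomes a one-byte table index
        text = (elem.text or "").strip()
        if text:
            data = text.encode("utf-8")
            body.append(TEXT)
            body.extend(struct.pack(">H", len(data)))  # 2-byte length prefix
            body.extend(data)
        for child in elem:
            walk(child)
        body.append(END)                      # closing tag is a single byte

    walk(ET.fromstring(xml_text))

    table = bytearray([len(names)])
    for name in names:
        data = name.encode("utf-8")
        table.append(len(data))
        table.extend(data)
    return bytes(table + body)


def decode(blob: bytes) -> ET.Element:
    """Rebuild the element tree from the toy binary form."""
    count, pos, names = blob[0], 1, []
    for _ in range(count):
        n = blob[pos]
        names.append(blob[pos + 1:pos + 1 + n].decode("utf-8"))
        pos += 1 + n

    stack, root = [], None
    while pos < len(blob):
        op = blob[pos]
        pos += 1
        if op == START:
            elem = ET.Element(names[blob[pos]])
            pos += 1
            if stack:
                stack[-1].append(elem)
            stack.append(elem)
        elif op == TEXT:
            (n,) = struct.unpack_from(">H", blob, pos)
            pos += 2
            stack[-1].text = blob[pos:pos + n].decode("utf-8")
            pos += n
        elif op == END:
            root = stack.pop()                # the last element popped is the root
    return root


message = (
    "<order>"
    "<item><sku>A100</sku><qty>2</qty></item>"
    "<item><sku>B200</sku><qty>1</qty></item>"
    "</order>"
)
blob = encode(message)
print(len(message.encode("utf-8")), "bytes as text,", len(blob), "bytes as toy binary")
```

Even on a message this small, the binary form comes out shorter, because every repeated tag name and every closing tag shrinks to a byte or two. The saving should grow with messages that repeat long, namespace-qualified names many times, though a real format would also have to deal with attributes, namespaces and mixed content.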

Some other forms of XML that could benefit from binary encoding are those used in security processing, such as SAML and XACML. Security subsystems are subject to heavy loads and are often a bottleneck on transaction performance, so speeding up processing here would be a godsend. For example, XACML uses XML to structure policies that are processed to create authorization decisions against requests. These files are mostly arcane namespaces and huge URNs with very little content that could be considered human-readable, and executing authorization algorithms using them is pretty costly, so this would be a great use case for binary encoding.

For similar reasons, DSML, the XML encoding for LDAP (Directory Services Markup Language, usually pronounced "Dismal", rather unfortunately), has had a lukewarm response in some quarters, again along the lines of "why are we adding lots of angle brackets to this data?"

This isn't the end of text-encoded XML by any means - for some tasks handcrafting XML in text editors and wrangling with it in Python scripts is here to stay - but hopefully we can clear the logjams in some of our enterprise message queues with the help of this initiative.