Stretching and squeezing (X)HTML to your needs
Wilbert Kraan, CETIS staff
December 19, 2002


In itself, what needed to happen to HTML was already clear some years ago: make it adapt to devices and uses other than the graphical browser on a PC operated by a sighted, able-bodied person. Also, shift the language from something that is laid out by machines but carries mainly human-interpretable content to something that is also meaningful to machines.

The first step in this process was getting the X in: making HTML conform to the conventions of the eXtensible Markup Language (XML). The result, XHTML, has been with us as a specification for quite some time, even if its uptake has been a bit sluggish. The trouble is that moving from HTML to XHTML conformance is not trivial for a number of tools and work practices, yet it doesn't actually buy you all that much. Lots of effort, not a lot of reward.

To address exactly that, the W3C has been working on turning XHTML from a monolithic, static, take-it-or-leave-it language meant for displays into something that is flexible and, crucially, adaptable. How? By defining its elements, attributes and content model not (just) in a single document called a Document Type Definition (DTD), but in a range of XML Schemas. That job has nearly been completed with the publication of the Last Call Working Draft of Modularization of XHTML in XML Schema. Next stop will be a 'Recommendation' (final version in W3C speak) sometime next year. The much bigger effort will be XHTML 2.0, which has now reached its second Working Draft. That is: pretty early. And so it should be, because rather than retaining the backwards compatibility of the XHTML 1.x versions, this will be a clean break. Out go things like the <h3> tag for headings and the <hr> tag for section dividers, and in come things like <section> and <h>. The idea behind those two is that the content of a section is contained within a <section> element and titled by its own <h>, rather than being divided up and titled by free-standing tags.
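To make that difference concrete, here is a minimal Python sketch that walks an XHTML 2.0-style fragment and derives each heading's level from how deeply its <section> is nested, rather than from a number in the tag name. The fragment, its text and the outline() helper are hypothetical illustrations; the namespace declaration is left out for brevity, and since XHTML 2.0 is still an early working draft, this shows the idea rather than final syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML 2.0-style fragment (namespace declaration omitted).
# There is only one heading element, <h>; nesting of <section> gives its level.
DOC = """<body>
  <section>
    <h>Learning technology standards</h>
    <p>Introduction.</p>
    <section>
      <h>Content packaging</h>
      <p>Details.</p>
    </section>
    <section>
      <h>Sequencing</h>
    </section>
  </section>
</body>"""


def outline(element, depth=0):
    """Return (level, heading text) pairs, taking the level from section depth."""
    result = []
    for child in element:
        if child.tag == "section":
            # A nested section pushes its headings one level deeper.
            result.extend(outline(child, depth + 1))
        elif child.tag == "h":
            result.append((depth, (child.text or "").strip()))
    return result


if __name__ == "__main__":
    for level, title in outline(ET.fromstring(DOC)):
        print("  " * (level - 1) + f"{level}: {title}")
```

Moving a whole <section> elsewhere in the document automatically changes the level of every heading inside it, with no renumbering of tags needed.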

The modularisation part takes place on two tracks. One track is about spinning off complex areas of webpages, such as frames and forms, into separate specs (called XForms and XFrames, funnily enough). This has the advantage that other XML dialects can make use of the same conventions. The other track is to split the elements that constitute HTML into clumps of similar functionality called modules, such as text, images, style or scripting. This has already been done using DTDs, but is really taking shape with XML Schema. The big advantage of modularisation is that it makes it easier to adapt existing content to smaller and simpler devices like PDAs and mobile phones. Rather than trying to guess what remains of an all-bells-and-whistles webpage on such a device, it becomes possible to mix and match, from the different specs and modules, just what that class of devices is capable of. Both web content developers and makers of browsers of all kinds would know where they stand. Modularisation also makes it easier for end users to configure content to their own preferences and capabilities.
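As a rough illustration of "knowing where you stand", the Python sketch below tags each element of a small page with the module it belongs to and checks the result against a device profile. The element-to-module table is a partial, simplified reading of XHTML Modularization (module names such as Structure, Text, Hypertext, List, Image, Forms and Scripting), while PHONE_PROFILE, modules_used() and unsupported() are hypothetical; real profiles such as XHTML Basic are defined by the W3C, not by a set in code. Namespaces are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Partial, illustrative mapping of XHTML 1.x elements to modules;
# only the elements needed for this example are listed.
MODULE_OF = {
    "html": "Structure", "head": "Structure", "title": "Structure", "body": "Structure",
    "p": "Text", "h1": "Text", "em": "Text",
    "a": "Hypertext",
    "ul": "List", "li": "List",
    "img": "Image",
    "form": "Forms", "input": "Forms",
    "script": "Scripting",
}

# Hypothetical profile: the modules a small device class claims to support.
PHONE_PROFILE = {"Structure", "Text", "Hypertext", "List", "Image"}


def modules_used(xhtml):
    """Return the set of modules a page draws on ('Unknown' for unmapped tags)."""
    root = ET.fromstring(xhtml)
    return {MODULE_OF.get(el.tag, "Unknown") for el in root.iter()}


def unsupported(xhtml, profile):
    """Return, sorted, the modules a page uses that the profile lacks."""
    return sorted(modules_used(xhtml) - profile)


PAGE = """<html><head><title>Course news</title></head>
<body><h1>News</h1><p>See <a href="notes.html">the notes</a>.</p>
<script>var x = 1;</script></body></html>"""

if __name__ == "__main__":
    print(unsupported(PAGE, PHONE_PROFILE))
```

A content developer could run such a check before publishing, and a browser maker could advertise its own profile. Anything missing from the table is reported as "Unknown", and so also counts as unsupported.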

But chopping poor old HTML to bits and binding it to XML Schema are only the first major steps towards making it do genuinely new tasks. Binding it to XML Schema makes it possible both to splice ('namespace') XHTML into other XML dialects and to splice other XML dialects into XHTML. It even becomes possible, in limited ways, to tell machines to either ignore or reinterpret elements of those other dialects. Dialects, that is, like IMS Learning Design, Simple Sequencing, or most of the other learning technology specifications you may have heard of.

What that would allow you to do is already demonstrated rather nicely by some web geeks who got fed up with having to maintain both a weblog and a newsfeed. While an RSS newsfeed is a good way of sending the titles and abstracts of articles (and increasing amounts of other material) to like-minded sites or dedicated newsreaders, its format is rather different from that of the webpage it is gleaned from. As can be seen in the newsfeed of the CETIS site, for example, there is no formatting. This can be cured to some extent with Cascading Style Sheets (CSS), but that still won't take care of metadata like author, date of publication, originating site and so on.

So why not splice some XHTML into the RSS, along with the Dublin Core metadata that RSS already uses? The XHTML will be politely ignored by most systems that receive the RSS, unless the newsfeed ends up directly on a webpage, at which point the XHTML becomes useful. Likewise, a browser has no idea what all those newsfeed tags mean, even if it is happy to style the text between them in the pretty ways described by a CSS style sheet. To see what that looks like, point any reasonably modern browser, or your favorite newsreader or blog, to Mozquito XForms and have a look at the source code.
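A minimal sketch of that idea, assuming an RSS 1.0-style feed cut down to a single item (a full RSS 1.0 feed also needs a channel element). The sample item uses placeholder example.org URLs and this article's own title, author and date as data; it carries Dublin Core metadata and an XHTML body, each in its own namespace. The hypothetical plain_reader() only looks for the RSS and Dublin Core elements it knows and never touches the XHTML, while the equally hypothetical xhtml_reader() picks out the XHTML body for a page builder. Exactly where the XHTML sits inside an item is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for RSS 1.0, Dublin Core elements and XHTML.
NS = {
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "xhtml": "http://www.w3.org/1999/xhtml",
}

# Hypothetical sample feed with one item; URLs are placeholders.
FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <item rdf:about="http://example.org/news/xhtml">
    <title>Stretching and squeezing (X)HTML to your needs</title>
    <link>http://example.org/news/xhtml</link>
    <dc:creator>Wilbert Kraan</dc:creator>
    <dc:date>2002-12-19</dc:date>
    <xhtml:body>
      <xhtml:p>XHTML is being split into <xhtml:em>modules</xhtml:em>.</xhtml:p>
    </xhtml:body>
  </item>
</rdf:RDF>"""


def plain_reader(feed):
    """RSS-only consumer: reads the elements it knows and skips everything else."""
    root = ET.fromstring(feed)
    return [
        {
            "title": item.findtext("rss:title", namespaces=NS),
            "creator": item.findtext("dc:creator", namespaces=NS),
            "date": item.findtext("dc:date", namespaces=NS),
        }
        for item in root.findall("rss:item", NS)
    ]


def xhtml_reader(feed):
    """Page-building consumer: returns the embedded XHTML body of each item."""
    root = ET.fromstring(feed)
    bodies = []
    for item in root.findall("rss:item", NS):
        body = item.find("xhtml:body", NS)
        if body is not None:  # items without XHTML are simply skipped
            bodies.append(body)
    return bodies


if __name__ == "__main__":
    print(plain_reader(FEED))
    for body in xhtml_reader(FEED):
        print(ET.tostring(body, encoding="unicode"))
```

Because each element is identified by its namespace URI rather than by its prefix, neither consumer is tripped up by vocabulary it doesn't understand; it just looks past it.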

At this stage, it would be difficult to predict exactly what such XHTML functionality would mean for educational content. The newsfeed example suggests that aggregating content for portals would become a lot easier. Promoting accessibility is another obvious area. Further off, it could become easy to expose learning objects as ordinary webpages to general browsers, while allowing specialised browsers to interact with them in much more sophisticated, VLE-like ways.

But XHTML, sadly, is not clay, and squeezing and stretching it too far is likely to break it. Also, namespacing tricks only extend a language by borrowing from other XML dialects, not by replacing them. And that's before considering the technical headaches that mixing lots of namespaces can cause.