The semantic web: How RDF will change learning technology standards
Mikael Nilsson, Center for User-Oriented IT-design, Royal Institute of Technology, Stockholm
September 26, 2001

The field of learning technology has entered a phase of intense work on standardization of learning technology descriptions of various kinds.

Most of the work so far has focused on XML as the encoding language for such specifications (e.g. IMS, IEEE-LOM, and SCORM). However, the World Wide Web Consortium (W3C) is putting its energy into another model for computerized descriptions, the Resource Description Framework (RDF), which is the foundation for the Semantic Web vision of Tim Berners-Lee1.

This raises important questions regarding the future of learning technologies: In what way might RDF be useful for learning technology specifications? In what sense does RDF represent the future of meta-data, and how does this affect learning technology?

It turns out that answering these questions requires revising some of the fundamental assumptions in the learning technology field.

RDF origins: The Semantic Web vision

The Semantic Web is the name of a long-term project recently started by W3C with the stated purpose of realizing the idea of having data on the Web defined and linked in a way that it can be used by machines not just for display purposes, but for automation, integration and reuse of data across various applications.2

It was motivated by the very same problems that motivate the development of meta-data standards3 - the fact that raw media, in the form of text, HTML, images or video streams, contains meta-information that may be readily deducible from the context for the human consumer (the name of the author, the kind of material contained within, etc.), but is mostly inaccessible to computers.

Making this information available to computers in order to enhance their usefulness was the driving vision that created the Semantic Web project.

Most traditional meta-data approaches take the view of meta-data as being mostly a digital indexing scheme to use in cataloging and digital libraries. What distinguishes the Semantic Web from these approaches to meta-data are two important things:

The Semantic Web is a layered structure. XML forms the basis, being the transport syntax. RDF provides the information representation framework, and on top of this layer, schemas and ontologies provide the logical apparatus necessary for the expression of vocabularies, enabling intelligent processing of information.

Essential RDF

While the current Web allows you to link to anything from anything in a machine-understandable way, the Semantic Web will allow you to say anything about anything in a machine-readable way.

Seen this way, RDF is the language in which Semantic Web meta-data statements are expressed. In fact, RDF can be said to consist purely of so-called statements.

An RDF statement consists of three elements: a subject, a predicate and an object. Statements are about Web resources, so subjects and objects are URIs, machine-readable identifiers.

Objects can also be plain text strings. Saying

"The document http://www.w3c.org/2001/sw/ was created by W3C"

is represented by the triple:

("http://www.w3c.org/2001/sw/", created by, "W3C")

To disambiguate the different predicates that can be used, every predicate must be given a URI. In this case, there is a standard predicate available in the Dublin Core vocabulary, namely "http://purl.org/dc/elements/creator", which we can use. The triple then becomes

("http://www.w3c.org/2001/sw/", "http://purl.org/dc/elements/creator", "W3C")
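As a minimal sketch, such a statement can be modelled in Python as a three-element tuple. The names `Triple` and `objects_of` are illustrative assumptions and not part of any RDF specification or library; the URIs are the ones used in the text.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement. Subject and predicate are URIs; the object is a URI or a plain string."""
    subject: str
    predicate: str
    obj: str


# Predicate URI exactly as given in the text (Dublin Core "creator").
DC_CREATOR = "http://purl.org/dc/elements/creator"

statement = Triple("http://www.w3c.org/2001/sw/", DC_CREATOR, "W3C")


def objects_of(triples, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [t.obj for t in triples if t.subject == subject and t.predicate == predicate]


creators = objects_of([statement], "http://www.w3c.org/2001/sw/", DC_CREATOR)
```

Because the predicate is a full URI rather than the phrase "created by", two applications that both know the Dublin Core vocabulary can agree on what the statement means without any further coordination.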


This demonstrates that URIs can be used to name not only concrete digital documents on the web, but abstract entities as well. In order to talk about non-digital resources, we must give them URIs. For example, to talk about the organization W3C (i.e., use it as the subject of a statement), we must give it a URI. Let's give it the URI

"http://www.w3c.org/organization".

We can now say things such as ...

"http://www.w3c.org/2001/sw/ was created by http://www.w3c.org/organization, which is an organization with the name 'W3C'"

... which is, in fact, three separate statements.

More complicated RDF expressions like this are usually represented as graphs, where the subjects and objects are nodes, and the predicates are edges5.
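The sentence about W3C above can be sketched as three triples and then grouped into a small graph structure. This is only an illustration: the `rdf:type` URI is the standard RDF predicate for "is a", but the `example.org` terms for "organization" and "name" are hypothetical, as is the helper `build_graph`.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"  # standard RDF "is a" predicate
DC_CREATOR = "http://purl.org/dc/elements/creator"            # as written in the text
# Hypothetical vocabulary terms, invented for this example only.
EX_ORGANIZATION = "http://example.org/terms/Organization"
EX_NAME = "http://example.org/terms/name"

W3C = "http://www.w3c.org/organization"

# The single English sentence decomposes into three statements.
triples = [
    ("http://www.w3c.org/2001/sw/", DC_CREATOR, W3C),
    (W3C, RDF_TYPE, EX_ORGANIZATION),
    (W3C, EX_NAME, "W3C"),
]


def build_graph(statements):
    """Group statements by subject: each node maps to its outgoing (edge, target) pairs."""
    graph = defaultdict(list)
    for subject, predicate, obj in statements:
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = build_graph(triples)
```

The graph has two nodes with outgoing edges: the document, which points to the organization, and the organization, which carries its type and its name.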

This is all there is to basic RDF - nodes-and-arcs diagrams interpreted as statements about concepts or digital resources represented by URIs6. However, the need for standardized vocabularies for things like "organization" and the predicate "is a" is evident. The basis for such vocabularies in RDF is RDF Schema7.

This specification provides the basic vocabulary to express relationships between terms: resources being instances of terms ("http://www.w3c.org/organization is an organization"), terms being subterms of other terms ("a hex-head bolt is a type of machine bolt") and so on.

It also provides means to restrict the usage of predicates: "is a parent of" only applies to persons, etc. The terms "instance", "subterm" and "applies to" are the kind of terms defined by the RDF Schema specification.

Using the vocabulary provided by RDF Schema, it is easy to create your own semantically rich vocabularies.
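The three kinds of schema relationship described above can be sketched in a few lines of Python. The `rdf:type`, `rdfs:subClassOf` and `rdfs:domain` URIs are the standard ones; everything under `http://example.org/terms/` and the helper functions are assumptions made for illustration. The sketch follows the reading given in the text, where a domain restricts where a predicate may be used; a full RDF Schema processor may treat domains differently.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASSOF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
RDFS_DOMAIN = "http://www.w3.org/2000/01/rdf-schema#domain"

EX = "http://example.org/terms/"  # hypothetical namespace for this example

schema = [
    (EX + "HexHeadBolt", RDFS_SUBCLASSOF, EX + "MachineBolt"),  # subterm
    (EX + "parentOf", RDFS_DOMAIN, EX + "Person"),              # applies to
]
data = [
    (EX + "bolt42", RDF_TYPE, EX + "HexHeadBolt"),              # instance
    (EX + "bolt42", EX + "parentOf", EX + "bolt43"),            # misuse of parentOf
]


def superclasses(term, statements):
    """Return the term itself plus every term reachable through subClassOf."""
    found = {term}
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for s, p, o in statements:
            if s == current and p == RDFS_SUBCLASSOF and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def types_of(resource, statements):
    """Return all terms the resource is an instance of, including inherited ones."""
    types = set()
    for s, p, o in statements:
        if s == resource and p == RDF_TYPE:
            types |= superclasses(o, statements)
    return types


def domain_violations(statements):
    """List statements whose subject is not an instance of the predicate's declared domain."""
    domains = {s: o for s, p, o in statements if p == RDFS_DOMAIN}
    return [
        (s, p, o)
        for s, p, o in statements
        if p in domains and domains[p] not in types_of(s, statements)
    ]


all_statements = schema + data
bolt_types = types_of(EX + "bolt42", all_statements)
violations = domain_violations(all_statements)
```

Here `bolt42` is found to be both a hex-head bolt and a machine bolt, and the statement that makes a bolt a parent is reported because its subject is not a person.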

Describing resources using RDF

It is not immediately obvious that the simple statement model of RDF can be used to make the Semantic Web a reality. The most fundamental benefit of RDF compared to other meta-data approaches is that using RDF, you can say anything about anything. Anyone can make RDF statements about any identifiable resource. Using RDF, the problems of extending meta-data and of combining meta-data of different formats and from different schemas disappear, as RDF does not use closed documents.

Important uses of RDF to encode information for any resource you can name with a URI include:

describe

Since a resource can have uses outside the domain foreseen by the author, any given description (meta-data instance) is bound to be incomplete. Because of the distributed nature of RDF, a description can be expanded, or new descriptions, following new formats (schemas), can be added. This allows for new creative uses of content in unforeseen ways. This is one of the important features of the current Web, where anyone can link to anything, that has been carried over into RDF.
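A rough sketch of why distributed descriptions combine so easily: if each description is a set of triples, expanding it is just a set union. The second source, its `suitedFor` predicate and the `describe` helper are hypothetical and serve only to illustrate the idea.

```python
DC_CREATOR = "http://purl.org/dc/elements/creator"         # as written in the text
EX_SUITED_FOR = "http://example.org/terms/suitedFor"       # hypothetical predicate

RESOURCE = "http://www.w3c.org/2001/sw/"

# Description published by the author of the resource.
author_description = {(RESOURCE, DC_CREATOR, "W3C")}

# Description added later by someone else, using an additional schema.
teacher_description = {
    (RESOURCE, DC_CREATOR, "W3C"),
    (RESOURCE, EX_SUITED_FOR, "introductory course on meta-data"),
}

# Merging needs no coordination: identical statements collapse, new ones are added.
merged = author_description | teacher_description


def describe(resource, statements):
    """Return the (predicate, object) pairs known about a resource, in sorted order."""
    return sorted((p, o) for s, p, o in statements if s == resource)


description = describe(RESOURCE, merged)
```

Neither source had to know about the other, and neither description had to be reopened or revalidated against a combined document schema.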

certify

There is no reason why only big organizations should be able to certify content - individuals such as teachers may want to certify certain content as a quality learning resource that is well suited for specific learning tasks. How to handle this kind of certification will be an important part of the Semantic Web.

annotate

Everything that has an identifier can be annotated. There are already attempts in this direction: Annotea8 is a project where annotations are created locally or on a server in RDF format. The annotations apply to HTML or XML documents and are automatically fetched and incorporated into web pages via a special feature in the experimental browser Amaya9.

extend

Structured content (typically in XML format) will become common. Successive editing can be done via special RDF-schemas allowing private, group consensus or author-specific versions of a common base document. The versioning history will be a tree with known and unknown branches which can be traversed with the help of the next generation versioning tools.

reuse

RDF is application independent. As the meta-data is expressed in a standard format independent of more advanced schemas that are used, even simplistic applications can understand parts of large RDF descriptions. If more advanced processing software is available (such as logic engines), more advanced treatment of the RDF descriptions is possible.
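The point about simplistic applications can be sketched as follows: an application that knows only one predicate can still use the statements it recognizes and safely ignore the rest. The `example.org` predicates and the `render_known` helper are hypothetical.

```python
DC_CREATOR = "http://purl.org/dc/elements/creator"  # as written in the text
RESOURCE = "http://www.w3c.org/2001/sw/"

# A description mixing a widely known predicate with terms from a
# more advanced, hypothetical schema.
description = [
    (RESOURCE, DC_CREATOR, "W3C"),
    (RESOURCE, "http://example.org/advanced/difficulty", "advanced"),
    (RESOURCE, "http://example.org/advanced/prerequisite", "http://example.org/course/xml-basics"),
]

# The only vocabulary this simple application understands.
KNOWN_PREDICATES = {DC_CREATOR: "Creator"}


def render_known(statements, known):
    """Render the statements whose predicate is understood; skip all others."""
    return [f"{known[p]}: {o}" for s, p, o in statements if p in known]


lines = render_known(description, KNOWN_PREDICATES)
```

A more capable application would simply pass a larger vocabulary, or hand the same triples to a logic engine, without the description itself having to change.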



RDF and XML: Model, Syntax, Semantics

So far nothing has been said about XML. The reason is that the RDF Model can be defined completely without reference to XML. XML can, however, be used as a syntax for RDF statements. The RDF specification defines the standard syntax to encode RDF statements in XML.
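To make the separation between model and syntax concrete, the sketch below writes the earlier example statement in the RDF/XML style (an `rdf:Description` element with an `rdf:about` attribute) using Python's standard `xml.etree.ElementTree` module, and then reads it back into a triple. The function names are assumptions, and the sketch handles only a single statement with a plain-text object; it is not a complete RDF/XML implementation.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/"  # namespace part of the predicate URI used in the text

# Ask ElementTree to use readable prefixes when serializing.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def to_rdf_xml(subject, predicate_ns, predicate_name, literal):
    """Serialize one statement with a plain-text object as RDF/XML text."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": subject})
    prop = ET.SubElement(desc, f"{{{predicate_ns}}}{predicate_name}")
    prop.text = literal
    return ET.tostring(root, encoding="unicode")


def from_rdf_xml(text):
    """Recover (subject, predicate, object) triples from the XML written above."""
    root = ET.fromstring(text)
    triples = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree tags look like "{namespace}localname".
            namespace, name = prop.tag[1:].split("}")
            triples.append((subject, namespace + name, prop.text))
    return triples


xml_text = to_rdf_xml("http://www.w3c.org/2001/sw/", DC_NS, "creator", "W3C")
round_trip = from_rdf_xml(xml_text)
```

The triple that comes back is identical to the one that went in, which illustrates that the XML is only a carrier: the statement itself is defined by the model, not by the element structure.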

But one question remains: why can't XML and XML Schema be used to represent the same kind of information that RDF expresses? XML Schemas are, after all, powerful tools to express complex requirements on XML elements. This is true, and XML and XML Schemas can be used to do some of what RDF does, but not without much trouble. The reasons are several:


The difference can be formulated in this way10: XML/XML Schema is a data modeling language, and RDF is a meta-data modeling language. When meta-data needs to be encoded as data, an XML syntax is very useful. However, modeling meta-data in pure XML severely restricts its flexibility.

Lessons from Using RDF in the IMS Specifications

In May 2001, IMS released version 1.2 of its meta-data specification. This major release - marking an important step forward for the specification and the underlying IEEE LOM standard - was informed by lessons learned since IMS first published its meta-data specification. But this release also had a new twist: a development effort, led by myself, to produce an RDF binding.

The development of this specification was an important testbed for RDF-based meta-data descriptions in learning technologies, as were subsequent efforts at producing an IMS Content Packaging RDF binding.

Some of the important positive lessons we learned in this effort were:


These lessons are in no way coincidences. While XML was designed as a data interchange format, RDF was designed from the ground up to fulfil the role of an Internet architecture for meta-data. "Resource Description Framework (RDF) is a foundation for processing metadata; it provides interoperability between applications that exchange machine-understandable information on the Web"19. This is very clearly reflected in the findings above.

On another note, RDF presents several drawbacks, as has been made clear during the work on the RDF bindings.
This has caused minor interoperability problems between the XML binding and the RDF binding, problems that can only be remedied by designing the information model with the RDF binding in mind.

From another perspective, this is a very positive side effect, as it significantly helps sharpen the information model. This has already been observed in the design of the IMS meta-data RDF binding, but from preliminary studies seems to be even more evident in the work on an RDF binding of IMS Content Packaging.

Possibilities on the Semantic Web

From a more strategic point of view, the emerging Semantic Web presents exciting new possibilities for uses of learning technology specifications. While XML standards are very good tools for enabling interoperability by specifying import and export formats for LMSs20, they tend to favor large, centrally managed, monolithic systems.

By enabling the use of learning technology specifications in Semantic Web technologies, a much wider range of applications is imaginable:


The Semantic Web promises to create a web-based eco-system for learning resources, freeing the material from being trapped in closed systems. One important example of this kind of technology is Edutella22, an RDF-based peer-to-peer system under development, being designed to allow distributed access to learning resource meta-data expressed in many different schemas. By combining meta-data from many sources in a controlled but distributed way, cross-annotation and mutual reuse of material become a reality.

In short, the vision of the Semantic Web is an important vision for online learning as well.

Conclusions

As has been pointed out above, the specifications produced by the different parties in the meta-data community (Dublin Core, VCard, LOM, SCORM, etc.) all have complex interdependencies, and increasingly so.

It has become evident that the reuse of vocabulary and technology between those specifications is a very difficult task, and has only been successful in some simple cases.

It should be clear that this is not always due to a fundamental interoperability problem - more often than not, it is a result of the closed nature of XML bindings. RDF bindings, on the other hand, are in general much more intensely interdependent as a consequence of the heavy reuse of vocabularies.

Using RDF in learning technology specifications would also greatly reduce the difficulties involved in adding another specification to an existing software system - it would only involve adding a new RDF Schema, which has been designed to interoperate with the existing schemas. This has important consequences for the adoption of learning technologies.

The future of RDF in learning technology specifications is bright, and the possibilities opened up by RDF and Semantic Web technologies promise to take learning technology projects to a new level of applications.

But some effort from the learning technology specification community to produce the necessary specifications will be needed. In particular, interoperability discussions between standards groups need to be intense and technical in order to maximize interoperation.

RDF provides an important technological platform to handle the interoperability demands of the emerging specification and vocabulary jungle.

Notes and useful links:

  1. http://www.w3.org/2001/sw/
  2. W3C Semantic Web Activity Statement at http://www.w3.org/2001/sw/Activity, part of the W3C Semantic Web site.
  3. The Semantic Web activity is, as a matter of fact, the successor of the W3C Metadata activity. Thus, the Semantic Web is the W3C meta-data architecture.
  4. Taken from Tim Berners-Lee, James Hendler, Ora Lassila: The Semantic Web, published in Scientific American, May 2001.
  5. For a more detailed explanation of these concepts, see e.g. An Introduction of RDF by Eric Miller or the RDF syntax and model specification.
  6. RDF also contains an important mechanism called reification, that allows you to state something about another RDF statement, such as who said it, whether it is true or false, etc.
  7. Defined in http://www.w3.org/TR/rdf-schema
  8. See the Annotea website.
  9. See the Amaya website.
  10. See also http://www.w3.org/DesignIssues/RDF-XML.html or in more detail: www.ontoknowledge.org/oil/downl/IEEE00.pdf
  11. See the Edutella website.
  12. See the Dublin Core website.
  13. See the DAML website.
  14. See the OIL pages at Ontoknowledge.org.
  15. See e.g. http://www.ontology.org/main/papers/faq.html for a definition of the word "ontology" in this context. The paper Combining Ontologies and Terminologies in Information Systems by Johann Gamper, Wolfgang Nejdl and Martin Wolpers is also of interest.
  16. For an important example of this, see the UNIVERSAL project (http://nm.wu-wien.ac.at/universal/ and http://www.ist-universal.org/).
  17. Via reification, mentioned above.
  18. Something similar has already been done in the UNIVERSAL project, where e.g. learner and contributor information is added on top of low-level meta-data.
  19. RDF specification: http://www.w3.org/TR/REC-rdf-syntax
  20. Learning Management Systems - a software system for managing an interactive learning environment.
  21. See the Annotea website.
  22. See the Edutella website.