Metadata Quality in e-Learning: Garbage In - Garbage Out?
One of the first things I ever learned as a schoolgirl about computers and computer programming was the acronym GIGO: Garbage In, Garbage Out. As a grownup librarian working in e-learning, I was surprised to find a few years back that those who were thinking about metadata for learning objects appeared to have forgotten this truism.
When I speak of quality assurance for metadata, I am not speaking of the quality of the specifications and standards (such as IMS Learning Resource Metadata; Dublin Core Educational Metadata; IEEE Learning Object Metadata), nor of the quality of application profiles (such as CanCore and the UK LOM Core), nor even of the quality of the vocabularies and taxonomies used to describe resources within metadata records, important though all of these are. These developments could be said to deal with the structure of the metadata, whereas I am concerned with the creation of the content of the metadata fields in describing learning materials. Once a metadata standard has been implemented within a system, the specified fields must be filled out with real data about real resources, and this process brings its own problems.
"For searchers, these problems manifest themselves in various ways, including poor recall of available resources and inconsistency of search results. They arise due to errors, omissions and ambiguities in the metadata, many of which are known and understood in other communities of practice, often having tried and tested solutions." (Currier et al, 2004)
Now, the extent to which metadata quality in this sense is important for the development of learning object repositories and economies is open to debate, and there is as yet no definitive research to tell us how far we need to be aware of quality assurance, the metadata creation workflow, staff training, and so forth. Indeed, these factors will no doubt differ according to whether you have a repository of 100,000 highly granular learning objects, or, like my current project, are looking at a repository of only several hundred high quality learning objects. In general though, it may be safe to say, as Ben Ryan and Steve Walmsley of the HLSI repository project have said:
"If you cannot search for an educational resource because it does not have metadata, or a search returns several hundred or thousand results, you either; cannot re-use the resource because you cannot locate it or decide which resource is relevant to your needs because of the time required to assess the results of the search." (Ryan & Walmsley, 2003)
In the beginning… e-learning was to be a brave new world of direct sharing of resources, almost a peer-to-peer model (and in fact P2P may be the coming thing, adding another dimension to this issue), with authors of materials considered to be the primary creators of metadata about their resources. Stephen Downes encapsulated the lack of thinking in e-learning about this issue in his seminal 2001 paper Learning objects: resources for distance education worldwide:
"Whatever the properties, the authoring of metadata itself will be straightforward for most course designers. Because metadata files are machine-writable, authors will simply access a form into which they enter the appropriate metadata information."
Jane Barton and I have contended in three published research papers on this topic (Currier & Barton, 2003; Barton, Currier & Hey, 2003; Currier et al, 2004) that some underlying assumptions lay behind this lack of thinking:
- That, in the context of the culture of the Internet, mediation by controlling authorities is detrimental and undesirable;
- That rigorous metadata creation is too time-consuming and costly, a barrier in an arena where the supposed benefits include savings in time, effort and cost;
- That only authors and/or users of learning materials have the necessary knowledge or expertise to create metadata that will be meaningful to their colleagues;
- That, given a standard metadata structure, metadata content can be generated or resolved by machine.
We have also put forward a fifth underlying reason, garnered from conversations with e-learning colleagues around the world: that for both technology and pedagogy experts, metadata creation is seen as a tedious chore rather than as a complex set of skills, essential for unlocking access to resources.
Practice
In the UK in 2001, the influential learning object repository project SeSDL realized near the end of their funded life that they needed a librarian’s help to construct a subject taxonomy for their repository. That librarian turned out to be me, and SeSDL became an early case study in why we in e-learning need to pay attention to the process of metadata creation. That subject taxonomy was evaluated by asking real users of the repository to upload a bunch of objects, and classify them as part of the metadata creation process. The intention was to see if they felt the taxonomy reflected their own conceptualization of their subject area, but the exercise incidentally highlighted some troubling issues with untrained resource authors creating subject metadata. (Currier, 2001; Currier et al, 2004)
Another UK-based learning object repository project, HLSI, expected in the beginning (as was the norm in e-learning at the time) that resource contributors would upload their objects and add their own IMS metadata. At the IMS Meeting in Sheffield in September 2002, Ben Ryan, the project’s Software Development Manager, pulled me aside and said that they had a problem with their metadata. It appeared that, when searching the repository, no learning objects were coming up: had the metadata disappeared? When they went in and looked at it, the metadata was there. It was just, to paraphrase, rubbish. HLSI is an ongoing concern, and they have done some of the best and most extensive work around the metadata creation issue as a result of this early, understandable, gap in their remit. Early steps included providing more user support through education and documentation, and employing a team of information science professionals to improve the existing metadata (Ryan, 2003). By June 2003, 2,500 metadata records had been re-edited, taking about 550 hours and costing around £6,500, or about £2.60 per record. Subsequent measures have been designed to prevent future expenditure!
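The remediation figures above can be checked, and a couple of further rates derived, with a few lines of arithmetic. This is only a rough sketch: the three inputs are the rounded figures quoted in the text ("about" and "around"), and the time-per-record and cost-per-hour values are my own derivations rather than numbers reported by HLSI.

```python
# Figures quoted in the text (all approximate).
records_edited = 2500
hours_spent = 550
cost_gbp = 6500

# Cost per record, as stated in the text (about 2.60 pounds).
cost_per_record = cost_gbp / records_edited

# Derived rates, not reported by the project itself.
minutes_per_record = hours_spent * 60 / records_edited
cost_per_hour = cost_gbp / hours_spent

print(f"cost per record:    {cost_per_record:.2f} GBP")
print(f"minutes per record: {minutes_per_record:.1f}")
print(f"cost per hour:      {cost_per_hour:.2f} GBP")
```

On these figures each record took a little over thirteen minutes of professional time to repair, which gives a sense of what retrospective clean-up costs compared with getting the metadata right at the point of creation.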
The Collaborative model
Among other improvements at HLSI, the process of metadata collection has now been split into two stages (Currier et al, 2004):
- The educational practitioner is responsible for entering the basic metadata, including title, description, contribution and any technical information they may be aware of.
- The information scientist is responsible for reviewing the basic metadata and providing additional metadata for subject classification, educational attributes etc.
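The two stages above can be sketched as a simple record that passes from one role to the other. This is a minimal illustration, not a description of the HLSI software: the class, the field names and the `review` function are all hypothetical, and the fields are not IMS or IEEE LOM element names.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    # Stage 1: basic metadata entered by the educational practitioner.
    title: str
    description: str
    contributor: str
    technical: dict = field(default_factory=dict)
    # Stage 2: added by the information scientist on review.
    subject_classification: list = field(default_factory=list)
    educational_attributes: dict = field(default_factory=dict)
    reviewed: bool = False


def missing_basic_fields(record):
    """Return the names of required stage-1 fields that were left blank."""
    required = ("title", "description", "contributor")
    return [name for name in required if not getattr(record, name).strip()]


def review(record, subjects, educational):
    """Stage 2: reject incomplete records, otherwise enrich and mark as reviewed."""
    missing = missing_basic_fields(record)
    if missing:
        raise ValueError(f"basic metadata incomplete: {missing}")
    record.subject_classification = list(subjects)
    record.educational_attributes = dict(educational)
    record.reviewed = True
    return record


# Example: a practitioner's draft, then the information scientist's review.
draft = MetadataRecord("Photosynthesis quiz", "Ten-question self-test", "A. Teacher")
final = review(draft, ["Biology"], {"level": "secondary"})
```

The point of the split is that each role fills in only what it is best placed to know, and the review step is where gaps in the basic metadata get caught before the record reaches searchers.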
This collaborative model is something which has also been suggested in the area of metadata in support of the Semantic Web, in virtually the only place Jane Barton and I were able to find similar concerns emerging. Research by Jane Greenberg and W. Davenport Robertson in the US has suggested that:
"… the integration of expert and author generated descriptive metadata can advance and improve the quality of metadata for web content, which in turn could provide useful data for intelligent web agents, ultimately supporting the development of the Semantic Web. […] If such partnerships are well planned and evaluated, they could make a significant contribution to achieving the Semantic Web." (Greenberg & Robertson, 2002)
There is plenty of potential for problems in such a collaborative model, as most people who have tried to work across the different tribes of techies, librarians, teachers and designers will know. Simon Pockley (2004) has just published a brilliant chapter on what he encountered in just such an attempt at collaborative metadata creation, on an Australian arts project. He focuses heavily on the different cultures involved in these different tribes, each with their own drivers and agendas, and he even names a new tribe, to which I suspect I belong: The Metaphiles. The other “key character species” he identifies within the information ecology are: The Creatives; The Educators; The Cataloguers; The Technologists; The Administrators; and The Hacktivists. He points to “the development of a poetic for the art of metadata”, and finishes with this beautiful thought:
"Just as the production of feature films has been characterized by the concept of assembly or montage, so we could consider metadata production to be the result of the combined efforts of quite separate skills. Perhaps it is time for the Metaphiles to talk more about the art of metadata, about how images and sounds can also be metadata and about the new literacy of this emerging form of expression." (Pockley, 2004)
In the two years or so since that evaluation of the SeSDL taxonomy, I for one have been trying to do that (well, except for the bit about images and sounds!), and have seen some real changes happening. So what’s new on the ground level of digital learning object repositories and their metadata? What has emerged since the publication of the first version of Jane Barton’s and my paper on quality assurance for metadata (Currier & Barton, 2003)? Well, the JORUM project, which is a major UK-wide funded project developing a cross-sectoral, multi-disciplinary repository for sharing learning objects, has very recently published a major report on the investigative stage of their work. This scoping document is an invaluable gem, and many of its sections will be of great help to subsequent projects. In Volume 5: Metadata, they survey current ideas around the metadata creation workflow and who should create metadata for their repository, which will be drawing objects from a very distributed range of sources. At this stage, they have recommended a collaborative model of metadata generation which will be explored further as part of the Transition to Service phase of the JORUM project.
Late last year, the ADL Academic Co-Lab made available a report from their Global Learning Repositories Summit, written by Colin Holden (2003). While I’m a little disappointed to see that this report does not reference the work we’ve been doing on the metadata quality issue here in the UK, it is heartening to see someone from one of the big interoperability bodies covering this issue, which he does in a section entitled ‘Quality and Consistency of Metadata’. It is also encouraging to note that very similar issues are emerging in the universe of SCORM. What is not so heartening is that this discussion is still framed within a 'resource authors create metadata' paradigm. Two issues for discussion are given in the summary of the metadata section. The first is:
"How can creators of metadata be encouraged to create metadata that meet the needs of other users and administrators?"
This is disappointing, because they don’t appear to have caught up with the debate as we are having it here in the UK, where we are finding suggestions that just encouragement of resource depositors may not be nearly enough. However, the second question they finish with is very pertinent, and echoes the final research question in our ALT-J paper being published this month (Currier et al, 2004). Colin Holden says:
"How important is it that users be capable of performing precise searching? Does its importance justify the institutional and practical work that would need to be done to produce a collection capable of responding to precision searching?"
"How will users search for materials within learning object repositories and networks? For example, how important is it to have authority control over the names of authors and contributing institutions? What educational attributes will users search for, and how? Answers to these questions will have a profound impact on decisions about metadata creation."
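To make the authority control question concrete: the idea is that every variant of a name resolves to one authorised heading, so that a search on any form retrieves the same set of records. The sketch below is purely illustrative. The authority file, the function names and the matching rule are all assumptions of mine; a real repository would draw on a maintained name authority rather than a hand-built dictionary.

```python
# Hypothetical authority file: each variant form points to one authorised heading.
AUTHORITY_FILE = {
    "currier, s.": "Currier, Sarah",
    "currier, sarah": "Currier, Sarah",
    "sarah currier": "Currier, Sarah",
}


def normalise(name: str) -> str:
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())


def authorised_form(name: str):
    """Return the authorised heading, or None to flag the name for human review."""
    return AUTHORITY_FILE.get(normalise(name))


# Three ways a contributor might type the same author's name.
variants = ["Currier, S.", "sarah  currier", "CURRIER, SARAH"]
headings = {authorised_form(v) for v in variants}

# A form the authority file has never seen is not guessed at.
unknown = authorised_form("S Currier")
```

Names that do not match are returned as `None` rather than guessed, on the assumption that an unrecognised form should go to a person for checking instead of silently creating a new heading.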
Finally, the fact that I am writing this as an ex-CETIS staff member rather than as CETIS EC-SIG Coordinator is a pointer to where things may be going. I am now Project Librarian on Stòr Cùram, a Scottish learning object repository project which heeded the problems that had been identified before, and budgeted for a full-time librarian for the length of the project, of equal status with the project learning technologist and two e-learning advisers. This isn’t to say that I will personally create all the metadata, but we will be paying attention to quality, workflow, training, staff development, and ultimately how our users will want to search the repository. Watch this space for further developments …
References
Barton, J., Currier, S. & Hey, J. (2003) Building quality assurance into metadata creation: an analysis based on the learning objects and e-prints communities of practice, DC-2003 Proceedings of the International DCMI Metadata Conference and Workshop, September 28-October 2, 2003, Seattle, Washington USA, pp. 39-48. Available online: http://www.siderean.com/dc2003/201_paper60.pdf
Currier, S. (2001) SeSDL Taxonomy evaluation report (Glasgow, University of Strathclyde). Available online: http://www.sesdl.scotcit.ac.uk:8082/taxon_eval/SeSDLTaxFinRep.doc
Currier, S. & Barton, J. (2003) Quality Assurance for Digital Learning Object Repositories: How Should Metadata Be Created? In: Cook, J. & McConnell, D. (Eds.) Communities of Practice. ALT-C 2003 Research Proceedings, 8-10 September 2003, University of Sheffield & Sheffield Hallam University.
Currier, S., Barton, J., O’Beirne, R., Ryan, B. (2004) Quality assurance for digital learning object repositories: issues for the metadata creation process, ALT-J, Research in Learning Technology Vol.12, No.1 (Mar. 2004), pp. 5-20. Contact: Sarah Currier, email@example.com
Downes, S. (2001) Learning objects: resources for distance education worldwide, International review of research in open and distance learning, July 2001. Available online: http://www.irrodl.org/content/v2.1/downes.html
Greenberg, J. & Robertson, W. (2002) Semantic Web construction: an inquiry of authors’ views on collaborative metadata generation, Proceedings of the International Conference on Dublin Core and Metadata for e-Communities 2002, pp. 45-52. Available online: http://www.bncf.net/dc2002/program/ft/paper5.pdf
Holden, C. (2003) From Local Challenges to a Global Community: Learning Repositories and the Global Learning Repositories Summit V1.0, ADL Academic Co-lab. Available online: http://www.academiccolab.org/resources/FinalSummitReport.pdf
JORUM+ Project Teams (2004) JORUM Scoping and Technical Appraisal Study. Volume 5: Metadata. Available online: http://www.jorum.ac.uk/vol5_fin.pdf
Pockley, S. (2004) Metadata and the arts: the art of metadata. In: Gorman, G. & Dorner, D. (Eds.) International Yearbook of Library and Information Management 2003/2004: Metadata Applications and Management, London, Facet, pp. 66-92.
Ryan, B. (2003) Creating, using and re-using learning objects. [PowerPoint presentation] (Huddersfield, HLSI Project) Available online:
Ryan, B. & Walmsley, S. (2003) Implementing metadata collection: a project’s problems and solutions. Learning technology, Vol. 5, no. 1, Jan. 2003. Available online: http://lttf.ieee.org/learn_tech/issues/january2003/index.html#3