Posted on April 02 2004 by Stephen Downes
in response to Metadata Quality in e-Learning: Garbage In - Garbage Out?
This is a useful article. However, there is a certain respect in which it is wrong, and this should be drawn out. There is also an important sense in which it is correct, but incomplete, and this should also be drawn out.
The sense in which it is wrong is the sense in which it argues that my original supposition - that "authors will simply access a form into which they enter the appropriate metadata" - is incorrect.
In fact, the mechanism described in my Learning Objects paper is pretty much exactly the way most metadata in the world is entered today, especially when one considers that I also noted, in the same essay, that "many editors will have metadata generators built in."
Consider the generation of RSS metadata, for example. This admittedly simple format is based almost exclusively on author input. Data for the various fields is collected from input provided by authors into forms - specifically, the forms into which authors enter their posts in their blogging software.
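The mechanism described above can be sketched in a few lines. This is a minimal illustration, not any particular blogging system's code; the form field names are invented for the example.

```python
# Sketch: how blogging software might turn the fields of a posting form
# directly into RSS item metadata. Field names are hypothetical.
from xml.sax.saxutils import escape

def rss_item(form):
    """Render one RSS <item> from the values an author typed into a form."""
    return (
        "<item>\n"
        f"  <title>{escape(form['title'])}</title>\n"
        f"  <link>{escape(form['link'])}</link>\n"
        f"  <description>{escape(form['body'])}</description>\n"
        "</item>"
    )

post = {
    "title": "Metadata Quality in e-Learning",
    "link": "http://example.com/posts/42",
    "body": "Garbage in, garbage out?",
}
print(rss_item(post))
```

The point is that the author never performs a separate "metadata authoring" step: the metadata falls out of the ordinary act of posting.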
That such metadata is in a certain sense sufficient is, in my mind, indisputable. RSS is a metadata success story, with people reading entries directly from metadata, with people conducting searches based on metadata contents, with metadata being used to standardize interoperability and reusability.
It is not, of course, sufficient for every purpose, and this leads us to the point where the author is correct but incomplete. It is clear, from experience with both HTML META tags and RSS categorization fields, that author-generated classification and other specialized metadata is not reliable - not simply because authors will not enter the required data, but because the data entered is unreliable, sometimes through error or omission, and in other cases deliberately.
Thus the author's central point in the article, that the task of metadata creation ought to be separated into distinct tasks, with distinct authors, is correct. Classifications of learning resources ought to be cast by experts in classification, and in particular, experts in the particular classification system being used.
Now the author suggests that this task ought to be carried out by librarians (though I may be reading a bit between the lines here). Whether this is the case is perhaps open to question. My own Edu_RSS system performs a classification exercise by analyzing RSS files, but this is accomplished automatically, with no human intervention at all. Though my own system is a bit of a hack, and hence not perfect, it nonetheless demonstrates that raw categorization may be accomplished by machine.
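To make the idea concrete, here is a minimal sketch of machine categorization of feed entries, in the spirit of what a system like Edu_RSS does. The actual Edu_RSS internals are not shown here; the topic names and keyword sets are invented for illustration.

```python
# Hypothetical keyword-matching classifier: assign topics to an RSS entry
# with no human intervention. Topics and keywords are invented examples.

TOPICS = {
    "metadata": {"metadata", "rss", "tagging", "classification"},
    "e-learning": {"learning", "course", "lms", "pedagogy"},
}

def classify(text):
    """Return every topic whose keyword set intersects the entry's words."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return sorted(topic for topic, keys in TOPICS.items() if words & keys)

entry = "New RSS tools improve metadata for learning resources"
print(classify(entry))  # both topics match: ['e-learning', 'metadata']
```

A real system would use a richer technique than bare keyword matching, but even this crude approach shows that a raw first-pass categorization needs no librarian.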
In any case, the author is, as I suggested, correct in the assertion that there ought to be a division of powers in metadata creation. Quite so, but why not push this further? In my paper, Resource Profiles [ http://www.downes.ca/files/resource_profiles.htm ], I discuss this extension in detail (what some have called exhausting detail).
Instead of a 'collaborative model' of metadata creation, I argue for what may be called a 'distributed model'. Why the difference? The former suggests that all metadata authors are engaged in the same task, that they are aligned in some way. But if we extend the range of possible metadata to include more than just a single classification scheme, to include more than just classification but also evaluation and other assessments of suitability, it seems clear that metadata authors may be working at cross purposes.
Another major difference is that the picture of metadata authoring described by the author seems to imply that metadata needs to be created before the object is distributed and used. Call this 'a priori metadata'. And while it seems clear that a minimal amount of metadata is needed to allow an object to be located at all, it seems that metadata authored *after* use might be equally, if not more, useful. Call this 'a posteriori metadata'.
When we begin to admit a posteriori metadata into our system, when we begin to allow actual user ratings, actual contexts of use, and actual user demographics into our description of an object, the crucial role played today by classification metadata is significantly reduced. This is not to say that it is useless - but if you could feasibly put reviews as well as classifications into card catalogues, who wouldn't jump at the chance?
The use of a posteriori metadata also greatly increases the pool of potential metadata authors. A librarian may be the authority on the classification of an object, but the only authority on its eventual use is the person who actually uses it. And while a priori peer reviewers may be able to comment authoritatively on the value of an object, a much wider and potentially more reliable evaluation may be generated by actual users.
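The distinction drawn above can be sketched as a resource record that separates a priori metadata, fixed before the object circulates, from a posteriori metadata accumulated as the object is actually used. This is an illustrative data structure only; all field names are invented, and it is not the profile format described in the Resource Profiles paper.

```python
# Sketch: a resource description that grows after publication.
# A priori fields are authored once; a posteriori fields accumulate from use.
from dataclasses import dataclass, field

@dataclass
class ResourceProfile:
    # a priori: created before the resource is distributed
    title: str
    classification: str
    # a posteriori: appended by actual users, over time
    ratings: list = field(default_factory=list)
    contexts_of_use: list = field(default_factory=list)

    def record_use(self, rating, context):
        """Each real use of the resource adds to its description."""
        self.ratings.append(rating)
        self.contexts_of_use.append(context)

    def average_rating(self):
        return sum(self.ratings) / len(self.ratings) if self.ratings else None

profile = ResourceProfile("Intro to Logic", "Philosophy / Logic")
profile.record_use(4, "first-year undergraduate seminar")
profile.record_use(5, "self-directed adult learner")
print(profile.average_rating())  # 4.5
```

Note that the two kinds of fields have different natural authors: the classification might come from a librarian or a machine, but the ratings and contexts can only come from the people who actually used the resource.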
Accordingly, I would suggest that the picture of metadata outlined in my Learning Objects paper is fundamentally correct. For an author, adding metadata will be a minor matter. And while not discounting the potential value of human-based classification, I submit that data collected automatically over time will become equally, if not more, important.