Author's response to Stephen Downes' comments
Posted on April 02 2004 by Sarah Currier
in response to http://www.downes.ca
Hi Stephen (et al),
Wow, fast response! A tribute to the wonders of RSS, I guess. A few responses below (although I think we are more in agreement with each other than you may have thought).
One thing I wanted to make clear is that this feature was intended as a summary/update on three published research papers I co-authored over the last year. As such, this topic has become something of a rote speech for me, and I think I dashed it off, kind of glossing over things that I take for granted (well, I didn’t have the 3000 words a research paper allows you either). So, I think there is one main area where I do disagree with you (classification by human vs. machine)- the rest is simply where I haven’t been clear enough about where I’m coming from.
> The sense in which it is wrong is the sense in which it argues that my original supposition - that "authors will simply access a form into which they enter the appropriate metadata" - is incorrect.
Actually, we never intended to say that this statement of yours was incorrect, and I’m sorry if it reads that way in the feature- I hope it is clearer in the papers. The purpose of quoting you was to illustrate the way in which metadata creation has been, and still largely is, thought about, talked about, and practiced in the LO repository realm. You were the only person we could find who said it explicitly, so widely was it assumed that authors would be the creators of metadata for LOs (and we needed a referenced quote, as you do!). This was merely our way of saying: ‘This is how it has been, and now we will talk about why we think this is not sufficient and what can be done about it’.
> Consider the generation of RSS metadata, for example.
> That such metadata is in a certain sense sufficient is, in my mind, indisputable. RSS is a metadata success story, with people reading entries directly from metadata, with people conducting searches based on metadata contents, with metadata being used to standardize interoperability and reusability.
I’m not clear by what criteria you regard it as a “success story”; in terms of the kind of community of VERY non-technical academics my project is dealing with, you might as well be speaking Greek if you talk about RSS and blogging. I’m interested in actual evidence that the people using LO repositories can find (all of) the materials they are looking for. And by that I don’t mean the current users, who will be the people with a strong interest in using technology in their teaching, but the vast majority of academics who aren’t that interested now, but whom we aim to engage with. I’m not sure that RSS/blogging is either a good analogy or a mature enough phenomenon to be used as one.
> Now the author suggests that this task ought to be carried out by librarians (though I may be reading a bit between the lines here).
I didn’t actually mean to narrow it down to librarians- I think the term we used most in the papers was “information specialists” but could just as easily be information managers, metaphiles or metadata experts – the point is that there are a number of metadata creation tasks that require a degree of skill and training, and also that require someone with an overview of the repository and the wider environment the repository might interoperate with (including people in communities). This isn’t just confined to classification, and it has as much to do with the workflow, tools, training, and so forth as with the individual metadata creator.
> Whether this is the case is perhaps open to question. My own Edu_RSS system performs a classification exercise by analyzing RSS files, but this is accomplished automatically, with no human intervention at all. Though my own system is a bit of a hack, and hence not perfect, it nonetheless demonstrated that raw categorization may be accomplished by machine.
What kind of evidence do you have that it is sufficient? How have you “demonstrated” this with regard to the needs of searchers? Whether human or machine classification is better is a long-running debate in information science, computer science and related fields, and many studies have been done. My own opinion is that, while there may be some areas where machine classification is adequate, they are likely to be areas where the materials are textual and subject to fairly rigorous structural constraints (such as research papers), and where the terminology, semantics, conceptual structure etc. are fairly well agreed and stable, as in the hard sciences, where a term used in one paper can pretty much be guaranteed to mean the same thing in another. I’m working in the area of social work education, where the field is ‘soft’: terms are used in varied ways, and the terminology and conceptual structure evolve rapidly according to a number of factors such as geography, social and political climate, and fashion, as well as in response to research. And many of our objects will be videos, perhaps audio files, images, etc.
> Instead of a 'collaborative model' of metadata creation, I argue for what may be called a 'distributed model'. Why the difference? The former suggests that all metadata authors are engaged in the same task, that they are aligned in some way.
Didn’t mean to suggest that- and no problem with the suggestion of “distributed model”- which I think doesn’t really contradict what we were suggesting. There are a number of possible ways in which more than one person’s expertise can be used- I think we used “collaborative” meaning to encompass all these potential models, a poor choice of word perhaps? Of course we define it a lot more in the paper.
> But if we extend the range of possible metadata to include more than just a single classification scheme, to include more than just classification
I never narrowed it down to that in the first place …
> but also evaluation and other assessments of suitability, it seems clear that metadata authors may be working at cross purposes.
Well, Simon Pockley, who I quote in the feature, would agree with you there, but I’m an optimist, and think, with proper management, it ain’t necessarily so.
> Another major difference is that the picture of metadata authoring described by the author seems to imply that metadata needs to be created before the object is distributed and used. Call this 'a priori metadata'.
Actually, I think this is just another case of me not being as clear as I was in the papers. I was actually limiting myself to the generation, by humans, of metadata for resource discovery, explicitly excluding machine-generated metadata, and the kind of metadata that some call “secondary metadata” that you go on to discuss. The reason for excluding the latter was that, because it is a fairly new concept in LO repositories, we had no evidence either way that the issues we were discussing would have an impact on it. Overall we were involved in a consciousness-raising exercise for e-learning; saying here are some questions, problems and possible solutions- if we can get people at least looking at this problem space, then future developments such as the other kinds of metadata will only benefit.
> When we begin to admit a posteriori metadata into our system, when we begin to allow actual user ratings, actual contexts of use, and actual user demographics, into our description of an object, the crucial role played today by classification metadata is significantly reduced.
You are making a statement about something that we can’t possibly know. I know that last week I would have given my eye teeth for an LO repository that I could search by subject for a lecture I was giving on managing digital learning resources. I can’t see subject searching ever being something people won’t need. And again, I was NOT limiting my argument to subject classification. I really feel it is a red herring when people try to divert this into a discussion about ‘what about secondary/usage metadata / review systems’ etc.- talking about the quality of descriptive metadata does not in any way detract from the importance of those discussions! It’s not an either/or thing, but there are management issues involved in both.
> Accordingly, I would suggest that the picture of metadata outlined in my Learning Objects paper is fundamentally correct. For an author, adding metadata will be a minor matter.
I certainly hope so! It has sometimes seemed to me, in moments of exasperation, that e-learning folk both hate the idea of teachers having to create metadata, seeing it as a boring, mind-numbing, administrative task, and yet don’t want to devolve it to those of us who love it either! It seems self-evident to me that we should gather from the author the fruits of their own expertise, but that someone else should be responsible for making sure it becomes conformant metadata to ease resource discovery and interoperability.
> And while not discounting the potential value of human-based classification, I submit that data collected automatically over time will become equally, if not more, important.
I’m certainly for developing machine generation of metadata as far as it can possibly go. But I would really need to be convinced that a particular instance of it was actually working for the end user. As someone who spent a good 7 years of my life creating metadata, I can affirm that it is an art, not a science, and that the day when machines can do it all will be the same day that I trust a translation program to translate my novel into Greek without human intervention. I think that day is a long way off, if it ever comes at all. Human thought and creativity are just too complex.
Best wishes, and thanks for the chance to rant even more about my favorite topic ;-)