Repositories open up to web crawlers
Scott Wilson, CETIS staff
November 28, 2001

A new gateway service allows web crawlers, the programs that index web pages for search engine sites like Google, to access metadata in repositories that implement the Open Archives Initiative (OAI) protocol.

Many libraries and databases are still a closed book as far as the web is concerned, with search engines unable to peer inside and index their contents. Using the new DP9 gateway, a repository that uses the OAI protocol can be indexed by web crawling software, enabling material held in the repository to show up in web searches.

DP9 does this by providing a persistent URL for repository records, and converting this to an OAI query against the appropriate repository when the URL is requested. For example:

http://arc.cs.odu.edu:8080/dp9/getrecord/oai_dc/oai:NACA:1917:naca-report-10

This URL is converted by the DP9 gateway into an OAI query to retrieve the Dublin Core metadata for this record, which is then presented to the user or web crawler as an HTML page.
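As a rough illustration of that conversion, the sketch below splits the example URL into a metadata format and a record identifier, then builds an OAI GetRecord request from them. It is not DP9's actual code: the function name, the REPOSITORIES lookup table, the placeholder base URL and the idea of picking the repository from the second part of the identifier are all assumptions made for the example. Only the GetRecord verb and its identifier and metadataPrefix arguments come from the OAI protocol itself.

```python
from urllib.parse import urlencode, urlparse

# Hypothetical lookup table mapping an archive name to its OAI base URL.
# The address below is a placeholder, not the real NACA repository.
REPOSITORIES = {"NACA": "http://repository.example.org/oai"}


def dp9_url_to_oai_query(dp9_url):
    """Turn a DP9-style persistent URL into an OAI GetRecord request URL."""
    path = urlparse(dp9_url).path
    # Expected path layout (assumed): /dp9/getrecord/<metadataPrefix>/<identifier>
    parts = path.strip("/").split("/", 3)
    if len(parts) != 4 or parts[1] != "getrecord":
        raise ValueError("not a DP9 getrecord URL: " + dp9_url)
    metadata_prefix, identifier = parts[2], parts[3]

    # Assumption: the second colon-separated field names the archive.
    id_fields = identifier.split(":")
    if len(id_fields) < 2 or id_fields[1] not in REPOSITORIES:
        raise ValueError("unknown repository for identifier: " + identifier)
    base_url = REPOSITORIES[id_fields[1]]

    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return base_url + "?" + query


oai_query = dp9_url_to_oai_query(
    "http://arc.cs.odu.edu:8080/dp9/getrecord/oai_dc/oai:NACA:1917:naca-report-10"
)
print(oai_query)
```

A real gateway would then fetch this query, parse the XML response and render the Dublin Core fields as HTML; those steps are left out here.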

Gateway services like DP9 provide one way in which existing repositories can become interoperable with the web; similar gateway services are being proposed within the IMS Global Learning Consortium's Digital Repositories group to allow applications to search and retrieve resources from both Learning Object repositories and existing library systems.

This presents a technical challenge, as many libraries use the Z39.50 standard for searching, whereas other repositories may use SQL, XML Query or other languages.

Gateway services provide a method of allowing applications to access a wide range of resources in different repositories without their developers or their users needing to understand multiple query languages and communication protocols.
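One way to picture such a gateway is as a set of adapters behind a single search call, each translating the same request into its repository's native query language. The sketch below is only an illustration under stated assumptions: the class names, the "records" table and the repository labels are hypothetical, and it builds query strings without sending them anywhere. The Z39.50 query is written in prefix query notation, where Bib-1 use attribute 4 denotes a title search.

```python
from abc import ABC, abstractmethod


class RepositoryAdapter(ABC):
    """Translates a simple title search into one repository's native query."""

    @abstractmethod
    def translate(self, title_keyword):
        ...


class Z3950Adapter(RepositoryAdapter):
    def translate(self, title_keyword):
        # This sketch does no escaping, so quoted keywords are rejected.
        if '"' in title_keyword:
            raise ValueError("double quotes are not supported in this sketch")
        # Prefix query notation: Bib-1 use attribute 4 is a title search.
        return '@attr 1=4 "{}"'.format(title_keyword)


class SqlAdapter(RepositoryAdapter):
    def translate(self, title_keyword):
        # Parameterised query against a hypothetical "records" table.
        sql = "SELECT title FROM records WHERE title LIKE ?"
        return (sql, ("%" + title_keyword + "%",))


class Gateway:
    """Hides the individual query languages behind one method."""

    def __init__(self, adapters):
        self.adapters = adapters  # repository label -> adapter

    def translate(self, title_keyword):
        return {
            name: adapter.translate(title_keyword)
            for name, adapter in self.adapters.items()
        }


gateway = Gateway({"library": Z3950Adapter(), "learning_objects": SqlAdapter()})
queries = gateway.translate("aerodynamics")
for name, native_query in queries.items():
    print(name, native_query)
```

The calling application only ever sees Gateway.translate; supporting another repository type means adding one more adapter rather than changing the application.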

The Open Archives Initiative has its roots in an effort to enhance access to e-print archives as a means of increasing the availability of scholarly communication. The organisation also develops and promotes interoperability specifications, such as the Metadata Harvesting Protocol.

For more information on the DP9 gateway, visit the DP9 website.