Community:Collections and Metadata/OAIResourceIdentifiers



Proposed OAI Identifiers for aggregated resource-oriented metadata

One of the challenges in redistributing aggregated metadata records about resources is that the metadata is harvested from multiple sources, so the aggregated metadata record is essentially an abstraction. It therefore doesn't conform to our notion of derived OAI identifiers, in which the identifier that we serve is a composite of our identifier for the metadata provider (their Naming Authority) and the ID that they assigned to that record.

Another challenge is that when retrieving aggregated metadata about a resource from multiple metadata providers, there may be many minor variations of the resource URL that we use to identify metadata records related to the same resource. While acknowledging that the 'equivalence problem' related to identifying the same resource served in multiple formats, from mirrored servers, and from different archives is indeed a much larger problem, we think it's still important to make some attempt to normalize minor URL variants that would otherwise balkanize the metadata related to a single URL.

Tim and I spent some time discussing these problems and came up with a proposal which is essentially this:

  1. Create two OAI identifiers for each URL for which we have metadata. For the URL http://www.exploratorium.edu, each identifier follows the usual NSDL OAI identifier construct:
    • protocol: 'oai'
    • domain: 'nsdl.org'
    • the namespace of the process that will return the metadata, either 'nsdl.uri' or 'nsdl.likeuri'
    • the URL-encoded URI: http%3A%2F%2Fwww.exploratorium.edu
     This yields two identifiers:
    • oai:nsdl.org:nsdl.uri:http%3A%2F%2Fwww.exploratorium.edu
      returns metadata related to the specific URI provided.
    • oai:nsdl.org:nsdl.likeuri:http%3A%2F%2Fwww.exploratorium.edu
      returns metadata related to a normalized version of the URI provided.
  2. Create a record in the MR for each of the resource identifiers that would be served.
     Each record would have an nsdl_unique_id equal to the OAI ID of the record being served.
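The identifier construct above can be sketched in a few lines of Python. This is only an illustration, not the MR's actual code: the function name `resource_oai_ids` is hypothetical, and it assumes that "url-encoded" means percent-encoding every reserved character, including ':' and '/', as in the example identifiers.

```python
from urllib.parse import quote

OAI_PREFIX = "oai:nsdl.org"  # protocol and domain from the proposal


def resource_oai_ids(uri):
    """Return the exact-match and like-match OAI identifiers for a URI.

    Hypothetical helper: the name and return shape are assumptions.
    """
    # safe="" makes quote() encode '/' as well, giving http%3A%2F%2F...
    encoded = quote(uri, safe="")
    return {
        "nsdl.uri": OAI_PREFIX + ":nsdl.uri:" + encoded,
        "nsdl.likeuri": OAI_PREFIX + ":nsdl.likeuri:" + encoded,
    }


ids = resource_oai_ids("http://www.exploratorium.edu")
print(ids["nsdl.uri"])      # oai:nsdl.org:nsdl.uri:http%3A%2F%2Fwww.exploratorium.edu
print(ids["nsdl.likeuri"])  # oai:nsdl.org:nsdl.likeuri:http%3A%2F%2Fwww.exploratorium.edu
```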

Requesting resource-related metadata from the MR would become a simple matter of specifying an exact or a 'like' match and supplying a URI.
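On the serving side, a request could be split back into its match type and URI roughly as follows. This is a sketch under assumptions: `parse_resource_request` and the 'exact' / 'like' labels are hypothetical names, and the identifier is assumed to always have the four colon-separated parts described above.

```python
from urllib.parse import unquote

# Map each proposed namespace to the kind of match it requests.
MATCH_TYPES = {"nsdl.uri": "exact", "nsdl.likeuri": "like"}


def parse_resource_request(oai_id):
    """Split a resource OAI identifier into (match_type, uri)."""
    # maxsplit=3 keeps any ':' in the final (encoded URI) part intact.
    parts = oai_id.split(":", 3)
    if len(parts) != 4 or parts[0] != "oai" or parts[1] != "nsdl.org":
        raise ValueError("not an NSDL OAI identifier: " + oai_id)
    namespace, encoded_uri = parts[2], parts[3]
    if namespace not in MATCH_TYPES:
        raise ValueError("not a resource namespace: " + namespace)
    return MATCH_TYPES[namespace], unquote(encoded_uri)


print(parse_resource_request(
    "oai:nsdl.org:nsdl.likeuri:http%3A%2F%2Fwww.exploratorium.edu"))
# ('like', 'http://www.exploratorium.edu')
```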

dc:identifier Table

| dc:identifier of resource (supplied by metadata provider) | mrec_id of original source record | mrec_id of nsdl.uri | nsdl_unique_id of nsdl.uri | dc:identifier 'Normalized' | mrec_id of nsdl.likeuri | nsdl_unique_id of nsdl.likeuri |
|---|---|---|---|---|---|---|
| http://www.exploratorium.edu | 1001 | 2001 | nsdl.uri:http%3A%2F%2Fwww.exploratorium.edu | http://www.exploratorium.edu | 3001 | nsdl.likeuri:http%3A%2F%2Fwww.exploratorium.edu |
| http://www.exploratorium.edu/index.html | 1002 | 2002 | nsdl.uri:http%3A%2F%2Fwww.exploratorium.edu%2Findex.html | http://www.exploratorium.edu | 3001 | nsdl.likeuri:http%3A%2F%2Fwww.exploratorium.edu |
| http://www.exploratorium.edu/ | 1003 | 2003 | nsdl.uri:http%3A%2F%2Fwww.exploratorium.edu%2F | http://www.exploratorium.edu | 3001 | nsdl.likeuri:http%3A%2F%2Fwww.exploratorium.edu |
| http://www.exploratorium.edu | 1004 | 2001 | nsdl.uri:http%3A%2F%2Fwww.exploratorium.edu | http://www.exploratorium.edu | 3001 | nsdl.likeuri:http%3A%2F%2Fwww.exploratorium.edu |
| http://www.exploratorium.edu/ | 1005 | 2003 | nsdl.uri:http%3A%2F%2Fwww.exploratorium.edu%2F | http://www.exploratorium.edu | 3001 | nsdl.likeuri:http%3A%2F%2Fwww.exploratorium.edu |

Based on the above set of 5 metadata records from 5 collections, the mrec_id served and the original metadata records aggregated and returned for each request would be:

  • oai:nsdl.org:nsdl.uri:http%3A%2F%2Fwww.exploratorium.edu, mrec_id = 2001, records served = 1001, 1004
  • oai:nsdl.org:nsdl.uri:http%3A%2F%2Fwww.exploratorium.edu%2F, mrec_id = 2003, records served = 1003, 1005
  • oai:nsdl.org:nsdl.uri:http%3A%2F%2Fwww.exploratorium.edu%2Findex.html, mrec_id = 2002, records served = 1002
  • oai:nsdl.org:nsdl.likeuri:http%3A%2F%2Fwww.exploratorium.edu, mrec_id = 3001, records served = 1001, 1002, 1003, 1004, 1005
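The grouping behind this list can be reproduced with a short Python sketch. The normalization rule here is an assumption inferred from the five example rows (drop a trailing 'index.html' and any trailing '/'); the proposal does not spell out the actual rules, and the names `normalize_uri` and `aggregate` are hypothetical.

```python
from collections import defaultdict
from urllib.parse import quote

# mrec_id of each original source record -> dc:identifier it supplied
SOURCE_RECORDS = {
    1001: "http://www.exploratorium.edu",
    1002: "http://www.exploratorium.edu/index.html",
    1003: "http://www.exploratorium.edu/",
    1004: "http://www.exploratorium.edu",
    1005: "http://www.exploratorium.edu/",
}


def normalize_uri(uri):
    """Assumed rule only: strip a trailing 'index.html' and trailing slashes."""
    if uri.endswith("/index.html"):
        uri = uri[: -len("index.html")]
    return uri.rstrip("/")


def aggregate(records):
    """Map each servable OAI identifier to the source mrec_ids it aggregates."""
    served = defaultdict(list)
    for mrec_id, uri in sorted(records.items()):
        exact_id = "oai:nsdl.org:nsdl.uri:" + quote(uri, safe="")
        like_id = "oai:nsdl.org:nsdl.likeuri:" + quote(normalize_uri(uri), safe="")
        served[exact_id].append(mrec_id)
        served[like_id].append(mrec_id)
    return dict(served)


for oai_id, mrec_ids in aggregate(SOURCE_RECORDS).items():
    print(oai_id, mrec_ids)
```

With the five example records, the three exact identifiers collect 1001 and 1004, 1003 and 1005, and 1002 respectively, while the single 'like' identifier collects all five.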

We think that we only have to add 2 fields to the existing uri_index table, holding the mrec_id of the actual_uri and like_uri aggregation records, in order to create the necessary relationships.
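As a rough illustration of the two added fields, the following sketch uses an in-memory SQLite database. Everything about the schema is an assumption: the existing columns of uri_index are not described here, the column names `actual_uri_mrec_id` and `like_uri_mrec_id` are hypothetical, and SQLite is only a stand-in for whatever database the MR uses.

```python
import sqlite3

# (resource URI, source mrec_id, nsdl.uri mrec_id, nsdl.likeuri mrec_id),
# taken from the dc:identifier table.
ROWS = [
    ("http://www.exploratorium.edu", 1001, 2001, 3001),
    ("http://www.exploratorium.edu/index.html", 1002, 2002, 3001),
    ("http://www.exploratorium.edu/", 1003, 2003, 3001),
    ("http://www.exploratorium.edu", 1004, 2001, 3001),
    ("http://www.exploratorium.edu/", 1005, 2003, 3001),
]


def build_uri_index():
    conn = sqlite3.connect(":memory:")
    # Stand-in for the existing table; its real columns are not known.
    conn.execute("CREATE TABLE uri_index (uri TEXT, mrec_id INTEGER)")
    # The two proposed fields (hypothetical column names).
    conn.execute("ALTER TABLE uri_index ADD COLUMN actual_uri_mrec_id INTEGER")
    conn.execute("ALTER TABLE uri_index ADD COLUMN like_uri_mrec_id INTEGER")
    conn.executemany("INSERT INTO uri_index VALUES (?, ?, ?, ?)", ROWS)
    return conn


def source_records(conn, column, aggregate_mrec_id):
    """Return the source mrec_ids linked to one aggregation record."""
    if column not in ("actual_uri_mrec_id", "like_uri_mrec_id"):
        raise ValueError("unknown aggregation column: " + column)
    cursor = conn.execute(
        "SELECT mrec_id FROM uri_index WHERE " + column + " = ? ORDER BY mrec_id",
        (aggregate_mrec_id,),
    )
    return [row[0] for row in cursor]


conn = build_uri_index()
print(source_records(conn, "actual_uri_mrec_id", 2003))  # [1003, 1005]
print(source_records(conn, "like_uri_mrec_id", 3001))    # [1001, 1002, 1003, 1004, 1005]
```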

Serving nsdl_augmented metadata using the nsdl_aug metadata prefix

Making a request for a standard metadata record in one of the nsdl_augmented metadata formats (which are by design resource-oriented) would return an augmented record based on the nsdl.likeuri relationships.

Both of the following records would return the same metadata aggregated from the original records == 1001, 1002, 1003, 1004, 1005
