Community:Collections and Metadata/OAIResourceIdentifiers
From NSDLWiki
Proposed OAI Identifiers for aggregated resource-oriented metadata
One of the challenges in redistributing aggregated metadata records about resources is that the metadata is harvested from multiple sources, so the aggregated metadata record is essentially an abstraction. It therefore doesn't conform to our notion of derived oai identifiers, in which the identifier that we serve is a composite of our identifier for the metadata provider (their Naming Authority) and the id that they assigned to that record.
Another challenge is that when retrieving aggregated metadata about a resource from multiple metadata providers, there may be many minor variations of the resource URL that we use to identify metadata records related to the same resource. While acknowledging that the 'equivalence problem' related to identifying the same resource served in multiple formats, from mirrored servers, and from different archives is indeed a much larger problem, we think it's still important to make some attempt to normalize minor URL variants that would otherwise balkanize the metadata related to a single URL.
Tim and I spent some time discussing these problems and came up with a proposal which is essentially this:
- Create two oai identifiers for each URL for which we have metadata. For the URL http://www.exploratorium.edu, each identifier would follow the usual nsdl oai identifier construct:
- protocol: 'oai'
- domain: 'nsdl.org'
- the namespace of the process that will return metadata: either 'nsdl.uri' or 'nsdl.likeuri'
- the url-encoded URI: http%3A%2F%2Fwww.exploratorium.edu
- oai:nsdl.org:nsdl.uri:http%3A%2F%2Fwww.exploratorium.edu
returns metadata related to the specific URI provided.
- oai:nsdl.org:nsdl.likeuri:http%3A%2F%2Fwww.exploratorium.edu
returns metadata related to a normalized version of the URI provided.
- Create a record in the MR for each of the resource identifiers that would be served
Each record would have an nsdl unique id equal to the oai id of the record being served.
Requesting resource-related metadata from the MR would become a simple matter of specifying an exact or a 'like' match and supplying a URI.
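The identifier construct described above can be sketched in a few lines of Python. This is only an illustration of the proposal, not existing NSDL code: the function names `make_oai_identifier` and `parse_oai_identifier` are hypothetical, and the sketch assumes that every reserved character in the URI (including ':' and '/') is percent-encoded.

```python
from urllib.parse import quote, unquote

OAI_SCHEME = "oai"
OAI_DOMAIN = "nsdl.org"
NAMESPACES = ("nsdl.uri", "nsdl.likeuri")


def make_oai_identifier(namespace: str, uri: str) -> str:
    """Build oai:nsdl.org:<namespace>:<url-encoded URI> (hypothetical helper)."""
    if namespace not in NAMESPACES:
        raise ValueError(f"unknown namespace: {namespace!r}")
    # safe="" forces ':' and '/' to be percent-encoded as well.
    return ":".join((OAI_SCHEME, OAI_DOMAIN, namespace, quote(uri, safe="")))


def parse_oai_identifier(identifier: str):
    """Split an identifier back into (namespace, decoded URI)."""
    scheme, domain, namespace, encoded = identifier.split(":", 3)
    if (scheme, domain) != (OAI_SCHEME, OAI_DOMAIN) or namespace not in NAMESPACES:
        raise ValueError(f"not a resource identifier: {identifier!r}")
    return namespace, unquote(encoded)


print(make_oai_identifier("nsdl.uri", "http://www.exploratorium.edu"))
print(make_oai_identifier("nsdl.likeuri", "http://www.exploratorium.edu"))
```

Because the URI is fully encoded, the only unescaped colons in the identifier are the three separators, which is what lets the parser split on them safely.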
dc:identifier of resource supplied by metadata provider | mrec_id of original source record | mrec_id of nsdl.uri | nsdl_unique_id of nsdl.uri | normalized dc:identifier | mrec_id of nsdl.likeuri | nsdl_unique_id of nsdl.likeuri |
http://www.exploratorium.edu | 1001 | 2001 | nsdl.uri:http%3A%2F%2Fwww.exploratorium.edu | http://www.exploratorium.edu | 3001 | nsdl.likeuri:http%3A%2F%2Fwww.exploratorium.edu |
http://www.exploratorium.edu/index.html | 1002 | 2002 | nsdl.uri:http%3A%2F%2Fwww.exploratorium.edu%2Findex.html | http://www.exploratorium.edu | 3001 | nsdl.likeuri:http%3A%2F%2Fwww.exploratorium.edu |
http://www.exploratorium.edu/ | 1003 | 2003 | nsdl.uri:http%3A%2F%2Fwww.exploratorium.edu%2F | http://www.exploratorium.edu | 3001 | nsdl.likeuri:http%3A%2F%2Fwww.exploratorium.edu |
http://www.exploratorium.edu | 1004 | 2001 | nsdl.uri:http%3A%2F%2Fwww.exploratorium.edu | http://www.exploratorium.edu | 3001 | nsdl.likeuri:http%3A%2F%2Fwww.exploratorium.edu |
http://www.exploratorium.edu/ | 1005 | 2003 | nsdl.uri:http%3A%2F%2Fwww.exploratorium.edu%2F | http://www.exploratorium.edu | 3001 | nsdl.likeuri:http%3A%2F%2Fwww.exploratorium.edu |
Based on the above set of 5 metadata records from 5 collections, the mrec_id served and the original metadata records aggregated and returned for each request would be:
- oai:nsdl.org:nsdl.uri:http%3A%2F%2Fwww.exploratorium.edu, mrec_id = 2001, records served = 1001, 1004
- oai:nsdl.org:nsdl.uri:http%3A%2F%2Fwww.exploratorium.edu%2F, mrec_id = 2003, records served = 1003, 1005
- oai:nsdl.org:nsdl.uri:http%3A%2F%2Fwww.exploratorium.edu%2Findex.html, mrec_id = 2002, records served = 1002
- oai:nsdl.org:nsdl.likeuri:http%3A%2F%2Fwww.exploratorium.edu, mrec_id = 3001, records served = 1001, 1002, 1003, 1004, 1005
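The grouping of source records under each identifier can be reproduced with a small sketch. The proposal does not spell out the normalization rules, so the `normalize` function here is purely an assumption inferred from the example table (drop a trailing `index.html`, then any trailing slash); `aggregate` and `SOURCE_RECORDS` are likewise hypothetical names.

```python
from collections import defaultdict
from urllib.parse import quote

# (mrec_id of original source record, dc:identifier supplied by the provider)
SOURCE_RECORDS = [
    (1001, "http://www.exploratorium.edu"),
    (1002, "http://www.exploratorium.edu/index.html"),
    (1003, "http://www.exploratorium.edu/"),
    (1004, "http://www.exploratorium.edu"),
    (1005, "http://www.exploratorium.edu/"),
]


def normalize(uri: str) -> str:
    """Assumed normalization: strip a trailing index.html, then trailing slashes."""
    if uri.endswith("/index.html"):
        uri = uri[: -len("index.html")]
    return uri.rstrip("/")


def aggregate(records):
    """Map each oai identifier to the sorted source mrec_ids it would serve."""
    served = defaultdict(list)
    for mrec_id, uri in records:
        exact = "oai:nsdl.org:nsdl.uri:" + quote(uri, safe="")
        like = "oai:nsdl.org:nsdl.likeuri:" + quote(normalize(uri), safe="")
        served[exact].append(mrec_id)
        served[like].append(mrec_id)
    return {oai_id: sorted(ids) for oai_id, ids in served.items()}


for oai_id, ids in aggregate(SOURCE_RECORDS).items():
    print(oai_id, ids)
```

A real normalizer would need more care (host case, default ports, other default document names); the point here is only that exact matches split the five records three ways while the 'like' match collects all of them.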
We think that we only have to add 2 fields to the existing uri_index table, holding the mrec_id of the actual_uri and like_uri aggregation records, in order to create the necessary relationships.
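As a rough illustration of the two added fields, the sketch below uses an in-memory SQLite table. The real uri_index schema is not shown on this page, so every column name here (`source_mrec_id`, `uri`, `actual_uri_mrec_id`, `like_uri_mrec_id`) is an assumption, as is the helper `sources_for`; the rows are taken from the example table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical layout: the last two columns are the proposed additions,
# pointing at the nsdl.uri and nsdl.likeuri aggregation records.
conn.execute(
    """
    CREATE TABLE uri_index (
        source_mrec_id     INTEGER PRIMARY KEY,
        uri                TEXT NOT NULL,
        actual_uri_mrec_id INTEGER NOT NULL,
        like_uri_mrec_id   INTEGER NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO uri_index VALUES (?, ?, ?, ?)",
    [
        (1001, "http://www.exploratorium.edu", 2001, 3001),
        (1002, "http://www.exploratorium.edu/index.html", 2002, 3001),
        (1003, "http://www.exploratorium.edu/", 2003, 3001),
        (1004, "http://www.exploratorium.edu", 2001, 3001),
        (1005, "http://www.exploratorium.edu/", 2003, 3001),
    ],
)


def sources_for(column: str, mrec_id: int):
    """Return the source mrec_ids linked to one aggregation record."""
    if column not in ("actual_uri_mrec_id", "like_uri_mrec_id"):
        raise ValueError(f"unexpected column: {column!r}")
    query = (
        f"SELECT source_mrec_id FROM uri_index "
        f"WHERE {column} = ? ORDER BY source_mrec_id"
    )
    return [row[0] for row in conn.execute(query, (mrec_id,))]


print(sources_for("actual_uri_mrec_id", 2001))
print(sources_for("like_uri_mrec_id", 3001))
```

With both columns in place, an exact or a 'like' request reduces to a single lookup on the corresponding column.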
Serving nsdl_augmented metadata using the nsdl_aug metadata prefix
Making a request for a standard metadata record in one of the nsdl_augmented metadata formats (which are by design resource-oriented) would return an augmented record based on the nsdl.likeuri relationships.
- http://services.nsdl.org:8080/nsdloai/OAI?verb=GetRecord&identifier=oai:nsdl.org:367822:oai:infomine.ucr.edu:3593&metadataPrefix=nsdl_aug_gold
- http://services.nsdl.org:8080/nsdloai/OAI?verb=GetRecord&identifier=oai:nsdl.org:380605:oai:dlese.org:NASA-Edmall-634&metadataPrefix=nsdl_aug_gold
Both of the above requests would return the same metadata, aggregated from the original records 1001, 1002, 1003, 1004 and 1005.