Community:CITI/OAIHarvestIngest/HarvestingOtherFormats

From NSDLWiki



Notes on harvesting metadata formats other than oai_dc or nsdl_dc

  • Assumptions:
    • It is desirable to ingest, via OAI feeds, metadata formats other than nsdl_dc or oai_dc into the NDR.
    • Managing new or multiple alternate metadata feeds within the NCS and through the Harvest Manager is feasible.
    • The internal storage of alternate metadata items can use the NDR's current notion of a collection, for example metadata providers and, if necessary, aggregations.
  • Questions:
    • Will this metadata be presented in the library's search & browse interface?
    • Will resources represented within the metadata (i.e. dc:identifiers) need to have Resource objects created or maintained for them in the NDR?
    • In the case where providers are already contributing nsdl_dc or oai_dc: is there a requirement for any association between the new format of metadata and the items already provided?
    • Will there be cases where metadata providers give us more than one metadata format?
    • Are there concrete use cases we can use to do further analysis?
  • Currently:
    • We require either oai_dc or nsdl_dc to be harvested; the harvested record is then transformed into nsdl_dc for presentation in the library. The two metadata streams (native and nsdl_dc) are stored in a single Metadata object within the NDR. This follows the current NDR Metadata object model, in which an item contains one datastream of the native metadata (currently only nsdl_dc or oai_dc) and a version of that stream transformed into nsdl_dc for use in the library.
    • We maintain two transforms (in general), one for inbound nsdl_dc and one for inbound oai_dc.
    • The Metadata object model represents the resources within the metadata through relationships expressed in the RELS-EXT stream of the Fedora object.
    • The ingest process assumes there is only one native source for metadata within a given Metadata object.
  • Thoughts on additional formats:
    • The current Metadata object model represents the resources the metadata is about through relationships expressed in the RELS-EXT stream of the Fedora object. If more than one native metadata stream is represented in a single Metadata object, some new mechanism will have to distinguish these relationships within the RELS-EXT stream of the Fedora object.
    • If metadata of a different format must be associated with other library metadata within the same metadata object, then more significant modifications to the current ingest process would be required. The current process assumes that the inbound source metadata is the only thing contained within a Metadata object.
    • If additional formats do NOT (yet) require presentation in the library, then a process that creates or updates new Metadata objects for these items may be desirable.
    • If a different format can be represented as a separate metadata object, then the process becomes somewhat simpler. Additional formats could easily be represented using the same general Metadata model we now have. The presence of an nsdl_dc stream could potentially be made optional.
    • The OAI service (proai) is driven by updates to a Metadata object at the object level, not the datastream level. Changes to any Metadata object will trigger proai updates.
    • Visibility of metadata in the OAI service is currently controlled at the metadata provider level, and applies to all metadata records within the individual collection.
    • A different means to index our metadata (other than via our own OAI source) may lessen the requirement for all public, library metadata to be represented by nsdl_dc. If additional formats require presentation in the library, other means to retrieve metadata from the NDR for presentation might remove this requirement.
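
The "separate Metadata object per format" option discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions: `MetadataObject`, `ingest_record`, and the `transforms` mapping are hypothetical names and do not correspond to actual NDR or Fedora APIs. It shows one native stream per object, with the nsdl_dc stream present only when a transform exists for the inbound format.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class MetadataObject:
    """Hypothetical stand-in for an NDR Metadata object (not the real model)."""
    provider: str                       # metadata provider the item belongs to
    native_format: str                  # e.g. "oai_dc", "nsdl_dc", or another format
    native_xml: str                     # the single native source stream
    nsdl_dc_xml: Optional[str] = None   # transformed stream; optional in this sketch
    resources: List[str] = field(default_factory=list)  # stand-in for RELS-EXT relationships


def ingest_record(
    provider: str,
    fmt: str,
    xml: str,
    resource_ids: List[str],
    transforms: Dict[str, Callable[[str], str]],
) -> MetadataObject:
    """Create one Metadata object per (record, format) pair.

    If no transform to nsdl_dc is registered for the format, the object is
    stored with only its native stream (assumption: nsdl_dc made optional).
    """
    transform = transforms.get(fmt)
    nsdl_dc = transform(xml) if transform is not None else None
    return MetadataObject(provider, fmt, xml, nsdl_dc, list(resource_ids))
```

Because each object still holds exactly one native stream, the existing assumption of a single native source per Metadata object is preserved, and the resource relationships need no extra mechanism to say which stream they came from.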


NOTE:
1. The oai_dc metadata format is a requirement of the OAI-PMH protocol so, theoretically, there are no OAI-PMH providers that do not provide oai_dc.

2. There is no guarantee in the OAI-PMH specification that a record available from a provider in one format is also available in a different format, even if both formats are provided for some records.
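
Because of note 2, a harvester may want to ask which formats are available for a specific record rather than for the repository as a whole. OAI-PMH provides the ListMetadataFormats verb, with an optional identifier argument, for this. The following is a minimal sketch, assuming only the Python standard library; the helper names `list_formats_url` and `parse_prefixes` are hypothetical, and fetching the response over HTTP is left out.

```python
from typing import List, Optional
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 response documents.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_formats_url(base_url: str, identifier: Optional[str] = None) -> str:
    """Build a ListMetadataFormats request URL.

    Without an identifier the request asks about the whole repository;
    with one, it asks which formats exist for that single record.
    """
    params = {"verb": "ListMetadataFormats"}
    if identifier:
        params["identifier"] = identifier
    return base_url + "?" + urlencode(params)


def parse_prefixes(response_xml: str) -> List[str]:
    """Return the metadataPrefix values listed in a ListMetadataFormats response."""
    root = ET.fromstring(response_xml)
    path = ".//oai:metadataFormat/oai:metadataPrefix"
    return [el.text for el in root.findall(path, OAI_NS)]
```

A harvester could call `parse_prefixes` on the per-record response and request an alternate format only when its prefix is actually listed for that record.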
