Documenting Provider Practices

Why documentation?

Every OAI-mediated transaction has two parties: the Data Provider and the Service Provider. They exchange metadata through a protocol that allows a good deal of information about the data to be exchanged. However, the OAI world is becoming increasingly diverse, and to make the best use of harvested metadata, the Service Provider may need to know more than the protocol requires or encourages the Data Provider to expose. Given this scenario, the Data Provider should:

  • make careful use of all OAI-sanctioned opportunities for documenting practices [when would it not be sanctioned?]
  • make additional information available for any Service Providers who wish to know more about the data and its origins, changes, and capabilities

In their chapter "The Continuum of Metadata Quality: Defining, Expressing, Exploiting," published in "Metadata in Practice" [1], Thomas Bruce and Diane Hillmann assert that optimal metadata provision should include:

  • An expression of metadata intentions based on an explicit, documented application profile, endorsed by a specialized community, and registered in conformance to a general metadata standard (they assert that an XML schema does not express intention).
  • A source of trusted data with a known history of regularly updating metadata, including controlled vocabularies. This includes explicit conformance with current standards and schemas.
  • Full provenance information, including nested information, as original metadata is harvested, augmented, and re-exposed. This may not record changes at the element level, but should reference practice documentation that describes augmentation and upgrade routines of particular aggregators. [if no provenance, i.e., original metadata, this is moot, right? should we say so?]
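
As an illustration of what nested provenance can look like to a Service Provider, the following is a minimal sketch that walks a chain of originDescription elements from an OAI provenance container. The sample XML, the example.org URLs, and the function name provenance_chain are assumptions made for illustration; a real record may carry more or fewer hops.

```python
import xml.etree.ElementTree as ET

PROV_NS = "http://www.openarchives.org/OAI/2.0/provenance"

# Hypothetical provenance payload: a record harvested from a source repository
# by an aggregator, altered there, and then re-exposed.
SAMPLE_PROVENANCE = """
<provenance xmlns="http://www.openarchives.org/OAI/2.0/provenance">
  <originDescription harvestDate="2004-06-01T12:00:00Z" altered="true">
    <baseURL>http://aggregator.example.org/oai</baseURL>
    <identifier>oai:aggregator.example.org:123</identifier>
    <datestamp>2004-05-20</datestamp>
    <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    <originDescription harvestDate="2004-05-15T08:30:00Z" altered="false">
      <baseURL>http://source.example.org/oai</baseURL>
      <identifier>oai:source.example.org:abc</identifier>
      <datestamp>2004-05-01</datestamp>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </originDescription>
  </originDescription>
</provenance>
"""


def provenance_chain(xml_text):
    """Return origin descriptions, most recent harvest first, original source last."""
    root = ET.fromstring(xml_text.strip())
    tag = f"{{{PROV_NS}}}originDescription"
    chain = []
    node = root.find(tag)
    while node is not None:
        chain.append({
            "baseURL": node.findtext(f"{{{PROV_NS}}}baseURL"),
            "identifier": node.findtext(f"{{{PROV_NS}}}identifier"),
            "harvestDate": node.get("harvestDate"),
            "altered": node.get("altered") == "true",
        })
        # find() looks at direct children only, so this steps one hop back.
        node = node.find(tag)
    return chain


for hop in provenance_chain(SAMPLE_PROVENANCE):
    status = "altered" if hop["altered"] else "unaltered"
    print(hop["baseURL"], hop["harvestDate"], status)
```

The altered flag only says that something changed at a given hop; what changed is exactly the kind of detail the aggregator's practice documentation would need to describe.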

What to document?

  • Data source and creation decisions and history
    • Is the data crosswalked from another data source? Is that data source available in its native form? If so, where?
    • Is the data created by humans or machines? Describe the creation methodology or refer to a description.
  • Use of controlled vocabularies:
    • What vocabularies are used, for what element, and under what circumstances (especially important if only oai_dc is used)?
    • Is the whole vocabulary available to be used with this data or is only a subset approved for use?
    • Are some terms taken from local lists that are not specified in the namespaces (and possibly not formally documented)? Is any documentation available on local vocabularies, and if so, where?
  • Names practice:
    • Order of names (direct order or surname, forename)
    • Fullness of names (are initials used routinely instead of full names?)
    • Any additions to names (courtesy or academic titles, affiliations)
    • Authority source for names (are name variants available?)
  • Dates
    • Date practices (again, particularly important for oai_dc)
    • Parsing rules if not encoded?
  • Identifiers
    • For unencoded or local identifiers, describe the source of the identifier and any parsing or validation rules
  • Quality control measures
    • Are controlled vocabulary values updated when the source vocabulary changes?
    • Are validation routines run regularly on encoded data?
  • Updating practices and schedules
    • How often are new or changed records added to the database?
    • How many new or updated records are added per week or month?
    • [Don't know if these belong here, but shouldn't they be documented too?]
    • Are persistent deleted records enabled? How often are records deleted? Major purges?
    • Are sets added/removed on a regular basis? Are all records included in sets? Is there overlap?
  • Set decisions and specifications (see [SetPractices Sets] documentation)
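
To show why documented name and date practices matter to a Service Provider, here is a minimal sketch in which a small, hypothetical practices record drives how unencoded oai_dc values are interpreted. The ProviderPractices class, its field names, and the helper functions are all assumptions for illustration; they are not part of OAI or of any NSDL tool.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ProviderPractices:
    """Hypothetical machine-readable summary of a provider's documented practices."""
    name_order: str   # "inverted" = "Surname, Forename"; "direct" = "Forename Surname"
    date_format: str  # strptime pattern describing unencoded date strings


def split_name(value, practices):
    """Return (surname, forename) according to the documented name order."""
    value = value.strip()
    if practices.name_order == "inverted":
        surname, _, forename = value.partition(",")
        return surname.strip(), forename.strip()
    # Direct order: treat the last whitespace-separated token as the surname.
    forename, _, surname = value.rpartition(" ")
    return surname.strip(), forename.strip()


def parse_date(value, practices):
    """Return an ISO 8601 date string using the documented parsing rule."""
    return datetime.strptime(value.strip(), practices.date_format).date().isoformat()


# Two providers whose raw values look alike but mean different things.
provider_a = ProviderPractices(name_order="inverted", date_format="%d/%m/%Y")
provider_b = ProviderPractices(name_order="direct", date_format="%m/%d/%Y")

print(split_name("Hillmann, Diane I.", provider_a))
print(split_name("Diane I. Hillmann", provider_b))
print(parse_date("03/04/2004", provider_a))
print(parse_date("03/04/2004", provider_b))
```

Without the documented date practice, the string 03/04/2004 cannot be resolved reliably to either 3 April or 4 March, which is why these rules are particularly important when only oai_dc is exposed.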

How and where to document

  • In OAI responses
    • Repository description contained in the Identify response, including [BrandingSection Branding], [OAIidentifiersAndRegistrationSection OAI Identifier] and eprints descriptions
    • Set specifications documented in [SetPractices Sets]
    • Individual record descriptions in [AboutContainers About Containers], including [ProvenanceSection Provenance] and [RightsContainer Rights] information
  • External documentation
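
The following is a minimal sketch of how a Service Provider might check which of these documentation opportunities a repository actually uses in its Identify response. The sample response is abbreviated and hypothetical (the example.org URLs and repository name are invented), and summarize_identify is an illustrative helper, not an existing library function.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"

# Hypothetical, abbreviated Identify response with two description containers.
SAMPLE_IDENTIFY = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2004-06-01T12:00:00Z</responseDate>
  <request verb="Identify">http://repository.example.org/oai</request>
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>http://repository.example.org/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <adminEmail>admin@example.org</adminEmail>
    <earliestDatestamp>2001-01-01</earliestDatestamp>
    <deletedRecord>persistent</deletedRecord>
    <granularity>YYYY-MM-DD</granularity>
    <description>
      <oai-identifier xmlns="http://www.openarchives.org/OAI/2.0/oai-identifier">
        <scheme>oai</scheme>
        <repositoryIdentifier>repository.example.org</repositoryIdentifier>
        <delimiter>:</delimiter>
        <sampleIdentifier>oai:repository.example.org:abc</sampleIdentifier>
      </oai-identifier>
    </description>
    <description>
      <branding xmlns="http://www.openarchives.org/OAI/2.0/branding/">
        <collectionIcon>
          <url>http://repository.example.org/icon.png</url>
        </collectionIcon>
      </branding>
    </description>
  </Identify>
</OAI-PMH>
"""


def summarize_identify(xml_text):
    """Report the deleted-record policy and the description containers present."""
    root = ET.fromstring(xml_text.strip())
    identify = root.find(f"{{{OAI_NS}}}Identify")
    containers = []
    for description in identify.findall(f"{{{OAI_NS}}}description"):
        for child in description:
            # ElementTree tags look like "{namespace}localname".
            namespace, _, local = child.tag[1:].partition("}")
            containers.append((local, namespace))
    return {
        "deletedRecord": identify.findtext(f"{{{OAI_NS}}}deletedRecord"),
        "granularity": identify.findtext(f"{{{OAI_NS}}}granularity"),
        "descriptions": containers,
    }


summary = summarize_identify(SAMPLE_IDENTIFY)
print(summary["deletedRecord"], summary["granularity"])
for local, namespace in summary["descriptions"]:
    print(local, namespace)
```

A repository that returns no description containers at all is still protocol-conformant, so an empty list here signals that the Service Provider will need to look for external documentation instead.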


[1] The Continuum of Metadata Quality: Defining, Expressing, Exploiting / Thomas R. Bruce, Legal Information Institute, Cornell Law School; Diane I. Hillmann, National Science Digital Library. Published in “Metadata in Practice,” ALA Editions, 2004.
