Community:Collections and Metadata/XpandMR

From NSDLWiki

Expanding the NSDL Metadata Repository

What's in this Space

Recent discussions on the NSDL Metadata Repository have centered on the expansion of the basic model to handle other metadata critical to growth of the NSDL. These expansions fall into three basic categories:

  • Metadata in formats other than Dublin Core. This need includes accommodating "native" metadata formats for items harvestable now, as well as the ability to indicate equivalence between items across collections (and when the same resources appear in the MR as both collections AND items in another collection).
      • [Dean's Comment: There seem to me to be two very separate issues here. We've been planning on harvesting native metadata formats (indeed, records in any format that meets the OAI requirements) for a long time. However, the ability to indicate equivalence is a separate and much more interesting challenge that doesn't seem to fall under the "metadata in other formats" category. When it comes to ingesting information about equivalence, it really falls under "how to express relationships as metadata records"; on the discovery side, it includes how to compute a transitive closure over equivalence relationships in order to display the set of equivalent resources.]
  • Metadata about content other than collections and items, but related in some way to them. Annotations are at the top of this list, but certainly there are and will be other kinds of resources we will want to accommodate.
      • [Dean's Comment: It is not clear to me that we can simply separate out this category from the one below. It is easy to imagine someone creating a resource review annotation record that includes both a ranking and audience level for the item reviewed, in addition to a written critique (content). If we treat annotations as things that can be viewed either as augmenting the metadata of the items they relate to or as content and records in their own right, I think that we're much closer to where we should be. The "type" of the relationship would then govern how the content and metadata of the annotation would be discovered and displayed in the context of the original record.]
  • Augmentation of metadata already residing in the MR but not adequately supporting needed functionality. This category is particularly critical for support of educational functions, since information about audience and relationships to educational standards is not often supported by harvested data. We envision future augmentation services and strategies that would allow harvesting, updating of harvested records, and routine augmentation (and refreshment of augmentation) to operate on a similarly automated basis. See the MetaAug wiki for more information on this subject.
      • [Dean's Comment: See above for why I believe this is not a separate category from annotations. If these are treated as separate records of metadata (and potentially content) that come with both a relationship and an identification of the base metadata record that they annotate, then we can handle them just the way we do all other metadata records in the MR - at least as far as ingest goes.]
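The transitive closure over equivalence relationships raised in the comments above could be computed along the following lines. This is only a minimal sketch: it assumes equivalence assertions arrive as simple pairs of record identifiers, and both the identifiers and the function name `equivalence_sets` are hypothetical, not part of the MR.

```python
from collections import defaultdict


def equivalence_sets(pairs):
    """Group record IDs into sets of mutually equivalent resources.

    `pairs` is an iterable of (id_a, id_b) equivalence assertions. The
    grouping is the transitive closure: if A~B and B~C, then A, B, and C
    end up in the same set.
    """
    parent = {}

    def find(x):
        # Follow parent links to the set representative, shortening the
        # path along the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two sets

    groups = defaultdict(set)
    for record_id in list(parent):
        groups[find(record_id)].add(record_id)
    return list(groups.values())


# Hypothetical identifiers, for illustration only.
assertions = [
    ("oai:example:item-1", "oai:example:item-2"),
    ("oai:example:item-2", "oai:example:collection-9"),
    ("oai:example:item-7", "oai:example:item-8"),
]
sets = equivalence_sets(assertions)
```

Here the first three identifiers end up in one set even though `item-1` and `collection-9` were never directly asserted to be equivalent, which is the behavior a discovery interface would need in order to display all equivalent resources together.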

In order to move beyond the simple aggregation model developed in the first years of NSDL, and to take advantage of the flexible repository already developed to support this model, we need to plan for the ingest and management of different and richer metadata formats. To make this useful to other services, portals and partners, we will also need to be able to output this data into any format we choose, whether native, combined, normalized, or value-added, for functions we cannot yet fully visualize.

      • [Dean's Comment: I agree that these are things we need to do. I'm not at all certain that the "flexible repository already developed" can fully support the needed model. It can store the metadata subset of annotation and relationship records, but it currently offers no support for delivery of combined records or discovery of relationships.]
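As a rough illustration of what "delivery of combined records" might involve, the sketch below treats each annotation as a full record that names the base record it annotates and a relationship type, and lets that type decide whether its elements are merged into the base metadata or listed as sourced annotations. The `Record` class, the field names, and the relationship types "augments" and "reviews" are all assumptions made for this example; the MR does not define them.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    oai_id: str                          # the record's own identifier
    source: str                          # collection or service that supplied it
    metadata: dict = field(default_factory=dict)  # element -> list of values
    annotates: Optional[str] = None      # ID of the base record, if any
    relationship: Optional[str] = None   # hypothetical type, e.g. "augments"


def combined_view(base, records):
    """Merge a base record with the records that annotate it.

    Elements from "augments" records are folded into the metadata, each
    value tagged with its source. Records with any other relationship
    type are kept separate as sourced annotations.
    """
    merged = {
        element: [(value, base.source) for value in values]
        for element, values in base.metadata.items()
    }
    annotations = []
    for rec in records:
        if rec.annotates != base.oai_id:
            continue  # not related to this base record
        if rec.relationship == "augments":
            for element, values in rec.metadata.items():
                merged.setdefault(element, []).extend(
                    (value, rec.source) for value in values
                )
        else:
            annotations.append(rec)
    return {"id": base.oai_id, "metadata": merged, "annotations": annotations}


# Hypothetical records, for illustration only.
base = Record("oai:example:item-1", "collection-a",
              {"title": ["Plate Tectonics Primer"]})
aug = Record("oai:example:aug-1", "augmentation-service",
             {"audience": ["High school"]},
             annotates="oai:example:item-1", relationship="augments")
review = Record("oai:example:rev-1", "review-service",
                {"rating": ["4"], "description": ["Clear diagrams."]},
                annotates="oai:example:item-1", relationship="reviews")

view = combined_view(base, [aug, review])
```

Keeping the source alongside each value means the same stored records can be presented either as one enriched record or as a base record with attributed additions, depending on what the requesting service wants.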

First steps (in approximate order):

  • 1. Develop a "metadata augmentation" methodology, which would build on the notion of established semantic relationships and multiple metadata schemas. As an example of augmentation, SDSC might supply information on resource formats to a subset of NSDL resources, as part of its archiving service. The specification for this data might require that it be delivered in an OAI record containing at least one DC:format element with IMT values, as well as a DC:Identifier holding the URL. This would be stored in the MR and be available for use by the user interface or any other service. Conceptually, augmented metadata become component parts of existing metadata records but are managed as entities independent of the existing record for the purpose of updating and reharvesting.
    • See the [MetaAug Metadata Augmentation wiki] for additional information on progress in this area.
      • [Dean's Comment: This seems to propose using a DC:Identifier URL as the unique identification of a science resource in the MR. I believe that this is wrong. I also believe that the approach of viewing these as harvested independent elements is wrong. These should be treated as full records in their own right. They have a collection (source), their own OAI identifiers, information about the relationship type, and the NSDL ID (OAI ID) of the science resource that they "annotate". From there, the "yet to be determined" relationship architecture and query engine will allow these to be exposed as either sourced annotations, as integral parts of the metadata for the original resource, or however else the user (or service) wants to see them expressed.]
    • Tasks:
      • Develop requirements for managing incoming elements within MR at the element level (Tim and Jon already have a good start at this)
          • [Dean's Comment: I don't believe that this is the correct approach (see above)]
        • Should include possibility of schema that exposes source of each element to managers and interested external parties
          • [Dean's Comment: See above for an alternate view of how to achieve this]
        • Should provide for implementation of a collection-specific data normalization and enhancement service providing augmented metadata
          • [Dean's Comment: What does this mean? Why is a collection-specific, rather than metadata-type specific, capability needed?]
      • Develop use cases for augmentation situations (use iVia and SDSC planning as the major thrust) (Diane)
      • Determine scope of schema issues required to ingest and output additional types of services
      • Develop test plan and initial steps with partners
    • Questions:
      • How would harvest of augmented data be managed within the PRS environment?
  • 2. Expand and enhance the current notion of relationships within the MR. The deliberately limited set of inter-resource and inter-record relationships currently supported could easily be expanded and enhanced to make the MR much more RDF-like; such relationships are already expressed as triples, for instance. The bulk of the work will be related to defining the methods and metadata with which external systems and services (such as annotation and equivalence services) will define those relationships and provide them to the MR, and helping user services to develop useful interfaces.
        • [Dean: The current expression of relationships in the MR is extremely rudimentary (as I understand it - one three column relational table). Rather than trying to reimplement RDF within the current MR, it would seem to make much more sense to build on the RDF work already done (see There is critical work that needs to be done in defining a hierarchy of relationships and exploring the ways that these should be usable in discovery and dissemination - whatever mechanism we use to represent them.]
    • Tasks:
      • Identify existing services (such as annotation services) and work with them to develop methods to get their metadata into the MR
      • Create a small-scale resource rating service that we can use as a test-bed
      • Investigate and create alternative interfaces to data retrieval from the MR and SDS
      • Determine whether or how this work will be coordinated with the development effort on RDF
      • Feed requirements to Portals group for development of interface that can present equivalence and annotation to users
      • Review Annotation Workspace documents and update when necessary
          • [Dean's Comment: It would be very helpful to re-sort these tasks to separate those that rely on a specific implementation at the MR level (e.g. the first task below) from those that are independent of the specific MR implementation (e.g. the use cases for augmentation)]
    • Questions:
      • Should we begin by exploring whether we need a short term and long term plan for this?
  • 3. Develop an internal "metadata registry" which in the short term will manage semantic relationships between data elements in a number of metadata formats or schemas. This could be thought of as a local instance of a larger, more distributed registry system now under development (see DCMI Roadmap for development of Vocabulary Management and Schema Registry Systems for a discussion of how this might work). This development should also support the process of crosswalking data and assist in managing change in metadata schemas.
    • Tasks:
      • Determine what format(s) to start with, and where reliable crosswalks exist
      • Explore possible strategies for expanding the current internal registry function
    • Questions:
      • Where will we need schema changes to manage this? How will we manage schema needs in future?
          • [Dean: This seems very valuable in a number of dimensions. For example, it would make it potentially more meaningful to create an augmented metadata record that combines information from several different metadata descriptions of the same resource.]
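To make the registry idea in step 3 a little more concrete, here is a minimal sketch of a registry that records correspondences between elements in two schemas and uses them to crosswalk a record. The schema labels, the element names such as `technical.format`, and the function `crosswalk` are illustrative placeholders chosen for this example; they are not taken from an existing NSDL registry or from any authoritative crosswalk.

```python
# Each entry states that an element in one schema corresponds to an
# element in another. All names here are illustrative placeholders.
REGISTRY = {
    ("lom", "general.title"): ("dc", "title"),
    ("lom", "general.identifier"): ("dc", "identifier"),
    ("lom", "technical.format"): ("dc", "format"),
}


def crosswalk(record, source_schema, target_schema, registry=REGISTRY):
    """Map a record's elements from one schema to another.

    `record` maps element names to lists of values. Returns a pair
    (mapped, unmapped): elements the registry could translate, and
    elements it has no correspondence for.
    """
    mapped, unmapped = {}, {}
    for element, values in record.items():
        target = registry.get((source_schema, element))
        if target is not None and target[0] == target_schema:
            mapped.setdefault(target[1], []).extend(values)
        else:
            unmapped[element] = list(values)
    return mapped, unmapped


# Hypothetical native record, for illustration only.
native = {
    "general.title": ["Intro to Optics"],
    "technical.format": ["text/html"],
    "educational.context": ["higher education"],
}
mapped, unmapped = crosswalk(native, "lom", "dc")
```

Returning the unmapped elements separately, rather than silently dropping them, gives a simple way to see where the registry needs new entries when a schema changes or a new format is introduced.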