TNS Internal:NDR/API/2.0/implementationDetails/addCollection

Under Construction
Proposed method for inclusion in NDR/API 2.0.

NDR API Implementation Details - addCollection

Create a collection for aggregating objects, providing provenance that identifies the provider(s) of the metadata and information stored as part of this collection.

Discussions

Input Discussions

  • Schema for NDR Collection

Here is Katy's proposed definition of NDR_Collection...

<?xml version="1.0" encoding="UTF-8"?>
<record 
    xmlns="http://ns.nsdl.org/ncs/ndr_collect" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xsi:schemaLocation="http://ns.nsdl.org/ncs/ndr_collect http://ns.nsdl.org/ncs/ndr_collect/1.00/XXXX.xsd">
  <recordID>NDR-000-000-000-0001</recordID> 
  <collectionID>xxxx</collectionID>
  <status>Active or Inactive</status>
  <date created="2004-11-24T13:45:27Z" modified="2004-11-24T13:45:27Z"/>
  <url>http://msteacher2.org</url>
  <title>Middle School Portal (MSP2)</title> 
  <description>The MSP2 collection blah blah</description>
  <contact email="ginger@ucar.edu">Katy Ginger</contact>
</record>

Thoughts and Questions:

At first, I thought this was what DDS wanted as input for putCollection, but the putCollection documentation doesn't show any XML passed in as a parameter. If it turns out that this is a parameter to PutCollection or some other DDS call, it will be constructed from InputXML and NDR objects in order to make the appropriate call(s) for DDS. See Indexing in DDS for more details.

jw - Correct, this is not the same as what DDS needs. The basic DDS PutCollection call takes just title, description, xmlFormat, collectionKey.

jw - The intention for NDR_Collect is that it supplies the required input for the NDR to establish a collection. It is not intended to hold metadata associated with an external application or context - that would go into the app_data, NSDL_Collection (or other) annotation on the collection.

The following are proposed changes for input to the addCollection API request to maintain consistency in the specifications of NDR-API calls.

<recordID> is the same thing as <externalIdentifier> in addMetadataRecord and should be handled in the same way. That is, remove it from NDR_Collection and have it as a parameter that gets passed in. jw - OK, yes, this was intended to be externalIdentifier.
<collectionID> - What is this? What is its purpose? How is this different from <recordID>? jw - Not sure if this is needed. DDS/NCS has a notion of a CollectionKey - a short-hand identifier for a collection that is separate from the collection recordID. That might be what Katy had in mind? It is not necessary for the NDR, however, and we could just use the handle or recordId for collectionKey when we do a PutCollection into DDS. If it is needed, it could be stored in the app_data record.
<status> - What is the use case for this? Fedora supports the following states on an object: Active | Inactive | Deleted. I'm wondering if this shouldn't be implemented using the Fedora object states. We already specify setting of Fedora state=deleted when deleting objects. See Implementation Details and Specification of deleteCollection for more information on the delete process and access to deleted collections. Access to Inactive objects would be similar.
<date created & modified> - If this is coming from the application, then these represent application data annotation. The Fedora object keeps track of this for the NDR.
<url> - This is a resource for the collection. It should be passed in as <resourceURL> directly on the InputXML to be consistent with addMetadataRecord.
<title>, <description> - metadata about the collection
<contact> - can this information be gleaned from the agent that is responsible for this collection? jw - Yes, possibly.


This would make the InputXML look like...

<?xml version="1.0" encoding="UTF-8"?>
<inputXML 
    xmlns="http://ns.nsdlib.org/ndr/request_v2.00/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://ns.nsdlib.org/ndr/request_v2.00/ http://ns.nsdlib.org/schemas/ndr/request_v2.00.xsd" 
    schemaVersion="1.00.000">
  <metadata>  <!-- Is metadata the element name we want to use? -->
    <status>Active or Inactive</status>  <!-- Need use case. -->
    <title>Middle School Portal (MSP2)</title> 
    <description>The MSP2 collection blah blah</description>
    <contact email="ginger@ucar.edu">Katy Ginger</contact>  <!-- Does this come from agent? If that's the case, it would not be here. -->
  </metadata>
  <resourceURL>http://msteacher2.org</resourceURL>
  <externalIdentifier source="NCS">NDR-000-000-000-0001</externalIdentifier>
</inputXML>
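
The proposed changes amount to a mechanical mapping from the NDR_Collection record to the InputXML layout shown above. The following is a minimal sketch of that mapping, not part of the NDR code base: the function name `record_to_input_xml` and the `source` default are hypothetical, and dropping <collectionID> and <date> reflects the discussion above rather than any settled decision.

```python
import xml.etree.ElementTree as ET

RECORD_NS = "http://ns.nsdl.org/ncs/ndr_collect"
REQUEST_NS = "http://ns.nsdlib.org/ndr/request_v2.00/"


def record_to_input_xml(record_xml, source="NCS"):
    """Map an NDR_Collection <record> onto the proposed <inputXML> layout.

    Hypothetical helper: <collectionID> and <date> are dropped on purpose.
    """
    record = ET.fromstring(record_xml)

    def src(tag):
        # Look up a child of <record> in the ndr_collect namespace.
        return record.find(f"{{{RECORD_NS}}}{tag}")

    def dst(parent, tag, attrib=None):
        # Create a child element in the request namespace.
        return ET.SubElement(parent, f"{{{REQUEST_NS}}}{tag}", attrib or {})

    root = ET.Element(f"{{{REQUEST_NS}}}inputXML", {"schemaVersion": "1.00.000"})

    # status, title, description and contact move under <metadata>.
    metadata = dst(root, "metadata")
    for tag in ("status", "title", "description", "contact"):
        element = src(tag)
        if element is not None:
            dst(metadata, tag, dict(element.attrib)).text = element.text

    # <url> becomes <resourceURL>, as in addMetadataRecord.
    url = src("url")
    if url is not None and url.text:
        dst(root, "resourceURL").text = url.text

    # <recordID> becomes <externalIdentifier source="...">.
    record_id = src("recordID")
    if record_id is not None and record_id.text:
        dst(root, "externalIdentifier", {"source": source}).text = record_id.text

    return root
```

Calling `ET.tostring(record_to_input_xml(xml_text), encoding="unicode")` would serialize the result for inspection.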

jw - Now that we have clarified what we need as input to addCollection, it's a small number of fixed things. Would it make sense to just have these be separate parameters to the API call rather than input an XML blob? E.g. arguments: title, description, status, contactEmail (optional), resourceUrl (optional), externalIdentifier (optional)?


NOTE: metadataRecordFormat is not passed in as a parameter, but is needed for the putCollection call's xmlFormat parameter. This is the format of all metadata that will be used when calling putRecord.


With John's proposed changes, InputXML would look like...

<?xml version="1.0" encoding="UTF-8"?>
<inputXML 
    xmlns="http://ns.nsdlib.org/ndr/request_v2.00/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://ns.nsdlib.org/ndr/request_v2.00/ http://ns.nsdlib.org/schemas/ndr/request_v2.00.xsd" 
    schemaVersion="1.00.000">
  <title>Middle School Portal (MSP2)</title> 
  <description>The MSP2 collection blah blah</description>
  <state>Active | Inactive</state>                                            <!-- optional, default: Active -->
  <metadataRecordFormat>format_msp2</metadataRecordFormat>                    <!-- optional, default: format_nsdl_dc -->
  <contact email="ginger@ucar.edu">Katy Ginger</contact>                      <!-- optional -->
  <resourceURL>http://msteacher2.org</resourceURL>                            <!-- optional -->
  <externalIdentifier source="NCS">NDR-000-000-000-0001</externalIdentifier>  <!-- optional -->
</inputXML>
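
To make the optional fields and their defaults concrete, here is a rough sketch of how a handler might read this flat InputXML. It is illustrative only: `AddCollectionInput` and `parse_add_collection` are hypothetical names, and treating title and description as required is an assumption drawn from the fact that only the other elements are marked optional.

```python
from dataclasses import dataclass
from typing import Optional
import xml.etree.ElementTree as ET

NS = {"r": "http://ns.nsdlib.org/ndr/request_v2.00/"}


@dataclass
class AddCollectionInput:
    title: str
    description: str
    state: str = "Active"                           # default from the proposal
    metadata_record_format: str = "format_nsdl_dc"  # default from the proposal
    contact_name: Optional[str] = None
    contact_email: Optional[str] = None
    resource_url: Optional[str] = None
    external_identifier: Optional[str] = None


def parse_add_collection(xml_text):
    """Read the flat inputXML and apply the proposed defaults."""
    root = ET.fromstring(xml_text)

    def text(tag):
        # Return stripped element text, or None when missing or empty.
        value = root.findtext(f"r:{tag}", namespaces=NS)
        return value.strip() if value and value.strip() else None

    title, description = text("title"), text("description")
    if title is None or description is None:
        # Assumption: these two are the only required elements.
        raise ValueError("title and description are required")

    state = text("state") or "Active"
    if state not in ("Active", "Inactive"):
        raise ValueError(f"unsupported state: {state}")

    contact = root.find("r:contact", NS)
    return AddCollectionInput(
        title=title,
        description=description,
        state=state,
        metadata_record_format=text("metadataRecordFormat") or "format_nsdl_dc",
        contact_name=text("contact"),
        contact_email=contact.get("email") if contact is not None else None,
        resource_url=text("resourceURL"),
        external_identifier=text("externalIdentifier"),
    )
```

The same dataclass could serve as the argument list if the call ends up taking separate parameters instead of an XML blob.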

Josh 22:25, 28 April 2010 (UTC) -- My thoughts:

  • <state>: IMO toss it. What does it mean? Every application (hint: app data) has its own idea of states. NCS has its own states, OAI has visibility, WFI has at least 10 states for various things... Unless this has a defined meaning at the collection level (independent of any application), I think it will just cause problems.
  • <metadataRecordFormat>: I thought that collections could contain objects in varying formats. Are we limiting them to one format now? I think this is a bad limitation. If it's supposed to reference the collection metadata itself, I think it's irrelevant. Don't the metadata calls allow you to specify a format? What about child collections?
  • <contact>: I don't like how this is implemented, seems too ambiguous to me, but I don't have an alternative at this point. I think implementing a robust contact/user system is a good idea but too much for August and probably 2.0. Could/should this be inherited from a parent collection?
  • <externalIdentifier>: From what I understood, this was unique to each Agent and was automagically captured and presented behind the scenes based on who was making the calls. E.g. if the NCS added a collection, it specified its ID and we saved it as a mapping between NCS and the handle. If that's still the case, I don't understand what the source attribute is for.

Current Implementation

Collection Aggregator

I took a look at an existing Collection Aggregator in the NDR. It has a Service Description datastream attached with the following content...


<serviceDescription xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Aggregator for Beyond Penguins and Polar Bears: An Online Magazine for K-5 Teachers</dc:title>
  <dc:description>Collection of Beyond Penguins and Polar Bears: An Online Magazine for K-5 Teachers items</dc:description>
  <dc:type>Aggregator</dc:type>
  <image>
    <brandURL>http://nsdl.org/images/brands/NSDL-COLLECTION-000-003-111-989.jpg</brandURL>
    <title>Beyond Penguins and Polar Bears: An Online Magazine for K-5 Teachers</title>
    <width>82</width>
    <height>30</height>
    <alttext>Penguins, Polar</alttext>
  </image>
  <contacts>
    <contact>
      <name>Kim Lightle</name>
      <email>klightle@msteacher.org</email>
    </contact>
    <contact>
      <name>Jessica Fries-Gaither</name>
      <email>jfriesgaither@msteacher.org</email>
    </contact>
  </contacts>
</serviceDescription>
    
Collection Metadata Provider

The Metadata Provider for the Collection has a Service Description with minimal differences (marked)...


<serviceDescription xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>MetadataProvider for Beyond Penguins and Polar Bears: An Online Magazine for K-5 Teachers</dc:title>
  <dc:description>Provides Beyond Penguins and Polar Bears: An Online Magazine for K-5 Teachers records</dc:description>
  <dc:type>MetadataProvider</dc:type>
  <image>
    <brandURL>http://nsdl.org/images/brands/NSDL-COLLECTION-000-003-111-989.jpg</brandURL>
    <title>Beyond Penguins and Polar Bears: An Online Magazine for K-5 Teachers</title>
    <width>82</width>
    <height>30</height>
    <alttext>Penguins, Polar</alttext>
  </image>
  <contacts>
    <contact>
      <name>Kim Lightle</name>
      <email>klightle@msteacher.org</email>
    </contact>
    <contact>
      <name>Jessica Fries-Gaither</name>
      <email>jfriesgaither@msteacher.org</email>
    </contact>
  </contacts>
</serviceDescription>
           

There are 3 additional datastreams in the Metadata Provider for the Collection. These seem to be holding non-NDR metadata about the collection. For example...


The API provides no means to add metadata directly to a Metadata Provider. Would these become app data for a specific MDP or general annotations on the Collection? How to add an appdata annotation to an MDP is TBD.

Ostwald 22:34, 26 March 2010 (UTC) - these are NCS-specific app data (about an NCS collection). This information is placed in the NDR so that the NCS can be bootstrapped from the NDR. Other apps will have their own reasons for wanting to store collection-level information in the NDR.

Collection Metadata

The Metadata for the Collection has the following 3 datastreams. These seem to be holding non-NDR metadata about the collection. For example...


How would this metadata become part of the metadata for the collection? The API does not provide a means for adding metadata as part of the addCollection API request. Also, the addMetadataRecord API request does not currently support adding multiple formats of metadata.

Ostwald 22:31, 26 March 2010 (UTC)

  • format_ncs_collect is the collection-level metadata record created by the NCS and used by NSDL.org and Harvest manager.
  • format_nsdl_dc is format_ncs_collect transformed to nsdl_dc and, to my knowledge, not used at all
  • format_dcs_data is NCS-specific meta-metadata and is only relevant to NCS

Comparison between proposed and existing

Notice that title, description, and contact are already part of the serviceDescription.


Options: pass in the title, description, and contact as...

  • serviceDescription xml - in which case the serviceDescription is added as a datastream
  • parameters - serviceDescription XML is generated based on the parameters and added as a datastream
  • parameters - parameters are added as properties for the Aggregator object (This option will require conversion of existing objects.)


In the proposal, where is branding information going to live?


Output Discussions

Processing Discussions

XML Transform of InputXML

As I was exploring the code for ways to implement the processing for collection inputXML, I noticed similarities between what the existing 1.0 processing handles and the type of information in the inputXML for addCollection. That got me wondering whether a better approach might be to transform the inputXML into a format that is compatible with 1.0 processing. Here is what the inputXML would look like if transformed.

2.0 inputXML  (from specification example)

<?xml version="1.0" encoding="UTF-8"?>
<inputXML 
    xmlns="http://ns.nsdlib.org/ndr/request_v2.00/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://ns.nsdlib.org/ndr/request_v2.00/ http://ns.nsdlib.org/schemas/ndr/request_v2.00.xsd" 
    schemaVersion="1.00.000">
  <collection>
    <parentCollection>2200/2010030201T</parentCollection>
    <title>Middle School Portal (MSP2)</title> 
    <description>The MSP2 collection</description>
    <contacts>
      <contact email="ginger@ucar.edu">Katy Ginger</contact>
    </contacts>
    <resourceURL>http://msteacher2.org</resourceURL>
    <externalIdentifier source="NCS">NDR-000-000-000-0001</externalIdentifier>
  </collection>
</inputXML>


Transformed into 1.0 style inputXML

<?xml version="1.0" encoding="UTF-8"?>
<inputXML 
    xmlns="http://ns.nsdlib.org/ndr/request_v2.00/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://ns.nsdlib.org/ndr/request_v2.00/ http://ns.nsdlib.org/schemas/ndr/request_v2.00.xsd" 
    schemaVersion="1.00.000">
  <collection>
    <properties>
      <identifier type="URL">http://msteacher2.org</identifier>
    </properties>
    <relationships>
      <memberOf>2200/2010030201T</memberOf>
    </relationships>
    <data>
      <format type="ndr_collect">
        <meta>
          <title>Middle School Portal (MSP2)</title> 
          <description>The MSP2 collection</description>
          <contacts>
            <contact email="ginger@ucar.edu">Katy Ginger</contact>
          </contacts>
          <externalIdentifier source="NCS">NDR-000-000-000-0001</externalIdentifier>
        </meta>
        <info>
          <nsdlAboutCategory>gcco</nsdlAboutCategory>
          <repositoryPrimaryIdentifier>http://msteacher2.org</repositoryPrimaryIdentifier>
        </info>
      </format>
    </data>
  </collection>
</inputXML>


This would allow the existing code to function without change. The Collection model would define property, relationship, and datastream components that are compatible with the 1.0 style inputXML and would use those components to create the various objects and relationships.
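
The transform itself could be done in XSLT or in code. As a sketch only, here is one way it might look in Python, mirroring the two examples above. The function name `to_v1_style` is hypothetical, and the `nsdlAboutCategory` value "gcco" is simply copied from the example; how it would really be chosen is not specified here.

```python
import copy
import xml.etree.ElementTree as ET

NS = "http://ns.nsdlib.org/ndr/request_v2.00/"


def q(tag):
    """Qualify a tag name with the request namespace."""
    return f"{{{NS}}}{tag}"


def to_v1_style(input_xml, about_category="gcco"):
    """Rearrange 2.0 addCollection inputXML into the 1.0-style layout."""
    src = ET.fromstring(input_xml).find(q("collection"))
    if src is None:
        raise ValueError("inputXML has no <collection> element")

    root = ET.Element(q("inputXML"), {"schemaVersion": "1.00.000"})
    collection = ET.SubElement(root, q("collection"))

    # resourceURL becomes a URL identifier property.
    resource_url = src.findtext(q("resourceURL"))
    if resource_url:
        properties = ET.SubElement(collection, q("properties"))
        ET.SubElement(properties, q("identifier"), {"type": "URL"}).text = resource_url

    # parentCollection becomes a memberOf relationship.
    parent = src.findtext(q("parentCollection"))
    if parent:
        relationships = ET.SubElement(collection, q("relationships"))
        ET.SubElement(relationships, q("memberOf")).text = parent

    # The descriptive elements are copied unchanged into data/format/meta.
    data = ET.SubElement(collection, q("data"))
    fmt = ET.SubElement(data, q("format"), {"type": "ndr_collect"})
    meta = ET.SubElement(fmt, q("meta"))
    for tag in ("title", "description", "contacts", "externalIdentifier"):
        element = src.find(q(tag))
        if element is not None:
            meta.append(copy.deepcopy(element))

    info = ET.SubElement(fmt, q("info"))
    ET.SubElement(info, q("nsdlAboutCategory")).text = about_category  # value taken from the example
    if resource_url:
        ET.SubElement(info, q("repositoryPrimaryIdentifier")).text = resource_url

    return root
```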


General Discussions

Implementation

Developer: TBA

Basic description of the process...

NDR Objects

Creates objects in the specified order and fills with...

Collection Resource (Resource)
(optional) Created only if <resourceURL> is specified.
  Properties
    • hasResourceURL - xpath: /inputXML/resourceURL - Element value used without modification.
  Datastreams
    • N/A
  Relationships
    • N/A

Collection Aggregator (Aggregator)
(required) This object is always created. The handle of this object identifies the Collection composite object.
  Properties
    • N/A
  Datastreams
    • ServiceDescription - Currently holds authoritative metadata and branding info. This is not part of the current specification. What impact does this change have?
  Relationships
    • aggregatorFor - Owner Agent - determined by agent signing the addCollection request
    • associatedWith - Collection Resource - (optional) if <resourceURL> was specified, makes the association to the Collection Resource that was created
    • authorizedToChange (should be isAuthorizedToBeChangedBy) - Owner Agent - determined by agent signing the addCollection request
    • memberOf - All Collections - special aggregator created by big bang to keep track of all collections. Does this exist?

Collection Metadata Provider (MetadataProvider)
(required) This object is always created.
  Properties
    • setSpec - N/A
    • setName - N/A
    • visibility - Currently used to determine whether all metadata provided by this metadata provider should be surfaced through PUBLIC and/or SEARCH OAI feeds. Does it apply here? It is not currently represented in the specification.
  Datastreams
    • ServiceDescription - Currently holds authoritative metadata and branding info. This is not part of the current specification. What impact does this change have?
  Relationships
    • aggregatedBy - Collection Aggregator - created by this process
    • authorizedToChange (should be isAuthorizedToBeChangedBy) - Owner Agent - determined by agent signing the addCollection request
    • metadataProviderFor - Owner Agent - determined by agent signing the addCollection request

Collection Metadata (Metadata)
(required) This object is always created.
  Properties
    • itemId - Used when this metadata is served out with OAI. Does it apply here? It is not currently represented in the specification.
    • uniqueId - ID that is unique within an MDP, but not necessarily NDR. Currently mostly used by OAI ingest. Does it apply here? It is not currently represented in the specification. How does this relate to externalIdentifier?
    • visibility - Currently used to determine whether a metadata object should be surfaced through PUBLIC and/or SEARCH OAI feeds. Does it apply here? It is not currently represented in the specification.
  Datastreams
    • format_nsdl_dc - /InputXML/NDR_Collection (transformed) - Is this an automatic process that occurs when the metadata object is created?
    • format_nsdl_dc_info - /InputXML/NDR_Collection (transformed) - Is this an automatic process that occurs when the metadata object is created?
    • NDR_Collection - /InputXML/NDR_Collection - Element value used without modification.
  Relationships
    • annotates - N/A
    • metadataFor - Collection Aggregator - created by this process
    • metadataFor - Collection Resource - (optional) created by this process if a Collection Resource was created
    • metadataProvidedBy - Collection Metadata Provider - created by this process
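
The creation order and relationships listed for the four NDR objects can be summarized as a small planning step. The sketch below is illustrative only: `PlannedObject` and `plan_add_collection` are hypothetical names, targets are referred to by object kind because handles would not exist until each object is created, and the memberOf "All Collections" relationship is omitted since it is unclear whether that aggregator exists.

```python
from dataclasses import dataclass, field


@dataclass
class PlannedObject:
    kind: str
    properties: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)  # (predicate, target) pairs


def plan_add_collection(owner_agent, resource_url=None):
    """Return the objects to create, in order, for one addCollection request."""
    plan = []

    # The Resource is created first, and only when a resourceURL was supplied.
    if resource_url:
        plan.append(PlannedObject("Resource", {"hasResourceURL": resource_url}))

    aggregator = PlannedObject("Aggregator", relationships=[
        ("aggregatorFor", owner_agent),
        ("authorizedToChange", owner_agent),
    ])
    if resource_url:
        aggregator.relationships.append(("associatedWith", "Resource"))
    plan.append(aggregator)

    plan.append(PlannedObject("MetadataProvider", relationships=[
        ("aggregatedBy", "Aggregator"),
        ("authorizedToChange", owner_agent),
        ("metadataProviderFor", owner_agent),
    ]))

    metadata = PlannedObject("Metadata", relationships=[
        ("metadataFor", "Aggregator"),
        ("metadataProvidedBy", "MetadataProvider"),
    ])
    if resource_url:
        metadata.relationships.append(("metadataFor", "Resource"))
    plan.append(metadata)

    return plan
```

Here `owner_agent` stands for the agent that signed the request; a real implementation would replace the kind names with handles as objects are created.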


Indexing in DDS

Construct the following request for the PutCollection call into DDS. This will be done after all objects in the NDR are created.

http://www.dlese.org/dds/services/ddsupdatews1-1?verb=PutCollection&collectionKey=[handle]&xmlFormat=[metadataRecordFormat]&name=[/inputXML/metadata/title]&description=[/inputXML/metadata/description]

where,

Parameter Value Comments
verb PutCollection expected constant value for verb
collectionKey <handle> for Collection Aggregator
xmlFormat metadataRecordFormat All metadata in this collection that will be registered in DDS must have the same metadataRecordFormat. This is not the case in NDR, where metadata records within a collection are not required to share an XML format. Is there a way around this in DDS? Or did I misinterpret this?

jw- A DDS collection may contain only a single XML format, no way around it. I thought this is also what we agreed upon for NDR collections. While this may seem restrictive, it simplifies the processing and operations over a collection, since the format is known/consistent. We discussed in our meetings that, for cases where a single group has records in multiple formats, each format could have its own NDR collection. Then, in the nsdl.org UI we could combine these NDR collections into one library collection.

elr- The requirement by DDS that all records be in the same metadata format does not preclude NDR from having records in a collection in different formats. What is required is that there be a transform for all metadata formats to that specified by metadataRecordFormat. See Implementation Details for addMetadataRecord for more information on the usage of putRecord.

NOTE: metadataRecordFormat is currently not one of the parameters passed in for inputXML.

name /inputXML/metadata/title value of element at the specified xpath from inputXML
description /inputXML/metadata/description value of element at the specified xpath from inputXML
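
Building the request from the parameters in the table is mostly a matter of URL-encoding the values, since handles contain a slash and titles contain spaces and parentheses. A minimal sketch, assuming the parameter names shown in the URL above; the function name `put_collection_url` is hypothetical and nothing here has been checked against a live DDS service.

```python
from urllib.parse import urlencode

DDS_UPDATE_URL = "http://www.dlese.org/dds/services/ddsupdatews1-1"


def put_collection_url(handle, xml_format, title, description, base_url=DDS_UPDATE_URL):
    """Build the PutCollection request URL with properly encoded values."""
    params = {
        "verb": "PutCollection",      # constant for this call
        "collectionKey": handle,      # handle of the Collection Aggregator
        "xmlFormat": xml_format,      # metadataRecordFormat for the collection
        "name": title,
        "description": description,
    }
    return f"{base_url}?{urlencode(params)}"
```

For example, `put_collection_url("2200/2010030201T", "format_nsdl_dc", "Middle School Portal (MSP2)", "The MSP2 collection")` yields a URL whose collectionKey is encoded as `2200%2F2010030201T`.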


This is the proposed XML Format of NDR_Collection. How does this fit with DDS? Is it supposed to be used somewhere in DDS?

jw- When setting up a collection in DDS one would use title, description, xmlFormat (we need that somehow from NDR) and collectionKey derived from the NDR collection handle.

When creating a DDS repository for searching the resource-centric metadata, the xmlFormat could be the output format that is returned from get/listResourceMetadata verbatim, the DDS record being the portion that contains all metadata and annotations about a resource in a single blob. I have to think about this more, but on the surface this would generate a nice resource-centric index in DDS and allow the kind of searches we desire like "give me all records tagged as 'favorite' by joe (from annotation) that have a star rating of 4 or better (from annotations), a grade range of 9 (from metadata), in collection Teach Engineering (inherited in DDS from the collection)."

<?xml version="1.0" encoding="UTF-8"?>
<record 
    xmlns="http://ns.nsdl.org/ncs/ndr_collect" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xsi:schemaLocation="http://ns.nsdl.org/ncs/ndr_collect http://ns.nsdl.org/ncs/ndr_collect/1.00/XXXX.xsd">
  <recordID>NDR-000-000-000-0001</recordID> 
  <collectionID>xxxx</collectionID>
  <status>Active or Inactive</status>
  <date created="2004-11-24T13:45:27Z" modified="2004-11-24T13:45:27Z"/>
  <url>http://msteacher2.org</url>
  <title>Middle School Portal (MSP2)</title> 
  <description>The MSP2 collection blah blah</description>
  <contact email="ginger@ucar.edu">Katy Ginger</contact>
</record>


Links

API Links:

Additional Links Related to This Call:
