TNS Internal:CollectionAPI

From NSDLWiki

Using and managing collections through the NDR is difficult, and is a barrier to further development of applications. Contributing factors are as follows:

NDR API presents an interface that is too low level
Collections are defined as data structures composed of multiple NDR objects and relationships. The NDR API operates on a per-object basis. Interacting with a multi-member structure inherently involves graph traversal, and multiple API operations. As a result, divining data from these structures becomes a slow and complex set of procedures.
Collection representation is open to (incorrect) interpretation by applications
The flexibility of the NDR model is in many ways a double-edged sword. Collection data in the NDR is represented in datastreams, relationships, properties, and paths in the NDR graph. In order to retrieve data from a collection, the application must know a priori exactly how and where that information is stored. In practice, applications have not been able to perform this task consistently with respect to one another.
NSDL.org infrastructure is running the show
As it stands now, the only practical criterion for a record to be "in the library" is that it is served out via OAI and searchable on nsdl.org. The state of a collection as stored in the repository has little bearing on its visibility via OAI and search.
Collection authority is ambiguous
NCS keeps a list of collections that it views as being 'in the library'. An application viewing collections in the NDR might draw different conclusions. It seems that the NDR should be the definitive store, yet the NDR is the place where retrieving collection info is the most difficult. Thus, the NCS is serving as the de facto list of NDR collections.
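
To make the first point concrete, the sketch below models a repository as a plain in-memory graph and counts how many per-object calls it takes to list the titles of the records in one collection. Everything in it is hypothetical: the object handles, the `memberOf` relationship name, and the `get_object` / `find_related` helpers are stand-ins invented for illustration, not the actual NDR API or data model.

```python
from dataclasses import dataclass, field


@dataclass
class NdrObject:
    """One node in a toy object graph (hypothetical, not the real NDR model)."""
    handle: str
    kind: str
    properties: dict = field(default_factory=dict)
    relationships: dict = field(default_factory=dict)  # name -> list of handles


class ToyRepository:
    """Per-object store that counts calls, standing in for a low-level API."""

    def __init__(self, objects):
        self._objects = {o.handle: o for o in objects}
        self.calls = 0

    def get_object(self, handle):
        self.calls += 1
        return self._objects[handle]

    def find_related(self, relationship, target):
        """Return handles of objects whose `relationship` points at `target`."""
        self.calls += 1
        return sorted(
            o.handle for o in self._objects.values()
            if target in o.relationships.get(relationship, [])
        )


def record_titles_low_level(repo, collection_handle):
    """Walk the graph one object at a time, as a per-object API forces."""
    titles = []
    for handle in repo.find_related("memberOf", collection_handle):
        titles.append(repo.get_object(handle).properties["title"])
    return titles


repo = ToyRepository([
    NdrObject("col:1", "collection", {"title": "Physics"}),
    NdrObject("md:1", "metadata", {"title": "Optics"}, {"memberOf": ["col:1"]}),
    NdrObject("md:2", "metadata", {"title": "Waves"}, {"memberOf": ["col:1"]}),
    NdrObject("md:3", "metadata", {"title": "Cells"}, {"memberOf": ["col:2"]}),
])

titles = record_titles_low_level(repo, "col:1")
print(titles, "in", repo.calls, "calls")
```

In this toy model the call count is one lookup plus one fetch per member, so it grows with the size of the collection. A collection-oriented call could answer the same question in a single request, and would also spare the application from knowing which relationship encodes membership.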

If the repository is to serve a useful role in using and managing collections, it seems that it needs a higher-level collection-oriented API. The following sections contain use cases, required functionality, and proposals for such an API that meet the stated needs.

Required functionality

  • Add new collections
  • Modify curated collection data (ncs_collect)
  • Remove (purge entirely) collections
  • De-activate, but do not remove, collections
  • Add, modify, delete items
  • Operations that appear atomic to the caller
  • Read/write "out of band" application-specific data for a given item or collection
  • Collections in repository that are non-"NSDL" collections
  • Store (Metadata) records in their native format
  • Store multiple (Metadata) formats for each record - while keeping it apparent which format is native
  • Discover collections
  • Retrieve all records in a given collection
  • Determine the collection(s) that a given record is a member of
  • Support multiple sources of records for a given collection, with the ability to add a source at will
  • Able to implement read or write restrictions on records
  • Able to store primary content (binary resources) somehow
  • Support annotation metadata
  • Support aggregations of collections, i.e. collection A is made up of collections B, C, D
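
As a rough illustration of how several of the requirements above might fit together, here is a minimal in-memory sketch of a collection-oriented interface. It is not one of the proposals on this page: the class name `CollectionStore`, every method name, and the format labels used in the example are assumptions made up for the sketch, and concerns such as access restrictions, multiple record sources, and annotations are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    record_id: str
    native_format: str
    formats: dict = field(default_factory=dict)   # format name -> metadata text
    app_data: dict = field(default_factory=dict)  # out-of-band, keyed by application


@dataclass
class Collection:
    collection_id: str
    title: str
    active: bool = True
    nsdl: bool = True                             # False for non-"NSDL" collections
    record_ids: set = field(default_factory=set)
    app_data: dict = field(default_factory=dict)


class CollectionStore:
    """In-memory stand-in for a collection-oriented repository API (hypothetical)."""

    def __init__(self):
        self._collections = {}
        self._records = {}

    def add_collection(self, collection_id, title, nsdl=True):
        if collection_id in self._collections:
            raise KeyError(f"collection exists: {collection_id}")
        self._collections[collection_id] = Collection(collection_id, title, nsdl=nsdl)

    def deactivate_collection(self, collection_id):
        """Hide the collection from discovery without removing any data."""
        self._collections[collection_id].active = False

    def purge_collection(self, collection_id):
        """Remove the collection and any records that no other collection holds."""
        removed = self._collections.pop(collection_id)
        for rid in removed.record_ids:
            if not self.collections_of(rid):
                del self._records[rid]

    def add_record(self, collection_id, record_id, native_format, metadata):
        collection = self._collections[collection_id]  # fail before changing anything
        record = self._records.setdefault(record_id, Record(record_id, native_format))
        record.formats[native_format] = metadata
        collection.record_ids.add(record_id)

    def add_format(self, record_id, format_name, metadata):
        """Store an additional format; the native format stays recorded separately."""
        self._records[record_id].formats[format_name] = metadata

    def discover(self, include_inactive=False):
        return sorted(c.collection_id for c in self._collections.values()
                      if c.active or include_inactive)

    def records_in(self, collection_id):
        return sorted(self._collections[collection_id].record_ids)

    def collections_of(self, record_id):
        return sorted(c.collection_id for c in self._collections.values()
                      if record_id in c.record_ids)

    def native_format(self, record_id):
        return self._records[record_id].native_format


store = CollectionStore()
store.add_collection("col-a", "Physics")
store.add_collection("col-b", "Local teaching set", nsdl=False)
store.add_record("col-a", "rec-1", "nsdl_dc", "<nsdl_dc/>")
store.add_record("col-b", "rec-1", "nsdl_dc", "<nsdl_dc/>")
store.add_format("rec-1", "oai_dc", "<oai_dc/>")
store.deactivate_collection("col-b")
print(store.discover())
print(store.collections_of("rec-1"))
```

The point of the sketch is that callers work only with collections and records. How membership, formats, and out-of-band data are actually laid out in the repository stays behind the interface.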

Some Use cases

Web Feed Ingest
  • List collections with titles
  • Attach source of records (WFI) to a collection
  • Store app-specific config data with collection or records
  • Determine all records in a collection that were supplied by WFI
NCS
  • De-accession collections
  • Store collections in NDR for preservation that are not yet visible
  • Add new collections
  • Load collections from repository
  • Load items from repository
  • Store app-specific out-of-band data associated with collections or items
  • Modify collection metadata or app-specific data
  • Manage non-NSDL collections in repository

See this page in the UCAR Swiki for more detail about how the NCS manages collections in the NDR.

DDS
  • Discover collections
  • Load records in a given collection
  • Determine the collections for a given record
  • Determine the native metadata format for a given record
  • Read specific metadata formats from a given record
Collections reporting
  • How many collections use NCS? OAI harvest? WFI?
  • What format(s) are used in a given collection?
  • How many items are in a collection? Historical data?
  • Label and aggregate collections
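
Several of these use cases reduce to simple queries once membership, record source, and native format are available in one place. The sketch below assumes a hypothetical flat view of `(collection, record, source, native format)` rows; the row layout, the identifiers, and the function names are invented for illustration, and the sample data is made up.

```python
from collections import Counter

# Hypothetical flat view: (collection id, record id, source, native format).
MEMBERSHIP = [
    ("col-a", "rec-1", "NCS", "nsdl_dc"),
    ("col-a", "rec-2", "WFI", "nsdl_dc"),
    ("col-a", "rec-3", "WFI", "oai_dc"),
    ("col-b", "rec-4", "OAI harvest", "oai_dc"),
]


def records_from_source(rows, collection_id, source):
    """Web Feed Ingest: all records in a collection supplied by one source."""
    return [rec for col, rec, src, _ in rows
            if col == collection_id and src == source]


def collections_per_source(rows):
    """Reporting: number of distinct collections fed by each source."""
    pairs = {(src, col) for col, _, src, _ in rows}
    return Counter(src for src, _ in pairs)


def formats_in(rows, collection_id):
    """Reporting: native formats used in a given collection."""
    return sorted({fmt for col, _, _, fmt in rows if col == collection_id})


def item_count(rows, collection_id):
    """Reporting: number of items currently in a collection."""
    return sum(1 for col, *_ in rows if col == collection_id)


print(records_from_source(MEMBERSHIP, "col-a", "WFI"))
print(collections_per_source(MEMBERSHIP))
print(formats_in(MEMBERSHIP, "col-a"))
print(item_count(MEMBERSHIP, "col-a"))
```

Historical item counts would additionally require timestamps on membership changes, which this sketch does not model.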

Proposed solutions

Feel free to propose a solution!! Add a link below

  • Abstract API - Methods and parameters that could be implemented as a web service API, java API, etc.
  • AtomPub-based API - Managing collections, items, and data as feeds using AtomPub. Aaron was thinking about this on the plane ride back from Boulder, and thought it would be a great concrete proposal to start thinking about.
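
To give a feel for the feed-based idea, the sketch below renders one collection as an Atom feed with one entry per record, using Python's standard `xml.etree.ElementTree`. It is only an illustration of the general shape and not the linked proposal: the `urn:example:` identifiers, the function name, and the sample titles and timestamp are placeholders, and elements that a valid Atom feed also requires (such as `author`) are omitted for brevity.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace


def atom(tag):
    """Qualify a tag name with the Atom namespace."""
    return f"{{{ATOM_NS}}}{tag}"


def collection_feed(collection_id, title, updated, records):
    """Build an Atom feed whose entries are the records of one collection."""
    feed = ET.Element(atom("feed"))
    ET.SubElement(feed, atom("id")).text = f"urn:example:collection:{collection_id}"
    ET.SubElement(feed, atom("title")).text = title
    ET.SubElement(feed, atom("updated")).text = updated
    for record_id, record_title in records:
        entry = ET.SubElement(feed, atom("entry"))
        ET.SubElement(entry, atom("id")).text = f"urn:example:record:{record_id}"
        ET.SubElement(entry, atom("title")).text = record_title
        ET.SubElement(entry, atom("updated")).text = updated
    return feed


feed = collection_feed(
    "col-a", "Physics", "2009-01-01T00:00:00Z",   # placeholder values
    [("rec-1", "Optics"), ("rec-2", "Waves")],
)
xml_text = ET.tostring(feed, encoding="unicode")
print(xml_text)
```

In an AtomPub setting, adding or removing a record would then correspond to POST or DELETE requests against the collection's feed and entry URIs, which is what makes the approach attractive as a concrete starting point.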


Possible Hybrid Approaches

XML API, Atom Pub API, Java Toolkit
Extend XML API to directly include the concept of Collections
  • All current XML API calls will continue to exist.
  • New XML API calls will be added to directly support collections.
  • Atom Pub API could be written as a simple one-for-one wrapper over the XML API.
  • Java Toolkit supports existing and new XML API calls.
  • Java Toolkit provides calls that mirror those supported for the XML API and return information to the user in the Atom Pub API format.

Advantages:

  • Existing applications will not need to be retrofitted.
  • New code in existing applications can take advantage of functionality in either API.
  • The XML API, seen as a low-level API, will remain available for NDR-savvy applications that need to manipulate objects directly.
  • Atom Pub API provides the user-friendly high-level view of NDR objects.

Disadvantages:

  • Perpetuates an API that is directly tied to the implementation. Any changes in the underlying model will require changes to the XML API and any toolkits that are based on it.
Atom Pub API becomes preferred API
  • All current XML API calls will continue to exist.
  • No new XML API calls will be defined.
  • New functionality will be created with the Atom Pub API.
  • For existing XML API calls, a simple one-for-one wrapper would be written to support Atom Pub API.
  • Java Toolkit continues to support existing XML API calls.
  • Java Toolkit extended to support new functionality with Atom Pub API.
  • Java Toolkit provides calls that mirror those supported for the XML API and return information to the user in the Atom Pub API format.

Advantages:

  • Existing applications will not need to be retrofitted.
  • New code in existing applications can take advantage of functionality in either API.
  • Atom Pub API provides the user-friendly high-level view of NDR objects.
  • Atom Pub API becomes the default preferred API.

Disadvantages:

  • Also perpetuates an API that is directly tied to the implementation, but provides a path for potentially deprecating the XML API.
Replace XML API with Atom Pub API
  • XML API becomes deprecated.
  • Atom Pub API would be implemented for all functionality. It becomes the only API.
  • Java Toolkit will be re-written to perform all functionality through the Atom Pub API.

Advantages:

  • Atom Pub API provides the user-friendly high-level view of NDR objects.
  • Removes knowledge of the model's low-level implementation from the public NDR API, which is now the Atom Pub API.

Disadvantages:

  • Existing applications will need to be retrofitted. This may affect external organizations that have already adopted the NDR API through EduPak downloads.