TNS Internal:CollectionAPI
From NSDLWiki
Using and managing collections through the NDR is difficult, and is a barrier to further development of applications. Contributing factors are as follows:
- The NDR API presents an interface that is too low-level
- Collections are defined as data structures composed of multiple NDR objects and relationships. The NDR API operates on a per-object basis. Interacting with a multi-member structure inherently involves graph traversal and multiple API operations. As a result, extracting data from these structures becomes a slow and complex set of procedures.
- Collection representation is open to (incorrect) interpretation by applications
- The flexibility of the NDR model is in many ways a double-edged sword. Collection data in the NDR is represented in datastreams, relationships, properties, and paths in the NDR graph. In order to retrieve data from a collection, the application must know a priori exactly how and where that information is stored. In practice, applications have not been able to do this consistently with one another.
- NSDL.org infrastructure is running the show
- As it stands now, the only practical criterion for a record to be "in the library" is that it is served out via OAI and searchable on nsdl.org. The state of a collection as stored in the repository has little bearing on its visibility via OAI and search.
- Collection authority is ambiguous.
- NCS keeps a list of collections that it views as being 'in the library'. An application viewing collections in the NDR might draw different conclusions. It seems that the NDR should be the definitive store, yet the NDR is the place where retrieving collection info is the most difficult. Thus, the NCS is serving as the de facto list of NDR collections.
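The per-object traversal cost described above can be illustrated with a toy model. This is only a sketch: the `ToyNdr` class, the object ids, and the `members` and `title` fields are hypothetical and do not reflect the real NDR API or data model. It shows that listing the titles in one collection takes one call for the collection plus one call per member.

```python
# Toy model only: ToyNdr, the object ids, and the field names are hypothetical
# and are not part of the real NDR API.
class ToyNdr:
    """Per-object store that counts every API call made against it."""

    def __init__(self, objects):
        self._objects = objects
        self.calls = 0

    def get_object(self, object_id):
        # One round trip per object, as in a per-object API.
        self.calls += 1
        return self._objects[object_id]


def collection_titles(ndr, collection_id):
    """Walk collection -> member records, fetching one object at a time."""
    collection = ndr.get_object(collection_id)
    titles = []
    for member_id in collection["members"]:
        record = ndr.get_object(member_id)
        titles.append(record["title"])
    return titles


objects = {
    "coll:1": {"members": ["rec:1", "rec:2", "rec:3"]},
    "rec:1": {"title": "Plate tectonics"},
    "rec:2": {"title": "Cloud formation"},
    "rec:3": {"title": "Ocean currents"},
}
ndr = ToyNdr(objects)
titles = collection_titles(ndr, "coll:1")
print(titles, ndr.calls)
```

A collection-oriented call could return the same titles in a single operation, and the application would not need to know which relationship links a collection to its members.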
If the repository is to serve a useful role in using and managing collections, it seems that it needs a higher-level collection-oriented API. The following sections contain use cases, required functionality, and proposals for such an API that meet the stated needs.
Required functionality
- Add new collections
- Modify curated collection data (ncs_collect)
- Remove (purge entirely) collections
- De-activate, but do not remove, collections
- Add, modify, delete items
- Apparently atomic operations
- Read/write "out of band" application-specific data for a given item or collection
- Collections in repository that are non-"NSDL" collections
- Store (Metadata) records in their native format
- Store multiple (Metadata) formats for each record - while keeping it apparent which format is native
- Discover collections
- Retrieve all records in a given collection
- Determine the collection(s) that a given record is a member of
- Support multiple sources of records for a given collection, with the ability to add a source at will
- Able to implement read or write restrictions on records
- Able to store primary content (binary resources) somehow
- Support annotation metadata
- Support aggregations of collections, i.e. collection A is made of collections B, C, D
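To make a few of the requirements above concrete, here is a minimal in-memory sketch of what a collection-oriented interface might look like. Every name in it (`CollectionStore`, `add_record`, `collections_of`, the ids, and the format labels) is an assumption made for illustration, not a proposed or existing NDR API. It covers adding, de-activating, and purging collections, storing a record's native format alongside other formats, and looking up membership in both directions.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    record_id: str
    native_format: str
    # Metadata keyed by format name; the native format is always present.
    formats: dict = field(default_factory=dict)


@dataclass
class Collection:
    collection_id: str
    title: str
    active: bool = True
    record_ids: set = field(default_factory=set)
    app_data: dict = field(default_factory=dict)  # "out of band" application data


class CollectionStore:
    """Hypothetical collection-level facade; all method names are illustrative."""

    def __init__(self):
        self._collections = {}
        self._records = {}

    def add_collection(self, collection_id, title):
        if collection_id in self._collections:
            raise ValueError(f"collection already exists: {collection_id}")
        self._collections[collection_id] = Collection(collection_id, title)

    def deactivate_collection(self, collection_id):
        # De-activate without removing: the data stays, but discovery hides it.
        self._collections[collection_id].active = False

    def purge_collection(self, collection_id):
        # Records are kept because they may belong to other collections.
        del self._collections[collection_id]

    def discover_collections(self, include_inactive=False):
        return sorted(
            c.collection_id
            for c in self._collections.values()
            if include_inactive or c.active
        )

    def add_record(self, collection_id, record_id, native_format, metadata):
        collection = self._collections[collection_id]
        if record_id not in self._records:
            self._records[record_id] = Record(
                record_id, native_format, {native_format: metadata}
            )
        collection.record_ids.add(record_id)

    def add_format(self, record_id, format_name, metadata):
        self._records[record_id].formats[format_name] = metadata

    def native_format(self, record_id):
        return self._records[record_id].native_format

    def records_in(self, collection_id):
        return sorted(self._collections[collection_id].record_ids)

    def collections_of(self, record_id):
        return sorted(
            c.collection_id
            for c in self._collections.values()
            if record_id in c.record_ids
        )


store = CollectionStore()
store.add_collection("coll:a", "Earth science")
store.add_collection("coll:b", "Teaching resources")
store.add_record("coll:a", "rec:1", "nsdl_dc", "<nsdl_dc/>")
store.add_record("coll:b", "rec:1", "nsdl_dc", "<nsdl_dc/>")
store.add_format("rec:1", "oai_dc", "<oai_dc/>")
store.deactivate_collection("coll:b")
print(store.discover_collections(), store.collections_of("rec:1"))
```

The sketch leaves out atomicity, access restrictions, record sources, binary content, annotations, and aggregations. Those would need decisions that this page does not yet make.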
Some Use cases
- Web Feed Ingest
- List collections with titles
- Attach source of records (WFI) to a collection
- Store app-specific config data with collection or records
- Determine all records in a collection that were supplied by WFI
- NCS
- De-accession collections
- Store collections in NDR for preservation that are not yet visible
- Add new collections
- Load collections from repository
- Load items from repository
- Store app-specific out-of-band data associated with collections or items
- Modify collection metadata or app-specific data
- Manage non-NSDL collections in repository
See this page in the UCAR Swiki for more detail about how the NCS manages collections in the NDR.
- DDS
- Discover collections
- Load records in a given collection
- Determine the collections for a given record
- Determine the native metadata format for a given record
- Read specific metadata formats from a given record
- Collections reporting
- How many collections use NCS? OAI harvest? WFI?
- What format(s) are used in a given collection?
- How many items are in a collection? Historical data?
- Label and aggregate collections
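The reporting questions above become simple aggregations once collection membership, record source, and metadata format are available in one place. The sketch below assumes a hypothetical flat export with one row per record; the field names and sample rows are invented for illustration and are not real NDR data.

```python
from collections import Counter

# Hypothetical export: one row per record, with invented sample values.
rows = [
    {"collection": "coll:a", "source": "NCS", "format": "nsdl_dc"},
    {"collection": "coll:a", "source": "WFI", "format": "nsdl_dc"},
    {"collection": "coll:a", "source": "WFI", "format": "oai_dc"},
    {"collection": "coll:b", "source": "OAI", "format": "oai_dc"},
    {"collection": "coll:c", "source": "WFI", "format": "nsdl_dc"},
]


def collections_per_source(rows):
    """Count distinct collections with at least one record from each source."""
    seen = {(row["source"], row["collection"]) for row in rows}
    return Counter(source for source, _ in seen)


def formats_in(rows, collection_id):
    """List the metadata formats used in one collection."""
    return sorted({row["format"] for row in rows if row["collection"] == collection_id})


def item_counts(rows):
    """Count records per collection."""
    return Counter(row["collection"] for row in rows)


print(collections_per_source(rows))
print(formats_in(rows, "coll:a"))
print(item_counts(rows))
```

Historical item counts would also need timestamps or periodic snapshots, which this sketch does not model.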
Proposed solutions
Feel free to propose a solution!! Add a link below
- Abstract API - Methods and parameters that could be implemented as a web service API, java API, etc.
- AtomPub-based API - Managing collections, items, and data as feeds using AtomPub. Aaron was thinking about this on the plane ride back from Boulder, and thought it would be a great concrete proposal to start thinking about.
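For readers unfamiliar with AtomPub, the sketch below shows the general shape of creating a member: an Atom entry document is sent by POST to a collection URI. It is not taken from the linked proposal. The URL, ids, and titles are placeholders, and the request is only constructed, never sent.

```python
import urllib.request
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace


def build_entry(entry_id, title, author_name, updated):
    """Build a minimal Atom entry document as UTF-8 bytes."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated.isoformat()
    author = ET.SubElement(entry, f"{{{ATOM_NS}}}author")
    ET.SubElement(author, f"{{{ATOM_NS}}}name").text = author_name
    return ET.tostring(entry, encoding="utf-8")


# Placeholder values: a real service would define its own URIs and ids.
body = build_entry(
    "urn:example:collection:1",
    "Example collection",
    "Example curator",
    datetime.now(timezone.utc),
)
request = urllib.request.Request(
    "http://example.org/collections",  # hypothetical AtomPub collection URI
    data=body,
    method="POST",
    headers={"Content-Type": "application/atom+xml;type=entry"},
)
# The request is built but not sent; sending it would use urllib.request.urlopen(request).
print(request.get_method(), request.full_url)
```

In AtomPub, a successful create is answered with `201 Created` and a `Location` header giving the URI of the new member. Later edits and deletes use PUT and DELETE against that URI.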
Possible Hybrid Approaches
Components: XML API | Atom Pub API | Java Toolkit
- Extend XML API to directly include the concept of Collections
- Advantages:
- Disadvantages:
- Atom Pub API becomes preferred API
- Advantages:
- Disadvantages:
- Replace XML API with Atom Pub API
- Advantages:
- Disadvantages: