TNS Internal:CollectionAPI/AtomPub

From NSDLWiki

This is a proposal for a collection-oriented interface to the repository, based on the Atom and AtomPub standards. The goal is to make the collections and items of the repository discoverable in a simple manner while using established standards and conventions.

Overview

This Atom/AtomPub based interface is composed of three types of web resources:

AtomPub Service documents
- Lists all collections housed in the repository, with links to their feeds
- Groups collections into workspaces
- Lists the capabilities of the collection, including acceptable member categories and mime types. A vocabulary of categories is proposed to represent type, source, and format of entries.
Atom feed documents
- Each feed document is a web resource with a resolvable URI.
- Clients may add or remove entries from feeds using techniques defined in the AtomPub protocol.
Atom entries
- Each entry contains a record or media file as its content.
- Entries contain a number of links to related resources, such as the atom feed containing member items (in the case of an entry that represents a collection record), out-of-band data an application needs to attach to the record, or the resource being described in record metadata.
- Each entry has a resolvable URI, and may be edited per the AtomPub protocol
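
As a rough illustration of how a client might discover collections from a service document, the following Python sketch parses one with the standard library. The service document shown is made up for the example: the workspace names echo the examples on this page, but the URLs, collection titles, and accepted types are assumptions, not actual NDR values.

```python
import xml.etree.ElementTree as ET

NS = {
    "app": "http://www.w3.org/2007/app",    # AtomPub namespace (RFC 5023)
    "atom": "http://www.w3.org/2005/Atom",  # Atom namespace (RFC 4287)
}

# Hypothetical service document; URLs and titles are placeholders.
SERVICE_XML = """
<service xmlns="http://www.w3.org/2007/app"
         xmlns:atom="http://www.w3.org/2005/Atom">
  <workspace>
    <atom:title>NSDL collections</atom:title>
    <collection href="http://example.org/feeds/collection-a">
      <atom:title>Collection A</atom:title>
      <accept>application/atom+xml;type=entry</accept>
    </collection>
  </workspace>
  <workspace>
    <atom:title>NSDL media</atom:title>
    <collection href="http://example.org/feeds/brand-images">
      <atom:title>Brand images</atom:title>
      <accept>image/png</accept>
      <accept>image/jpeg</accept>
    </collection>
  </workspace>
</service>
"""


def list_collections(service_xml):
    """Return (workspace title, collection title, feed URL, accepted types) tuples."""
    service = ET.fromstring(service_xml)
    rows = []
    for workspace in service.findall("app:workspace", NS):
        ws_title = workspace.findtext("atom:title", default="", namespaces=NS)
        for coll in workspace.findall("app:collection", NS):
            title = coll.findtext("atom:title", default="", namespaces=NS)
            accepts = [a.text for a in coll.findall("app:accept", NS)]
            rows.append((ws_title, title, coll.get("href"), accepts))
    return rows


rows = list_collections(SERVICE_XML)
for row in rows:
    print(row)
```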

Basic model

The entire repository record space is represented as a series of atom feeds and entries. An entry serves as a container for its content, which may be either a metadata record or a (possibly binary) media file. Besides the content, the collection API makes extensive use of atom links to relay information about the content. The basic elements of an entry are as follows:

id

Globally unique identifier required by the atom spec. The server may choose to generate its own id for an entry (if none is given), or use an acceptable value provided by the client. The convention used in the collection API is that the id is a URI with the prefix http://ndr.nsdl.org/collections/.

title

Required by the atom spec. For metadata-based entries, this should be the catalogued title. For media files, this should be a file name.

updated

The server will maintain the last updated date for each record.

content

The content of the entry. Contains a href link to a web resource containing the content (metadata record or media file)

summary

Required by the atom spec if content is not human readable text or html. This can be anything, but typically should be a description from metadata, or a short description of a media file. This field is solely for the benefit of humans scanning the entries.

category

Optional under the atom spec, but required by the collections API. A record may have multiple categories. The collections API defines categories to reflect the entry type, source, or metadata format. See vocabulary of categories

link

The collections API makes extensive use of atom links. Each link has a rel attribute. The collections API uses standard link relations as follows:

rel=via Link from a metadata item entry to the collection record entry for the collection(s) of which it is a member.
rel=related Link to other non-record content. Applications can attach their own 'private' data to any entry using this link. There may be any number of such links, and they are differentiated by their title.
rel=alternate For entries containing metadata records, this contains the URI of the resource being described by the record (typically, this is the URL of some site on the web). There may be only one such link.
rel=enclosure Link to the media file, or to all forms/formats of metadata associated with a record. There must be at least one enclosure link, pointing to the URI of the content of the entry. There may be any number of enclosure links.
rel=edit, rel=edit-media Defined by the AtomPub spec, used for modifying the content of the atom entry, or the metadata record or media file itself.

Entries must be in one of three type categories: record, collection, or media.

record entry: A record entry contains a metadata record as its content. This record is considered its "primary" record. It may, optionally, contain any number of enclosure links that contain alternate formats of the record. Clients may or may not understand each format. Since every record is a member of a collection, all record entries must contain a rel=via link to the entry describing its collection metadata.
collection entry: A collection atom entry is a record entry that contains, as its content, a collection metadata record. Every collection has an associated atom feed containing the members of a collection (which is created by the system). A collection atom entry, therefore, contains a rel=related title="feed" link to its item feed. [Josh: Wouldn't it be better to use title="item_feed" or title="items" given that "feed" could mean an RSS URL and apply to multiple links?]
media entry: A media entry contains a file as its content. Like a record entry, it too must have a rel=via link to the collection it belongs to.
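
To make the basic model concrete, here is a minimal Python sketch that assembles a record entry with the elements described above. It is only an illustration: the function name, the example URLs, and the choice of Atom's src attribute for out-of-line content are assumptions, and the only category scheme used is the source scheme that appears elsewhere on this page.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("atom", ATOM)  # serialise with an "atom:" prefix


def atom(tag):
    """Qualify a tag name with the Atom namespace."""
    return f"{{{ATOM}}}{tag}"


def build_record_entry(entry_id, title, updated, summary, content_url,
                       content_type, collection_entry_url, categories):
    """Assemble a record entry; all argument values are supplied by the caller."""
    entry = ET.Element(atom("entry"))
    ET.SubElement(entry, atom("id")).text = entry_id
    ET.SubElement(entry, atom("title")).text = title
    ET.SubElement(entry, atom("updated")).text = updated
    ET.SubElement(entry, atom("summary")).text = summary
    # Assumption: out-of-line content is referenced with Atom's "src" attribute.
    ET.SubElement(entry, atom("content"), {"src": content_url, "type": content_type})
    for scheme, term in categories:
        ET.SubElement(entry, atom("category"), {"scheme": scheme, "term": term})
    # Every record entry links back to its collection entry ...
    ET.SubElement(entry, atom("link"), {"rel": "via", "href": collection_entry_url})
    # ... and has at least one enclosure pointing at its primary content.
    ET.SubElement(entry, atom("link"),
                  {"rel": "enclosure", "href": content_url, "type": content_type})
    return entry


# Placeholder values; only the id prefix and the source scheme come from this page.
entry = build_record_entry(
    entry_id="http://ndr.nsdl.org/collections/example-record",
    title="Example record",
    updated="2009-01-01T00:00:00Z",
    summary="A short description taken from the metadata.",
    content_url="http://example.org/content/example-record",
    content_type="application/xml",
    collection_entry_url="http://example.org/entries/example-collection",
    categories=[("http://ns.nsdl.org/collections/source", "WebFeedIngest")],
)
print(ET.tostring(entry, encoding="unicode"))
```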

Examples

Sometimes, these concepts are best understood through example. Here are some examples of feeds that comprise this collection API. Look at the feeds in various browsers and feed readers. (note: service documents are not feeds, and are likely to be unrecognized by most readers)

Service Document (see detailed walkthrough)
- Contains ROOT workspace listing the "collections of collections" hosted in this repository
- "NSDL collections" workspace represents the collections comprising nsdl.org. Note the usage of categories
- "NSDL media" workspace represents a collection of collections containing nsdl.org media. Its only member is the brand images collection, which contains media entries.
Feed containing collection records (see detailed walkthrough)
- Represents NSDL collection of collections
- Note how each entry contains a link to the collection feed
Feed containing nsdl_dc metadata records (see detailed walkthrough)
- Note that this feed actually has SIX members, but is split into two feeds. Atom pagination is achieved via navigation links ("rel=next"). Most readers only read the first feed page.
- Just contains nsdl_dc, except one record that contains an additional enclosure containing 'native' nsdl_dc (i.e. the raw, un-transformed nsdl_dc harvested from the source)
Feed containing metadata records from multiple sources, in different formats (see detailed walkthrough)
- Contains records from NCS in msp2 format, and from Web feed ingest in nsdl_dc format
- Contains "out of band" dcs_data link for app-specific data associated with record
- Contains raw feed data from WFI as a non-primary enclosure
Feed containing media: brand images! (see detailed walkthrough)
- Pure image media, with no real cataloging associated with them.

Reading from the NDR

Feeds

Feeds are the primary means for reading data out of the ndr using the collection API. All feeds have the following properties:

Stable URIs: A feed URI never changes, unless it is deleted/removed. If an app wanted to monitor the items in a single collection, it would be appropriate to configure the app with the collection's URI.
Entries ordered by last update date: As per the atom spec, feeds are ordered such that the most recently updated/added entry appears first, and the oldest entry is last.
Pagination for large feeds: Large feeds are broken up into a series of feed documents. If the feed has been broken up as such, the atom pagination convention is used such that each feed document contains rel=next, rel=prev, rel=first or rel=last links in the feed preamble. To facilitate use in browsers (live bookmarks), the first feed document may be much shorter than subsequent documents (say, 25 entries).
Access restrictions: Feeds MAY require authentication or authorization in order to read their content.

There is no real difference between collection feeds and collection member feeds except that a collection feed contains a link to a service document in which it appears.
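
A client that wants every entry in a large feed has to follow the pagination links. The sketch below shows one way to do that in Python, assuming a caller-supplied fetch function that returns the XML text for a URL (a hypothetical stand-in for a real HTTP GET). The two in-memory pages and their URLs are made up so the example runs without a server.

```python
import xml.etree.ElementTree as ET

NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical two-page feed, keyed by URL, standing in for HTTP responses.
PAGES = {
    "http://example.org/feed?page=1": """<feed xmlns="http://www.w3.org/2005/Atom">
      <link rel="next" href="http://example.org/feed?page=2"/>
      <entry><title>newest</title></entry>
      <entry><title>middle</title></entry>
    </feed>""",
    "http://example.org/feed?page=2": """<feed xmlns="http://www.w3.org/2005/Atom">
      <link rel="prev" href="http://example.org/feed?page=1"/>
      <entry><title>oldest</title></entry>
    </feed>""",
}


def iter_entries(first_url, fetch):
    """Yield entries from every page, following rel=next until it runs out."""
    url = first_url
    seen = set()  # guard against a feed whose next links loop back
    while url and url not in seen:
        seen.add(url)
        feed = ET.fromstring(fetch(url))
        yield from feed.findall("atom:entry", NS)
        url = next(
            (link.get("href") for link in feed.findall("atom:link", NS)
             if link.get("rel") == "next"),
            None,
        )


titles = [
    entry.findtext("atom:title", namespaces=NS)
    for entry in iter_entries("http://example.org/feed?page=1", PAGES.__getitem__)
]
print(titles)
```

Because feeds are ordered by last update date, a monitoring app could stop iterating as soon as it reaches an entry it has already seen, rather than walking every page.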

Entries

As mentioned earlier, all entries have their own URI, and exist as self-contained web resources outside the context of their feed. This is useful for JMS messages. If an object is updated in the NDR, it shall be possible to retrieve the corresponding entry resource.

Writing to the NDR

Writing to the NDR via the collections API uses techniques specified in the AtomPub protocol, plus a few extensions for some of the more "advanced" features (such as creating "out of band" content).

Adding new items/collections

There are two possible ways to add new items, using AtomPub principles

POST an atom entry to a feed

This atom entry is essentially the same entry that will be added to the feed. The server is allowed to reject or make changes to the posted entry before storing and adding to the feed as follows:

id If an id is provided, the server will use it, or return an error. If an id is not provided, the server will create one.
updated The server will ignore any value provided by the user, and will create its own
content Any xml content between the content tags will be used to create a new web resource containing that content. The resulting content element will contain a link to the URL of this content.
link As for the content element, any atom links that don't specify a href, but instead contain xml content, will result in a resource being created and linked to. The server may add new links as well. In the case of an item entry, the server will create a link to its collection (and overwrite any link provided by the user to this effect).
Other required elements (title, summary, etc) - these should really be defined by the app, but the server could potentially fill them with some sort of reasonable content.

As per AtomPub, the response headers contain the location of the newly created entry.
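
The following Python sketch shows what such a POST might look like from the client side, with the metadata nested inline in the content element. It only builds the request and does not send it. The feed URL, the function name, and the tiny record document are placeholders invented for the example.

```python
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("atom", ATOM)  # serialise with an "atom:" prefix


def build_post(feed_url, title, summary, metadata_xml):
    """Build (but do not send) a POST of a new atom entry to a feed."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM}}}summary").text = summary
    # No id or updated element: the server creates both.
    content = ET.SubElement(entry, f"{{{ATOM}}}content", {"type": "application/xml"})
    content.append(ET.fromstring(metadata_xml))  # inline metadata record
    return urllib.request.Request(
        feed_url,
        data=ET.tostring(entry, encoding="utf-8"),
        method="POST",
        headers={"Content-Type": "application/atom+xml;type=entry"},
    )


# Placeholder feed URL and record; neither is a real NDR resource.
req = build_post(
    "http://example.org/feeds/collection-a",
    "Example record",
    "A short description.",
    "<record><title>Example record</title></record>",
)
print(req.get_method(), req.full_url)

# Sending is left to the caller, for example:
#   with urllib.request.urlopen(req) as resp:
#       new_entry_url = resp.headers["Location"]
```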

POST just the content to the feed: AtomPub allows this as well. In this situation, since ONLY the content (file) is POSTed to the feed, the server creates the atom entry. We would need to decide if we want to allow it. This may be useful for media collections (images, etc), where the title, summary, etc are less relevant than for a collection or item. The server would need to generate an entire atom entry itself, and link to the provided content.

Editing items or collections

There are two legitimate operations that fall in this scope: updating the content of an entry, or updating the atom entry itself (such as attaching new content)

Updating the content of an entry: The AtomPub protocol defines a rel="edit-media" atom link, which points to a web resource that can be updated. As per AtomPub, this resource can be updated with a PUT containing the new content. This affects the content element. In the collections API, we also declare that any other atom link may be updated in a similar way. The linked resources may be updated by a PUT. Any such resource may require authentication, and may return any http error code.

Updating the entry itself: The AtomPub protocol defines a rel="edit" link to a resource representing the individual atom entry. This can be updated with PUT. As this replaces the existing entry with the supplied entry (plus any changes created by the server), applications should pass along all elements in the original entry unless they are explicitly removing an element. It is necessary to use this method if the application wishes to add a NEW rel=related link to an entry. This method may also be used (as an extension of the AtomPub protocol) to update the content of the entry, or any link. Using the same method described earlier, if xml is found between content or link elements, the server will create or update the linked resource with the given content.
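
The two edit operations can be sketched in Python as follows. The example locates the edit and edit-media links in an entry and builds a PUT request for each. The requests are built but not sent. The entry, its URLs, and the helper names are all made up for illustration.

```python
import urllib.request
import xml.etree.ElementTree as ET

NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical entry as it might be read from a feed; URLs are placeholders.
ENTRY_XML = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Example record</title>
  <link rel="edit" href="http://example.org/entries/42"/>
  <link rel="edit-media" href="http://example.org/content/42"/>
</entry>"""


def link_href(entry, rel):
    """Return the href of the first link with the given rel, or None."""
    for link in entry.findall("atom:link", NS):
        if link.get("rel") == rel:
            return link.get("href")
    return None


def build_put(url, body, content_type):
    """Build (but do not send) a PUT that replaces the resource at url."""
    return urllib.request.Request(
        url, data=body, method="PUT", headers={"Content-Type": content_type}
    )


entry = ET.fromstring(ENTRY_XML)

# Replace only the content (metadata record or media file).
replace_content = build_put(
    link_href(entry, "edit-media"),
    b"<record><title>Revised</title></record>",
    "application/xml",
)

# Replace the entry itself; the full entry is sent back, not just the changes.
entry.find("atom:title", NS).text = "Example record (revised)"
replace_entry = build_put(
    link_href(entry, "edit"),
    ET.tostring(entry),
    "application/atom+xml;type=entry",
)
print(replace_content.full_url, replace_entry.full_url)
```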

Usage in terms of use cases

Web Feed Ingest

List collections with titles

Given the URL of the feed of NSDL collections, iterate through atom entries. Their title is in the <title> element, and their feed url is in a <link rel="related" title="feed"> element.
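
A minimal Python sketch of this iteration, run against a made-up collection-of-collections feed (the titles and URLs are placeholders):

```python
import xml.etree.ElementTree as ET

NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed of collection entries; URLs are placeholders.
FEED_XML = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>NSDL collections</title>
  <entry>
    <title>Collection A</title>
    <link rel="related" title="feed" type="application/atom+xml;type=feed"
          href="http://example.org/feeds/collection-a"/>
    <link rel="related" title="dcs_data" href="http://example.org/data/a"/>
  </entry>
  <entry>
    <title>Collection B</title>
    <link rel="related" title="feed" href="http://example.org/feeds/collection-b"/>
  </entry>
</feed>"""


def collections_with_titles(feed_xml):
    """Return (title, member feed URL) for each collection entry in the feed."""
    feed = ET.fromstring(feed_xml)
    result = []
    for entry in feed.findall("atom:entry", NS):
        title = entry.findtext("atom:title", default="", namespaces=NS)
        # Other rel=related links (app data) are skipped by checking the title.
        feed_url = next(
            (link.get("href") for link in entry.findall("atom:link", NS)
             if link.get("rel") == "related" and link.get("title") == "feed"),
            None,
        )
        result.append((title, feed_url))
    return result


collections = collections_with_titles(FEED_XML)
for title, url in collections:
    print(title, url)
```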

Attach source of records (WFI) to a collection

Explicitly enabling/defining/attaching a source of records is perhaps a need dictated by policy and security. Compare the following procedures:

For any items added to a collection via WFI, simply declare <category scheme="http://ns.nsdl.org/collections/source" term="WebFeedIngest" /> on an ad-hoc basis.
Use the same <category> tag, but before a collection will accept such an entry, the WFI source will need to be listed in the collection's ncs_collect record. Thus, edit the collection record to declare the source, then start adding records at will. Applications that have permission to edit a particular collection would be able to edit *all* records in a collection regardless of source. The source category would merely be a tool in allowing an app to select only its own records.
Use a MetadataProvider handle rather than a human readable string as the source. NCS (or another admin app) would create a new source MetadataProvider for the collection, and assign permissions to the source app. This would assure that one source app would not be able to edit the records of another source app within a collection.

Store app-specific config data with collection or records: App-specific data associated with a collection would be located in that collection's collection record atom entry. The app would add a <link rel="related" title="X"> link, where the title X is chosen by the app. (e.g. see the title=dcs_data link in this entry). For data associated with an item record, the same technique would apply, though the link would appear in the item record's atom entry.

There are two ways to introduce a new <link rel="related" title="X"> link to the atom record. As per the AtomPub protocol, both involve an http PUT to the resource in the entry's <link rel="edit"> link. The payload of this PUT request is the full content of the atom entry plus the new link. The server shall accept two different representations of the new link:

A standard atom link, complete with a href url. In this case, the app would have put its content at the given url, and merely references it in the atom link.
The app creates the atom link, but does NOT include a href attribute. Instead, the atom link element contains nested elements containing xml content. The server will create a new web resource, give it the xml content found between the link tags, and save the atom entry with the new atom link pointing to the new url containing the requested content. This behaviour is NOT part of the atomPub spec.
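
The two representations might be produced as in the Python sketch below. The helper name, the example URL, and the shape of the inline configuration XML are assumptions; the dcs_data title is borrowed from the example mentioned above. The resulting entry would then be sent in a PUT to the entry's edit link.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("atom", ATOM)  # serialise with an "atom:" prefix


def add_related_link(entry, title, href=None, inline=None):
    """Attach a rel=related link, either by URL or with nested XML content."""
    if (href is None) == (inline is None):
        raise ValueError("give exactly one of href or inline")
    link = ET.SubElement(entry, f"{{{ATOM}}}link", {"rel": "related", "title": title})
    if href is not None:
        # Representation 1: standard atom link to content the app already stored.
        link.set("href", href)
    else:
        # Representation 2 (collections API extension, not AtomPub): no href,
        # the server is expected to store the nested XML and fill in the URL.
        link.append(inline)
    return link


entry = ET.Element(f"{{{ATOM}}}entry")
ET.SubElement(entry, f"{{{ATOM}}}title").text = "Example record"

by_url = add_related_link(entry, "dcs_data", href="http://example.org/data/42")
by_content = add_related_link(
    entry, "app_config", inline=ET.fromstring("<config><enabled>true</enabled></config>")
)
print(ET.tostring(entry, encoding="unicode"))
```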

Determine all records in a collection that were supplied by WFI

Iterate through all records in a collection feed, and discard all those that do not have a <category scheme="http://ns.nsdl.org/collections/source"> category for the specified app.

We may or may not want to expose a way to filter the results, so the feed contains only the desired entries. This may be easy or difficult depending on how the feeds are implemented.
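
Client-side filtering by source category could look like the following Python sketch. The feed is made up for the example, and the NCS term is an assumed value; only the scheme URI and the WebFeedIngest term appear on this page.

```python
import xml.etree.ElementTree as ET

NS = {"atom": "http://www.w3.org/2005/Atom"}
SOURCE_SCHEME = "http://ns.nsdl.org/collections/source"

# Hypothetical collection feed with records from two sources.
FEED_XML = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Record 1</title>
    <category scheme="http://ns.nsdl.org/collections/source" term="WebFeedIngest"/>
  </entry>
  <entry>
    <title>Record 2</title>
    <category scheme="http://ns.nsdl.org/collections/source" term="NCS"/>
  </entry>
  <entry>
    <title>Record 3</title>
    <category scheme="http://ns.nsdl.org/collections/source" term="WebFeedIngest"/>
  </entry>
</feed>"""


def entries_from_source(feed_xml, source_term):
    """Return only the entries whose source category matches source_term."""
    feed = ET.fromstring(feed_xml)
    return [
        entry
        for entry in feed.findall("atom:entry", NS)
        if any(
            cat.get("scheme") == SOURCE_SCHEME and cat.get("term") == source_term
            for cat in entry.findall("atom:category", NS)
        )
    ]


wfi_titles = [
    e.findtext("atom:title", namespaces=NS)
    for e in entries_from_source(FEED_XML, "WebFeedIngest")
]
print(wfi_titles)
```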

NCS

De-accession collections

To completely purge a collection and its members from the repository: http DELETE to the collection atom entry uri (found in the rel="edit" link). The system will purge all records in the collection feed, then purge the collection atom entry (though not necessarily in that order).

To de-activate a collection but NOT purge its content from the repository: Edit the atom entry for the collection, add the element <app:draft>yes</app:draft>. At that point, we have two possible choices for system response:

- The system modifies every member record to add <app:draft>yes</app:draft>. Applications and indexes will then respect this marker, and disregard the entries.
- The system will make the membership feed inaccessible (perhaps a 403 response) to all clients except those who have permission to see the feed.
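
Both de-accession paths can be sketched from the client side in Python. The requests are built but not sent, and the URL is a placeholder. Note one assumption: AtomPub (RFC 5023) defines app:draft as a child of app:control, so the sketch nests it that way; whether the collections API would expect exactly that placement is not settled by this proposal.

```python
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
APP = "http://www.w3.org/2007/app"
ET.register_namespace("atom", ATOM)
ET.register_namespace("app", APP)


def build_delete(edit_url):
    """Build (but do not send) a DELETE that would purge the collection."""
    return urllib.request.Request(edit_url, method="DELETE")


def mark_draft(entry):
    """Set app:draft to yes, creating app:control/app:draft if missing."""
    control = entry.find(f"{{{APP}}}control")
    if control is None:
        control = ET.SubElement(entry, f"{{{APP}}}control")
    draft = control.find(f"{{{APP}}}draft")
    if draft is None:
        draft = ET.SubElement(control, f"{{{APP}}}draft")
    draft.text = "yes"
    return entry


# Purge: DELETE to the URL found in the collection entry's rel="edit" link.
purge = build_delete("http://example.org/entries/collection-a")

# De-activate: mark the entry as draft, then PUT it back to the same edit link.
entry = ET.Element(f"{{{ATOM}}}entry")
ET.SubElement(entry, f"{{{ATOM}}}title").text = "Collection A"
mark_draft(entry)
deactivate = urllib.request.Request(
    "http://example.org/entries/collection-a",
    data=ET.tostring(entry),
    method="PUT",
    headers={"Content-Type": "application/atom+xml;type=entry"},
)
print(purge.get_method(), deactivate.get_method())
```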

Store collections in NDR for preservation that are not yet visible

Same use of <app:draft>yes</app:draft> as above. There would be no distinction as to WHY the collection is not visible (e.g. not ready vs de-accessioned but in repository), except perhaps in the collection metadata record.

Add new collections

Create an entry in a collection of collections feed (e.g. NSDL collections) as per AtomPub. This could be done in two ways:

POST the full collection atom entry to the collection feed uri. The primary collection metadata would be xml nested in the content element. The collection members feed will be added by the server and will appear in the stored atom entry, and the uri of the new entry will be returned to the client as per AtomPub.
POST just the collection metadata to the feed uri. An accept element in the service document for the collection of collections will let the client know that this is possible (e.g. the mimetype of the collection metadata is acceptable). The system will take care of creating the atom entry, and the client would be returned the uri of the new atom entry, as per AtomPub. If the app needs to add additional info, it would need to subsequently edit the atom entry.

Add new records: Same as adding new collections, except POSTing to the individual collection's feed.
Load collections/items from repository: Given a collection-of-collections feed uri (or discovering it through the service document), or an individual collection feed (if items are desired rather than collection records), iterate through the atom feed. NCS may have to (or want to) discard records that do not have NCS as a source, and perhaps pick a single metadata format to present to the user for editing.
Store app-specific out-of-band data associated with collections or items: see explanation in Web feed ingest example.
Modify collection metadata or app-specific data: As per the AtomPub spec, editing collection metadata is achieved by a PUT of the new metadata document to the edit-media link in the collection's atom entry. If the NCS needs to update non-primary metadata (say, nsdl_dc), it can do a PUT to the uri in the <link rel="enclosure" title="format_nsdl_dc"> link. Any other link (including out of band rel="related" links) can be edited in the same fashion. For creating new non-primary metadata enclosures or out of band related links, see the Web feed ingest example above.
Manage non-NSDL collections in repository: Nothing special is needed. Non-nsdl collections would simply exist in some collection-of-collections feed that is NOT the "NSDL collection of collections" feed.

DDS

Discover collections: Within the scope of a collection-of-collections (e.g. discovering new NSDL collections), new collections are discovered merely by detecting new entries in the collection-of-collections feed.
Load records in a given collection: The collection record entry will have an atom link to the feed that contains all records in a collection: <link rel="related" title="feed" type="application/atom+xml;type=feed">
Determine the collections for a given record: All item records have a link back to the parent collection atom entry, using the atom rel="via" link.
Determine the native metadata format for a given record: The native metadata record is in the content element of the atom entry.
Read specific metadata formats from a given record: All resources considered to be metadata records are present in atom rel="enclosure" links, even the primary metadata record. There is a naming convention to use the title parameter to convey the metadata format's name.
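
As an illustration of the DDS lookups above, the Python sketch below reads the parent collection and the available metadata formats from a single item entry. The entry and its URLs are made up; the format_nsdl_dc title follows the convention used elsewhere on this page, and format_msp2 is an assumed title following the same pattern.

```python
import xml.etree.ElementTree as ET

NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical item entry; URLs are placeholders.
ENTRY_XML = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Example record</title>
  <link rel="via" href="http://example.org/entries/collection-a"/>
  <link rel="enclosure" title="format_msp2" href="http://example.org/content/42/msp2"/>
  <link rel="enclosure" title="format_nsdl_dc" href="http://example.org/content/42/nsdl_dc"/>
  <link rel="related" title="dcs_data" href="http://example.org/data/42"/>
</entry>"""


def links_by_rel(entry, rel):
    """Return all link elements of the entry with the given rel."""
    return [l for l in entry.findall("atom:link", NS) if l.get("rel") == rel]


def metadata_formats(entry):
    """Map enclosure link title (the format name) to the URL of that format."""
    return {l.get("title"): l.get("href") for l in links_by_rel(entry, "enclosure")}


entry = ET.fromstring(ENTRY_XML)
parent_collections = [l.get("href") for l in links_by_rel(entry, "via")]
formats = metadata_formats(entry)
print(parent_collections)
print(formats.get("format_nsdl_dc"))
```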

Collections reporting

How many collections use NCS? OAI harvest? WFI?: If the source is recorded in collection metadata, this is a job for a search index. If not, this can be achieved by iterating through the desired set of records and counting the source categories.
What format(s) are used in a given collection?: If the format is recorded in collection metadata, this is a job for a search index. If not, this can be achieved by iterating through the desired set of records, and counting the format categories. If this is to include all translations and alternate formats, then all the enclosure link titles will need to be counted as well.
How many items are in a collection? Historical data?: Feeds could list the number of items in them, but this proposal does not include that at this time. There is no explicit means of presenting historic data - this proposal focuses entirely on present repository state.
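
Where the answer is not available from a search index, counting by category could be done as in this Python sketch. The feed is made up, and the NCS term is an assumed value. The same function would work for format categories given the appropriate scheme URI, which this page does not spell out.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"atom": "http://www.w3.org/2005/Atom"}
SOURCE_SCHEME = "http://ns.nsdl.org/collections/source"

# Hypothetical feed of records with source categories.
FEED_XML = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Record 1</title>
    <category scheme="http://ns.nsdl.org/collections/source" term="WebFeedIngest"/>
  </entry>
  <entry>
    <title>Record 2</title>
    <category scheme="http://ns.nsdl.org/collections/source" term="NCS"/>
  </entry>
  <entry>
    <title>Record 3</title>
    <category scheme="http://ns.nsdl.org/collections/source" term="WebFeedIngest"/>
  </entry>
</feed>"""


def count_terms(feed_xml, scheme):
    """Count category terms of the given scheme across all entries in a feed."""
    feed = ET.fromstring(feed_xml)
    return Counter(
        cat.get("term")
        for entry in feed.findall("atom:entry", NS)
        for cat in entry.findall("atom:category", NS)
        if cat.get("scheme") == scheme
    )


source_counts = count_terms(FEED_XML, SOURCE_SCHEME)
print(source_counts.most_common())
```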
