TNS Internal:CollectionAPI/GeneralAPIDesignIssues


Issues relating to the Development of a Comprehensive API



Doubt About the Efficacy of the Current API Proposal

There are a number of concerns with how the API is laid out, including data duplication, methods that take similar or possibly identical arguments, and assumed values. Much of the doubt stems from a disconnect in how the model(s) driving these methods are conceptualized.

---Jweather 22:51, 18 December 2009 (UTC)
It would be useful to articulate the concerns with the current API proposal: specifically, where there is data duplication, passing of identical arguments, or reliance on assumed values, and, if possible, how these constructs might be changed, streamlined, or improved.

The API proposal has many features, perhaps more than are necessary. As I understand it, the core features that we desire are: 1. To model collections explicitly; 2. To provide explicit support for native metadata formats; 3. To support annotations.

I'm not wedded to the current API proposal, but I think it provides a framework to support these features (plus a few more that perhaps are not necessary).

Conflicting Data Models

The NCS or DLESE model and the NDR model are not the same. They have some fundamental differences, some conflicting and some complementary. Questions arising from this are...

  • What is a collection?
  • What is metadata?
  • What is a resource?
  • Is a resource a record, or metadata?
  • Is an annotation separate, or metadata, or a subset of/inherited metadata object?
  • How does DDS fit in?
  • What goes into DDS versus NDR?
  • How are they updated to be in sync?
  • Can DDS do X which the NDR does?

It seems that neither the developers in CO nor the developers in NY completely understand the other's model. There are also open questions about how DDS works and how it fits into the grand view of the infrastructure. These conceptual differences must be sorted out as a prerequisite to designing an API that will meet the needs of the broader community.

--Ostwald 19:50, 18 December 2009 (UTC)
The notion that there are only two data models involved may be an oversimplification. I would argue that few applications (Expert Voices being one) actually make use of the "NDR model" and that nearly every application uses the NDR in an ad hoc manner.

At the collection level, the NCS's model has come to dominate because it is the application that was chosen to fill the need of collections management. The NCS implements a collection model based on OAI and Search requirements.

OAI and Search base their notion of collection on the MetadataProvider abstraction (ignoring the Aggregator completely?).

Webfeed Ingest, in turn, discovered that "find all Aggregators that are members of the collection of collections" didn't work as expected.

As I understand things, the Collection API was motivated by the above situation, and is intended to provide an explicit and shared interface for basic collection-level services (such as "list collections"), so that apps don't need to interpret an abstract (and complex) definition of collection, or to worry about satisfying the idiosyncratic needs of other apps.
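To make the idea of an explicit, shared interface concrete, here is a minimal Java sketch. Every name in it (CollectionInfo, CollectionService, InMemoryCollectionService, register) is hypothetical and is not part of the current proposal or the toolkit; the in-memory class exists only so the sketch runs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical summary of a collection; the field names are illustrative only.
final class CollectionInfo {
    final String handle;
    final String title;

    CollectionInfo(String handle, String title) {
        this.handle = handle;
        this.title = title;
    }
}

// Hypothetical shared interface: every application asks "list collections"
// the same way, instead of interpreting the repository model on its own.
interface CollectionService {
    List<CollectionInfo> listCollections();
}

// In-memory stand-in (an assumption) used only to make the sketch runnable.
final class InMemoryCollectionService implements CollectionService {
    private final List<CollectionInfo> collections = new ArrayList<>();

    void register(String handle, String title) {
        collections.add(new CollectionInfo(handle, title));
    }

    @Override
    public List<CollectionInfo> listCollections() {
        // Return a read-only copy so callers cannot alter the service's state.
        return Collections.unmodifiableList(new ArrayList<>(collections));
    }
}
```

An application written against CollectionService would not need to know whether a collection is backed by a MetadataProvider, an Aggregator, or both; that decision would live in one implementation.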

Use Cases vs. Abstracted Concepts

Use cases can provide a useful analysis for identifying common abstract concepts that need to be considered in the overall design of a system. At the same time, catering to specific use cases by designing method parameters to fit the needs of a particular use case must be considered very carefully. This can lead to limitations in the functionality of the design that can make it difficult for new applications to adopt the API.

--Ostwald 19:50, 18 December 2009 (UTC)
I completely agree with the above. To exploit the positive aspects of use cases, we were given the assignment of looking at the proposed API with our respective applications in mind. Unfortunately, only John posted this activity to the wiki, so it may appear that the DDS was overly influential in the API.

Keeping in mind the danger of catering to specific use cases, we could all think about the proposed API in terms of the NDR applications we know best, and use this thinking as a way to make our understandings and questions explicit. Then, when we talk about abstract concepts and functionality, we will have a greater shared understanding.

More time needs to be dedicated to looking forward and thinking abstractly about which concepts and functionality are to be supported, and about the best way to surface these in general terms, not in application-specific terms.

For example, if it is determined that star ratings are desirable, instead of asking how they fit into a particular application, we should determine how they fit into our model in relation to the other objects and their relationships. What does one approach get us versus another? Then, once that is decided, how do we code that functionality into the API so that the new concept is surfaced in a way that makes it easy for applications to adopt?
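As one possible answer to the modeling question, a rating could be treated as its own annotation object that points at a resource, rather than as a field inside any one application's metadata. The Java sketch below illustrates only that option; StarRating, RatingStore, the field names, and the 1 to 5 scale are all assumptions made for illustration, not decisions recorded on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model object: a rating is an annotation related to a resource,
// not a value stored inside a particular application's metadata record.
final class StarRating {
    final String resourceHandle; // the resource being rated
    final String annotator;      // who supplied the rating
    final int stars;             // assumed scale of 1 to 5

    StarRating(String resourceHandle, String annotator, int stars) {
        if (stars < 1 || stars > 5) {
            throw new IllegalArgumentException("stars must be between 1 and 5");
        }
        this.resourceHandle = resourceHandle;
        this.annotator = annotator;
        this.stars = stars;
    }
}

// Hypothetical store that answers a model-level question for any application.
final class RatingStore {
    private final List<StarRating> ratings = new ArrayList<>();

    void add(StarRating rating) {
        ratings.add(rating);
    }

    // Average of all ratings attached to one resource; 0.0 if there are none.
    double averageFor(String resourceHandle) {
        int total = 0;
        int count = 0;
        for (StarRating r : ratings) {
            if (r.resourceHandle.equals(resourceHandle)) {
                total += r.stars;
                count++;
            }
        }
        return count == 0 ? 0.0 : (double) total / count;
    }
}
```

Modeled this way, the relationship (rating to resource) is explicit and any application can query it; the trade-off against embedding the rating in metadata is exactly the kind of question the paragraph above asks to be settled first.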

Unified Data Model

Problems such as those described will persist until the underlying conceptual differences between the two models currently used to drive NSDL-related activities have been resolved. Now is the time to develop a unifying data model and base the proposed API on that model. This will provide a cohesive set of concepts and a stable, predictable API that meets the requirements of our current applications and also provides the basis for serving the needs of future applications.

Level of API Granularity

Currently, there is an object API that provides access to specific object types in the NDR. The model separates information in a way that may not be intuitive to broader audiences, such as the separation of resources from metadata. The toolkit currently provides essentially a 1:1 mapping to the NDR model. In general, this has been seen as a mid- to low-level interface to the NDR data: a step above the structural concerns of the Fedora interface and data model, but not encapsulating higher-order concepts such as collections.

It is the Java toolkit-level API that the current proposal is designed to expand. Currently, Aaron is working within the toolkit framework to provide a set of high-level classes (such as a Collection object) that will comprise a library useful in implementing a collections web service API.

The existing object API is accessible through a URL and returns XML in an RPC-like style. When the Java toolkit was created, it was designed to also operate on XML inputs and outputs, but to provide slightly more object-oriented access to the Java developer. As mentioned before, there is mostly a 1:1 correspondence between the model, web API, and toolkit. See Model Javadocs for the Java representation.

Web API call                             Java toolkit equivalent
listMembers/<metadataprovider_handle>    Results<Metadata> members = provider.getMembers();
get/<metadata_handle>                    Metadata m = ndr.getObject(handle, Type.Metadata);

In the initial toolkit Java library, a high-level Collection object/interface may be exposed in the Java API, but the actual implementation will not map to a single Web/RPC call or to a single NDR object. The NDR API itself may be extended at a later point to represent such concepts directly, should it prove useful to do so.
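The point that one high-level operation fans out into several low-level calls can be sketched as follows. This is not the toolkit's API: LowLevelRepository, FakeRepository, CollectionView, and memberTitles are hypothetical names, and the two repository methods only loosely mirror the listMembers and get calls mentioned above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the low-level object API; not the real toolkit.
interface LowLevelRepository {
    List<String> listMembers(String providerHandle); // loosely like listMembers/<handle>
    String get(String metadataHandle);               // loosely like get/<handle>
}

// In-memory fake that counts how many low-level calls are made.
final class FakeRepository implements LowLevelRepository {
    private final Map<String, List<String>> members = new LinkedHashMap<>();
    private final Map<String, String> titles = new LinkedHashMap<>();
    int calls = 0;

    void addRecord(String providerHandle, String metadataHandle, String title) {
        members.computeIfAbsent(providerHandle, k -> new ArrayList<>()).add(metadataHandle);
        titles.put(metadataHandle, title);
    }

    @Override
    public List<String> listMembers(String providerHandle) {
        calls++;
        return members.getOrDefault(providerHandle, new ArrayList<>());
    }

    @Override
    public String get(String metadataHandle) {
        calls++;
        return titles.get(metadataHandle);
    }
}

// Hypothetical high-level object: one method call, several low-level calls.
final class CollectionView {
    private final LowLevelRepository repository;
    private final String providerHandle;

    CollectionView(LowLevelRepository repository, String providerHandle) {
        this.repository = repository;
        this.providerHandle = providerHandle;
    }

    List<String> memberTitles() {
        List<String> result = new ArrayList<>();
        // One call to list the members, then one call per member to fetch it.
        for (String handle : repository.listMembers(providerHandle)) {
            result.add(repository.get(handle));
        }
        return result;
    }
}
```

With two records in the fake repository, a single memberTitles() call issues three low-level calls (one list, two gets), which is why the high-level interface cannot be described as a 1:1 wrapper.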
