TNS Internal:Users/Endries/Annotations

From NSDLWiki


First, I will discuss some shortcomings (IMO) with the "composite annotations" framework proposed by Katy, and then suggest an alternative way of organizing annotations.



Composite Annotations

One large, single annotation

First, this method is not what was discussed or, apparently, agreed upon. It is in fact the opposite of what we talked about in the meetings. Katy was not at these meetings, so she didn't know what was discussed, but this is a significant change and needs to be considered and discussed as carefully as the multiple-annotation method was at the meeting.


The API (or something) will have to parse out pieces of the input and divine various aspects of the API call, including:

  • Which part or parts of the annotation the application updated.
  • Which part or parts were left out versus intended to be deleted (or a policy will need to be specified to delete everything that is left out).
  • Whether items within an element associate with that element's context, or with a parent context (e.g.: an identifier).

This will make the back-end processing more complex and more processor-intensive. It also does not account for annotation ownership, and it makes the error conditions more complicated. For example, if an application updates an annotation and includes multiple changes, and one of those changes is not allowed, does the entire call error out? Does it roll back the other changes if they were already applied? The calling application will need to handle another layer of error conditions and include additional logic. I think this obfuscates what is otherwise a straightforward and easy-to-implement solution on both the back- and front-end, with little, if any, real benefit.
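To make the "left out versus intended to be deleted" ambiguity concrete, here is a minimal Python sketch. It is not the proposed API: it assumes a composite annotation can be modeled as a flat dictionary of parts, and the function name, the `omitted_means_delete` flag and the sample part names are all invented for illustration.

```python
def apply_composite_update(stored, submitted, omitted_means_delete):
    """Merge a submitted composite annotation into the stored one.

    Both arguments are plain dicts mapping a part name to its value.
    This is a hypothetical model, not the NDR API.
    """
    result = dict(stored)
    result.update(submitted)  # parts present in the call are added or overwritten
    if omitted_means_delete:
        # Policy 1: anything the caller left out is treated as deleted.
        for part in stored:
            if part not in submitted:
                del result[part]
    # Policy 2 (omitted_means_delete=False): left-out parts stay untouched.
    return result


# Hypothetical stored parts; the application only meant to change the rating.
stored = {"star_rating": "5", "review": "Useful lab", "icon": "thing.gif"}
submitted = {"star_rating": "7"}

after_delete_policy = apply_composite_update(stored, submitted, omitted_means_delete=True)
after_keep_policy = apply_composite_update(stored, submitted, omitted_means_delete=False)
print(after_delete_policy)
print(after_keep_policy)
```

The same call produces two different stored states depending on a policy the caller cannot see in the XML itself, which is the kind of guessing the back-end would have to do.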

Mixing domains

There are some concepts in the documents which aren't clear. The concept of "active/inactive", for example, needs a context. Active for what? What does changing this setting affect?

A collection URL doesn't make much sense to me, but as long as it's optional I don't have a problem with it. If a collection is a generic group of objects, it doesn't necessarily have an overarching URL. If it's a group of resources cataloged by an organization, it _may_ have a URL -- but is that URL really associated with this particular collection or the organization (metadata provider) that cataloged it? Still, the case may come up where a collection of objects, resources, or anything may have a URL for some reason, so while I don't understand the current reason, I don't think it needs to be removed.

There are some cases of application-specific data within the collection container(s).

Singular Annotations

I think it is important to expose annotations as individual resources rather than hide this fact behind XML. Annotations have aspects, such as ownership, multiplicity and existence as a conceptual entity, that should remain visible to the applications using them.

The fact that a particular annotation came from a certain application may be important. Another application may want to view only the annotations, or only the annotations of a certain type, that were created by a particular application. For example, if an application creates annotations linking resources with the DHS's terrorist threat level, a separate "how dangerous is the NSDL" application may want to only consider data from this application. Therefore this relationship must be exposed.

There may be multiple annotations of the same type, or this may be disallowed. An entity in the library may have multiple annotations and metadata associated with it, and it is possible that these could have similar data or the same type. The fact that one is metadata and one is an annotation -- different concepts with different purposes -- should be exposed. For example, if there exists a <description></description> element for an object, and an application creates its own annotation called "description", we will need a way to distinguish between the two in the input and output. Wrapping the annotation in an <annotation></annotation> element, for instance, is a straightforward, simple and logical method to do this.
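As a rough illustration of why the wrapper helps, the following Python sketch reads a metadata description and an annotation description from the same object without confusing them. The sample document, its attribute values and the <annotations> container are assumptions made for the example, not output from the NDR.

```python
import xml.etree.ElementTree as ET

# Hypothetical object carrying both a metadata <description> and an
# application-created annotation that also uses a <description> element.
SAMPLE = """
<object handle="h1" identifier="id1">
  <description>Metadata description</description>
  <annotations>
    <annotation format="general" creatorHandle="APP1">
      <description>Annotation description</description>
    </annotation>
  </annotations>
</object>
"""

root = ET.fromstring(SAMPLE)

# A bare tag name only matches direct children of <object>: the metadata.
metadata_description = root.find("description")

# The wrapper gives annotation data its own, unambiguous path.
annotation_descriptions = root.findall("annotations/annotation/description")

print(metadata_description.text)
print([d.text for d in annotation_descriptions])
```

Without the wrapper, both elements would sit side by side under the object and a consumer would have no structural way to tell them apart.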

A common thread underlying these, and perhaps all, reasons for using singular annotations in the API calls is that of portraying the conceptual separation of annotations from other concepts like metadata. If we have two concepts which are very similar but are supposed to be used for and represent different things, in no way should they be presented in a similar manner. Doing so will cause confusion and potentially negative consequences if an application needs to re-implement the objects later on as annotations for interoperability or other unforeseen reasons. If we have two concepts that are to be for different purposes, are implemented differently, and carry different implications, they should be identified and presented as different concepts in the API itself.

Sample XML

For these examples I will use a generic object in the NDR:

<object handle="..." identifier="...">
  ...
</object>

Since an object may have zero or more annotations, it makes sense to include an element which conveys this state.

<object handle="..." identifier="...">
  ...
  <annotations>
  </annotations>
  ...
</object>

Each annotation has some intrinsic properties. All annotations have a handle, data format, creating application and date. If we implement the concept of identifier mapping with annotations, similar to resources, we could have that attribute also. These are all properties of the annotation itself:

<object handle="..." identifier="...">
  ...
  <annotations>
    <annotation handle="..." format="..." creatorHandle="..." creationDate="..." [identifier="..."]>
      ...
    </annotation>
  </annotations>
  ...
</object>

Inside the <annotation></annotation> element is the annotation's data, which must conform to the schema specified in the format attribute. Multiple annotations would be presented as one would expect:

<object handle="..." identifier="...">
  ...
  <annotations>
    <annotation handle="..." format="..." creatorHandle="..." creationDate="..." [identifier="..."]>
      ...
    </annotation>
    <annotation handle="..." format="..." creatorHandle="..." creationDate="..." [identifier="..."]>
      ...
    </annotation>
    <annotation handle="..." format="..." creatorHandle="..." creationDate="..." [identifier="..."]>
      ...
    </annotation>
    ...
  </annotations>
  ...
</object>
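A client consuming this layout could be sketched in Python roughly as follows. The attribute names come from the samples above; the `Annotation` class, the helper functions and every value in the sample document (handles, creator handles, dates) are placeholders invented for the example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Annotation:
    handle: str
    format: str
    creator_handle: str
    creation_date: str
    body: ET.Element  # the annotation's own data, left uninterpreted here


def read_annotations(object_xml):
    """Collect every <annotation> listed under the object's <annotations>."""
    root = ET.fromstring(object_xml)
    return [
        Annotation(
            handle=node.get("handle", ""),
            format=node.get("format", ""),
            creator_handle=node.get("creatorHandle", ""),
            creation_date=node.get("creationDate", ""),
            body=node,
        )
        for node in root.findall("annotations/annotation")
    ]


def created_by(annotations, creator_handle):
    """Keep only the annotations created by one application."""
    return [a for a in annotations if a.creator_handle == creator_handle]


# Placeholder document; none of these values come from a real object.
SAMPLE_OBJECT = """
<object handle="obj-1" identifier="example">
  <annotations>
    <annotation handle="a-1" format="general" creatorHandle="EV" creationDate="2008-01-01"/>
    <annotation handle="a-2" format="general" creatorHandle="OAI" creationDate="2008-01-02"/>
    <annotation handle="a-3" format="general" creatorHandle="EV" creationDate="2008-01-03"/>
  </annotations>
</object>
"""

annotations = read_annotations(SAMPLE_OBJECT)
ev_handles = [a.handle for a in created_by(annotations, "EV")]
print(ev_handles)
```

Because ownership is an attribute of each annotation, filtering by creating application is a one-line comparison rather than something the client has to infer from the content.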

That's it!

Well, not quite. The magic really happens within the annotation. I think having one small, hopefully simple schema per annotation makes more sense than one large schema that defines all annotations. One reason is that annotation formats will change at different paces and times, yet every such change would force a revision of the parent schema if they all lived in one. Using one small schema per annotation reduces the number of changes and version jumps any one developer sees over time, which could otherwise lead to confusion. When one is updating software and sees a large version number change, or a change one wasn't expecting, or a change that doesn't affect anything one is working on, it is at best inconvenient additional overhead and at worst a cause for concern. If the schema is just for the annotation, it's simpler and clearer. The total number of schemas will rise, but the overhead won't: whether you are changing one large schema or many small ones, you still need to make the same number of changes. The difference is that each change affects a smaller, more localized set of applications and people, and so generally has a smaller impact, which I think is a better design.

Every application that stores its data would do so in a "general" format annotation, which is not guaranteed (by NSDL) to contain any certain elements or organization. In other words, developer beware of using another application's general annotation. The general annotation schema is pretty simple and just allows anything, for example:

<annotation format="nsdl_annotation_general" ...>
  blah blah blah blah
</annotation>

When one application's concept becomes useful or adopted enough, it could be "promoted" to an official, interoperable NSDL-supported feature. Alternately, a feature could simply be implemented this way from the start--basically top-down vs. bottom-up feature creation. This means the concept gets its own annotation format, just as the "general" feature/concept does, but one that is guaranteed to have certain data in a certain format:

<annotation format="nsdl_annotation_general" ...>
  blah blah blah blah
</annotation>
<annotation format="nsdl_annotation_star_rating" ...>
  <minimum>0</minimum>
  <maximum>10</maximum>
  <current>5</current>
</annotation>
<annotation format="nsdl_annotation_education_standard" idType="url" authority="ASN" alignment="High" url="http://purl.org/ASN/resources/S102A81E" .../>
<annotation format="nsdl_annotation_icon" url="http://nsdl.org/thing.gif" .../>
<annotation format="nsdl_annotation_review" ...>
  <reviewer name="..."/>
  <title>...</title>
  <content>...</content>
</annotation>
<annotation format="nsdl_annotation_grade_level" minimum="3" maximum="8" .../>
<annotation format="general" creatorHandle="WFI" ...>
  <ingestedOn>12345</ingestedOn>
  <feedURL>http://...</feedURL>
  <schedule>daily</schedule>
  <testThingy>8</testThingy>
</annotation>
<annotation format="general" creatorHandle="OAI" ...>
  <setSpec>...</setSpec>
  <visibility>...</visibility>
</annotation>
<annotation format="general" creatorHandle="EV" ...>
  <creator>EVUser7</creator>
  <blog>6</blog>
</annotation>
<annotation format="nsdl_annotation_blog_comment" creatorHandle="EV" ...>
  <url>http://expertvoices.nsdl.org/blog</url>
  <blog>6</blog>
  <creator>EVUser19</creator>
  <content>This site sucks! Needs more kitten pictures!</content>
</annotation>

I think this format is much more elegant, and it makes clear what data the application is receiving, which should make for easier and more straightforward coding and handling.
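One way a client might take advantage of per-annotation formats is to dispatch on the format attribute, as in this Python sketch. The handler names and the fallback behaviour are assumptions for illustration; only the format name and the star rating elements are taken from the samples above.

```python
import xml.etree.ElementTree as ET


def read_star_rating(node):
    """A promoted format promises these three elements, so read them directly."""
    return {name: int(node.findtext(name)) for name in ("minimum", "maximum", "current")}


def read_general(node):
    """A general annotation promises nothing, so hand back the raw XML text."""
    return ET.tostring(node, encoding="unicode")


# Hypothetical registry: one small handler per supported format.
HANDLERS = {"nsdl_annotation_star_rating": read_star_rating}


def interpret(node):
    # Assumption: any format without a registered handler is treated as opaque.
    handler = HANDLERS.get(node.get("format"), read_general)
    return handler(node)


rating_node = ET.fromstring(
    '<annotation format="nsdl_annotation_star_rating">'
    "<minimum>0</minimum><maximum>10</maximum><current>5</current>"
    "</annotation>"
)
general_node = ET.fromstring(
    '<annotation format="general" creatorHandle="WFI">'
    "<ingestedOn>12345</ingestedOn>"
    "</annotation>"
)

rating = interpret(rating_node)
raw = interpret(general_node)
print(rating)
print(raw)
```

Supporting a newly promoted format then means adding one handler to the registry, without touching the code that reads any other annotation.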

Questions

  • Annotations are, at the basic level, metadata. What makes them different in such a way to have a separate object/concept? If the primary difference is that they use a defined format, that may be irrelevant, because metadata uses a format as well (e.g. nsdl_dc). The difference may lie in a less quantifiable distinction, that metadata consists more of properties of a resource, whereas an annotation is external data related to that resource. Perhaps this is the distinguishing characteristic? A good explanation must be conveyed to developers and users, lest we have annotations where we should have metadata.
  • Would it be advantageous to implement a "transaction" system that could allow for multiple calls (transactional/blocking or not) within one connection?