Community:NCore/Membership and Projection

From NSDLWiki


Data Boundaries in an NCore Repository

The ability to define and use sets of data is an essential function of NCore. In the most general sense, sets of data may be defined by "selecting" them based upon some shared characteristic. In that sense, the results of a keyword-based search for resources would represent one set of interest: all resources in that set would share some matching metadata field or some matching fragment of textual content. The utility of such a set might be to feed a search-result browsing interface, where matching objects may be examined and visited.

While defining the boundaries of a set of resources by keyword search works well for information discovery, there are many purposes for which this somewhat ad-hoc means of defining a set is inappropriate. For example, suppose that some person or application wishes to:

  • Control the membership of a set
  • Assert facts about a specific set
  • Share sets with other applications or users
  • Determine the provenance of a given set (who created it?)


In order to address these and other issues, NCore defines a formal framework for drawing boundaries around sets of data within the repository. This page describes this framework, its capabilities, limitations, and implications.

Explicit Groupings

Aggregators

Aggregators are objects that represent an explicit set of objects in the repository. Membership is strictly controlled: only the single owner of an Aggregator (or other agents that have been explicitly authorized) may add or remove its members. An Aggregator may contain any number of objects, and any given object may be in any number of Aggregators.


Since an Aggregator is a first-class object with its own identity, it serves as the set's representation in relationships to or from the set, and may be referenced by handle. For example, suppose that one wanted to search only within a particular set of resources. If those resources were collected into a single Aggregator, specifying that Aggregator's handle would be all that is needed to restrict results to that set.
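The membership rules above can be sketched with a small in-memory model. This is only an illustration under assumptions: the `Aggregator` class, the `PermissionDenied` exception, and handles such as `hdl:agg-1` are hypothetical and are not part of any NCore API. It shows single ownership, explicit authorization of additional agents, and one resource belonging to two Aggregators.

```python
from dataclasses import dataclass, field


class PermissionDenied(Exception):
    """Raised when an agent tries to change membership it does not control."""


@dataclass
class Aggregator:
    # Hypothetical in-memory model of an Aggregator; not the NCore API.
    handle: str  # identity by which the set itself can be referenced
    owner: str   # the single owning Agent
    authorized: set = field(default_factory=set)  # other agents allowed to edit
    members: set = field(default_factory=set)     # handles of direct members

    def _check(self, agent: str) -> None:
        if agent != self.owner and agent not in self.authorized:
            raise PermissionDenied(f"{agent} may not modify {self.handle}")

    def add(self, agent: str, obj: str) -> None:
        self._check(agent)
        self.members.add(obj)

    def remove(self, agent: str, obj: str) -> None:
        self._check(agent)
        self.members.discard(obj)


# The same resource may belong to any number of Aggregators.
a = Aggregator("hdl:agg-1", owner="agent:alice")
b = Aggregator("hdl:agg-2", owner="agent:bob", authorized={"agent:carol"})
a.add("agent:alice", "hdl:res-1")
b.add("agent:carol", "hdl:res-1")  # carol was explicitly authorized on b

try:
    a.add("agent:bob", "hdl:res-2")  # bob neither owns nor is authorized on a
except PermissionDenied as err:
    print(err)
```

Because the Aggregator carries its own handle, a consumer such as a search service would only need `a.handle` to refer to the whole set.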


MetadataProviders

MetadataProviders differ from Aggregators in two significant respects:

  • They may contain only Metadata objects
  • A given Metadata object must be in exactly one MetadataProvider.

Thus, in terms of grouping objects, a MetadataProvider can be seen as a special case of an Aggregator: one with additional restrictions on its contents. A MetadataProvider, however, imparts provenance on its members, in that the origin of each Metadata member can be traced through its single MetadataProvider to that provider's single owning Agent. Membership in an Aggregator does not imply any sort of provenance for the individual members.
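A minimal sketch of the two MetadataProvider restrictions and the resulting provenance chain, assuming a simple registry keyed by Metadata handle. The classes `Metadata`, `MetadataProvider`, and `MetadataRegistry` here are hypothetical stand-ins, not NCore code. Keying the registry by Metadata handle is what enforces "exactly one provider", and that uniqueness is what makes the provenance lookup unambiguous.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metadata:
    handle: str


@dataclass(frozen=True)
class MetadataProvider:
    handle: str
    owner: str  # the single Agent that owns this provider


class MetadataRegistry:
    """Hypothetical store enforcing the two MetadataProvider rules."""

    def __init__(self):
        # Metadata handle -> its one and only provider ("exactly one" rule).
        self._provider_of = {}

    def provide(self, provider, item):
        if not isinstance(item, Metadata):
            raise TypeError("a MetadataProvider may contain only Metadata objects")
        if item.handle in self._provider_of:
            raise ValueError(f"{item.handle} already has a MetadataProvider")
        self._provider_of[item.handle] = provider

    def provenance(self, item):
        # Metadata -> its single provider -> that provider's single owning Agent.
        return self._provider_of[item.handle].owner


reg = MetadataRegistry()
md = Metadata("hdl:md-1")
reg.provide(MetadataProvider("hdl:mp-1", owner="agent:alice"), md)

try:
    reg.provide(MetadataProvider("hdl:mp-2", owner="agent:bob"), md)
except ValueError as err:
    print(err)  # the second provider is rejected
```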

Hierarchical Groupings, Implicit Membership, and Projections

Despite its utility, set definition by direct membership in Aggregators or MetadataProviders has a major limitation: implicit membership (membership in set A implies membership in set B) is not supported by such a model. This construct is useful in library environments. For example, membership in a Collection may imply membership in the set of all library resources.

Fig. 1 Hypothetical set of Aggregators that define a number of different boundaries around data in the graph. Large, empty circles are Aggregators; small, filled circles are Resources.

There are several glaring disadvantages to supporting an exclusively direct-membership model. All implied memberships would have to be manually specified and maintained whenever adding or moving an object, for example. A user or application would bear the responsibility of knowing and maintaining all applicable business rules of implied set membership. Additionally, permissions on aggregators throughout the repository would necessarily have to be loose and coarse-grained in order to allow users of the repository to assert all the necessary direct set-membership relationships they are forced to maintain.

Fig. 2a All NSDL library contents
Fig. 2b All blog contents (blogosphere)
Fig. 2c A single blog
Fig. 2d A user's blog posts

Implied set membership in NCore is expressed through a hierarchy of Aggregators, and is transitive. In other words, if an Aggregator B is a member of Aggregator A, then B's members are implicitly considered to be members of A as well. There is no physical relationship in the NCore graph that represents this implied membership; implied membership is entirely conceptual. Therefore, we will adopt the convention that an object is a member of a grouping if and only if it is directly related to an Aggregator object through a memberOf relationship. We will use the terms in or under when describing objects that have implicit membership in a grouping. Likewise, we will use the term projection for the set of all objects that are under a given Aggregator.
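Since implied membership is never stored, a projection has to be computed by following memberOf relationships transitively. The sketch below assumes the direct memberOf relationships are available as a mapping from each Aggregator handle to its direct members; the function name `projection` and the handles are hypothetical. It keeps the two vocabularies apart: `members[...]` holds what is a member of an Aggregator, while `projection(...)` returns everything under it.

```python
from collections import deque


def projection(members, top):
    """Return every object under `top`: its members, their members, and so on.

    `members` maps an Aggregator handle to the handles of its direct members
    (objects related to it by memberOf). Non-Aggregators have no entry.
    The `under` set also guarantees termination if memberships form a cycle.
    """
    under = set()
    queue = deque([top])
    while queue:
        agg = queue.popleft()
        for obj in members.get(agg, ()):
            if obj not in under:
                under.add(obj)
                queue.append(obj)
    return under


members = {"A": {"B", "r0"}, "B": {"r1", "r2"}}
print("r1" in members["A"])             # False: r1 is not a member of A ...
print(sorted(projection(members, "A")))  # ... but it is under A
```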

Fig. 1 depicts a number of Aggregators and their members in a repository. In this example, Aggregators have been used to organize the objects in the repository. For example, "NSDL Collection" is an Aggregator that defines the boundaries of what is considered to be library content, i.e. collections and items. "Whiteboard Report" is one such NSDL collection; it contains one issue, "Issue 42", which in turn contains two resources. Likewise, "NSDL Blogs" is a collection in the NSDL that is itself composed of a number of blogs (Blog A, Blog B), and is part of a bigger "Blogosphere".

Fig. 2 represents some possible projections created by implied Aggregator membership. Boundaries that demarcate the repository objects in a given projection are drawn as dashed lines. Fig. 2a, for example, represents all objects under the "NSDL Collection" Aggregator: its direct members, Whiteboard Report and NSDL Blogs, their members, their members' members, and so on. Likewise, Fig. 2b represents the projection of the Blogosphere, Fig. 2c a single blog, and Fig. 2d every contribution made by a particular blog user.
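The hierarchy of Fig. 1, as far as the text describes it, can be written down and the four projections of Fig. 2 computed from it. This is a sketch under assumptions: the leaf handles (`wr:res-1`, `post-a1`, and so on) and the per-user Aggregator `User U posts` are hypothetical stand-ins for the unlabeled circles in the figures.

```python
# Direct memberOf structure of Fig. 1 as described in the text. Leaf handles
# and the "User U posts" Aggregator are hypothetical stand-ins.
MEMBERS = {
    "NSDL Collection": {"Whiteboard Report", "NSDL Blogs"},
    "Whiteboard Report": {"Issue 42"},
    "Issue 42": {"wr:res-1", "wr:res-2"},
    "Blogosphere": {"NSDL Blogs"},
    "NSDL Blogs": {"Blog A", "Blog B"},
    "Blog A": {"post-a1", "post-a2"},
    "Blog B": {"post-b1"},
    "User U posts": {"post-a2", "post-b1"},  # one user's contributions
}


def projection(top):
    """All objects under `top`, following memberOf transitively."""
    under, stack = set(), [top]
    while stack:
        for obj in MEMBERS.get(stack.pop(), ()):
            if obj not in under:
                under.add(obj)
                stack.append(obj)
    return under


fig2a = projection("NSDL Collection")
fig2b = projection("Blogosphere")
fig2c = projection("Blog A")
fig2d = projection("User U posts")
```

Note that `post-a2` lies in all four projections while being a direct member of only two Aggregators, which is the sense in which every Aggregator's projection is "a matter of perspective".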

It is important to note that since projections are merely a logical construct of transitive membership, every Aggregator in the repository can be seen as having a projection; it is merely a matter of perspective.

Implications

Maintenance: Structure and Content

Because members of a subset are implied to be in the parent set, and there is no physical relationship instantiating this implied membership, essentially no additional maintenance is required when adding or removing members from a subset. Indeed, adding or removing items from an aggregation requires no a priori knowledge of the surrounding context, nor any knowledge of the unrelated projections that may be seen to contain it.
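To illustrate why no extra maintenance is needed, the following sketch (hypothetical handles, in-memory data only) adds a post to a single blog and then recomputes the parent projection. Only the blog's own direct-membership entry is edited; the NSDL Collection entry is never touched, yet the new post appears under it.

```python
# Hypothetical direct-membership table; only direct memberOf edges are stored.
members = {
    "NSDL Collection": {"NSDL Blogs"},
    "NSDL Blogs": {"Blog B"},
    "Blog B": set(),
}


def projection(top):
    """All objects under `top`, following memberOf transitively."""
    under, stack = set(), [top]
    while stack:
        for obj in members.get(stack.pop(), ()):
            if obj not in under:
                under.add(obj)
                stack.append(obj)
    return under


before = projection("NSDL Collection")
# Content maintenance: only Blog B's direct membership changes ...
members["Blog B"].add("post-b2")
after = projection("NSDL Collection")
# ... yet the new post is now under NSDL Collection without editing it.
```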

Maintaining an area of a repository as a specific projection with a logical structure underneath an Aggregator, such as a collection that is composed of sub-collections, involves two types of maintenance: maintenance of structure and maintenance of content. Maintenance of structure involves assembling Aggregators into a logical tree or hierarchy, based upon the needs of the user or application (e.g. defining sub-collections in relation to their parent collection). Maintenance of content involves adding individual items to, or removing them from, the repository, and placing them into any directly applicable aggregations (which may be only a single aggregation, as when adding a member to a sub-collection). This separation of maintenance into structure and content affords a great deal of flexibility and a clean separation of concerns when building the complex structures found in a library or a network of collaborative applications.

Delegation of Authority

Fig. 3 Example of delegated authority in the NSDL, using Aggregators from fig. 2

Access to an Aggregator's membership (adding or removing objects) is controlled by explicit permissions. Membership of that Aggregator in other aggregations is independent of those permissions. In other words, an agent might control the membership of Aggregator A, but have no say in whether other Aggregators may contain A. Because of this, it is possible to create an Aggregator in order to define a projection, and to include as its members other Aggregators that belong to other parties. In that situation, one agent has ultimate structural control over the members of a projection, but delegates control and management of content or finer-grained structure to others.
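The delegation pattern can be sketched as follows, assuming (hypothetically) that each Aggregator checks only its own owner before accepting a new member. The agent names and the `NotAuthorized` exception are illustrative, not NCore identifiers. The NSDL agent controls structure by placing Whiteboard Report inside NSDL Collection; the editor controls content inside Whiteboard Report; and the editor's additions still land in the NSDL projection.

```python
class NotAuthorized(Exception):
    """Raised when an agent edits an Aggregator it does not own."""


class Aggregator:
    """Hypothetical Aggregator: only its owner may change its direct members."""

    def __init__(self, handle, owner):
        self.handle, self.owner, self.members = handle, owner, set()

    def add(self, agent, obj):
        if agent != self.owner:
            raise NotAuthorized(f"{agent} does not control {self.handle}")
        self.members.add(obj)


def projection(top):
    """Objects under `top`; nested Aggregators are traversed transitively."""
    under, stack = set(), [top]
    while stack:
        for obj in stack.pop().members:
            if obj not in under:
                under.add(obj)
                if isinstance(obj, Aggregator):
                    stack.append(obj)
    return under


nsdl = Aggregator("NSDL Collection", owner="agent:nsdl")
whiteboard = Aggregator("Whiteboard Report", owner="agent:editor")

nsdl.add("agent:nsdl", whiteboard)            # structure: decided by agent:nsdl
whiteboard.add("agent:editor", "wr:res-1")    # content: delegated to the editor

try:
    nsdl.add("agent:editor", "stray:res")     # the editor has no say over nsdl
except NotAuthorized as err:
    print(err)
```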

Provenance

For every Aggregator, there is exactly one Agent that "owns" it, and is responsible for grouping or selecting its members. Since there is a projection under every Aggregator, the owning agent is likewise responsible for the underlying projection, even if that agent does not directly control the membership in aggregators within the projection, as in the case of delegated authority.
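One way to read this rule is that two kinds of responsibility can be reported for any object under an Aggregator: the owner of the topmost Aggregator answers for the projection as a whole, while the owners of the Aggregators that directly contain the object answer for placing it there. The sketch below assumes hypothetical `MEMBERS` and `OWNER` tables and a hypothetical `accountability` helper; it is not an NCore service.

```python
# Hypothetical data: direct members and the single owning Agent per Aggregator.
MEMBERS = {
    "NSDL Collection": {"Whiteboard Report"},
    "Whiteboard Report": {"Issue 42"},
    "Issue 42": {"wr:res-1"},
}
OWNER = {
    "NSDL Collection": "agent:nsdl",
    "Whiteboard Report": "agent:editor",
    "Issue 42": "agent:editor",
}


def under(top):
    """All objects under `top`, following memberOf transitively."""
    seen, stack = set(), [top]
    while stack:
        for obj in MEMBERS.get(stack.pop(), ()):
            if obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen


def accountability(obj, top):
    """Return (agent responsible for the projection, agents that placed obj)."""
    scope = under(top)
    if obj not in scope:
        raise ValueError(f"{obj} is not under {top}")
    scope.add(top)
    direct = {OWNER[agg] for agg in scope if obj in MEMBERS.get(agg, ())}
    return OWNER[top], direct


print(accountability("wr:res-1", "NSDL Collection"))
```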

Usage

As mentioned in previous sections, indirect membership in an aggregation is not a single physical relationship; it is a path of successive memberOf relationships in the graph. A projection is defined conceptually in the model as a set of these paths, each with a common endpoint: the topmost Aggregator, which serves to identify the projection.
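The path-based definition can be made concrete by enumerating every memberOf path from an object up to a chosen topmost Aggregator. This sketch assumes memberOf relationships are stored as hypothetical (member, aggregator) pairs; `paths_up` and `is_under` are illustrative names. An object is under the Aggregator exactly when at least one such path exists. Enumerating all paths can grow quickly in densely connected graphs, so this is meant to illustrate the definition rather than serve as an efficient query.

```python
# memberOf relationships as (member, aggregator) pairs; hypothetical data.
MEMBER_OF = {
    ("post-a2", "Blog A"),
    ("post-a2", "User U posts"),
    ("Blog A", "NSDL Blogs"),
    ("NSDL Blogs", "NSDL Collection"),
    ("NSDL Blogs", "Blogosphere"),
    ("User U posts", "NSDL Collection"),
}


def paths_up(obj, top, trail=()):
    """Yield each memberOf path from obj up to top, as a tuple of handles."""
    trail = trail + (obj,)
    for member, agg in sorted(MEMBER_OF):
        if member != obj or agg in trail:  # follow obj's edges; skip cycles
            continue
        if agg == top:
            yield trail + (agg,)
        else:
            yield from paths_up(agg, top, trail)


def is_under(obj, top):
    # obj is under top if and only if at least one path reaches top.
    return any(True for _ in paths_up(obj, top))


for path in paths_up("post-a2", "NSDL Collection"):
    print(" -> ".join(path))
```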

Ultimately, it is up to backend and frontend services to be aware of and use projections for their own purposes. To facilitate this, we envision a projection index that will be able to answer questions such as "what are all the items under aggregator X?" or "Given item Y, what is the set of projections of which it is a member?" These questions are especially pertinent to defining and implementing a library.
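A minimal sketch of such an index, assuming the whole direct-membership table fits in memory and the index is simply rebuilt after changes. `ProjectionIndex`, `items_under`, and `projections_of` are hypothetical names, not a planned NCore interface. The forward map answers "what is under aggregator X?", and inverting it answers "which projections contain item Y?". A production index would more likely update incrementally than rebuild on every change.

```python
from collections import defaultdict

# Hypothetical direct-membership data (Aggregator -> direct members).
MEMBERS = {
    "NSDL Collection": {"Whiteboard Report", "NSDL Blogs"},
    "Whiteboard Report": {"Issue 42"},
    "Issue 42": {"wr:res-1"},
    "Blogosphere": {"NSDL Blogs"},
    "NSDL Blogs": {"Blog A"},
    "Blog A": {"post-a1"},
}


class ProjectionIndex:
    """Hypothetical precomputed index answering both questions in the text."""

    def __init__(self, members):
        # Forward: Aggregator -> everything under it.
        self.items_under = {agg: self._walk(members, agg) for agg in members}
        # Inverted: object -> every Aggregator whose projection contains it.
        self.projections_of = defaultdict(set)
        for agg, objs in self.items_under.items():
            for obj in objs:
                self.projections_of[obj].add(agg)

    @staticmethod
    def _walk(members, top):
        seen, stack = set(), [top]
        while stack:
            for obj in members.get(stack.pop(), ()):
                if obj not in seen:
                    seen.add(obj)
                    stack.append(obj)
        return seen


index = ProjectionIndex(MEMBERS)
print(sorted(index.projections_of["post-a1"]))
```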

Most direct public use of projections is expected to occur through the NCore Search service, where users or applications would be able to specify a projection to search within. In the NSDL, for example, searches would by default take place relative to the NSDL Collection. A specification for this is forthcoming.

Status

The basic design is done, and the data model can already express everything necessary for projections. Implementation is in the research/prototype stage, with a production-ready version expected in early 2008.

Generalized Projection

What would be the implications of considering projections under objects that are not groupings, such as Agents or Resources? For example, we could designate a series of 'parent-child' relationships between objects that generally define membership in a projection. For Aggregators, we already use the relationship memberOf for this purpose, by definition; likewise, metadataProvidedBy for MetadataProviders. Imagine using associatedWith for Resources, and aggregatorFor and metadataProviderFor for Agents. A projection under an Agent, for example, would demarcate an Agent's entire "sphere of influence": everything it grouped into aggregations and every Metadata item it provided. Is it worth pursuing this line of thought?
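To help weigh the question, here is a speculative sketch of a generalized projection over (child, relationship, parent) triples. It assumes, without confirmation from this page, that each listed relationship points from child to parent (for example, an Aggregator aggregatorFor an Agent); the triples, handles, and the `generalized_projection` function are hypothetical. Choosing the set of parent-child relationships is exactly the open design question, so `PARENT_RELATIONS` is one candidate, not a decision.

```python
# Hypothetical triples (child, relationship, parent). Which relationships count
# as "parent-child" is the open design question; this set is one assumption.
PARENT_RELATIONS = {
    "memberOf", "metadataProvidedBy", "associatedWith",
    "aggregatorFor", "metadataProviderFor",
}

TRIPLES = [
    ("agg:1", "aggregatorFor", "agent:alice"),
    ("mp:1", "metadataProviderFor", "agent:alice"),
    ("res:1", "memberOf", "agg:1"),
    ("md:1", "metadataProvidedBy", "mp:1"),
    ("res:2", "associatedWith", "res:1"),
    ("res:9", "memberOf", "agg:other"),  # outside alice's sphere
]


def generalized_projection(top, triples=TRIPLES):
    """Everything reachable from `top` by walking parent-child edges downward."""
    children = {}
    for child, rel, parent in triples:
        if rel in PARENT_RELATIONS:
            children.setdefault(parent, set()).add(child)
    under, stack = set(), [top]
    while stack:
        for obj in children.get(stack.pop(), ()):
            if obj not in under:
                under.add(obj)
                stack.append(obj)
    return under


sphere = generalized_projection("agent:alice")  # alice's "sphere of influence"
```

With this choice of relationships, an Agent's projection picks up its Aggregators and MetadataProviders, everything grouped or provided through them, and Resources associated with those, while unrelated objects stay outside.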
