Community:Collections and Metadata/SelectionCriteriaandQualityIssues

From NSDLWiki


This is a discussion document outlining the selection criteria for resources added to the NSDL Collection and the quality issues that may arise concerning these resources. The document is concerned with whether the content of a resource is appropriate for inclusion in the NSDL Collection. The usability of the resource (i.e., whether it works on a particular platform, etc.) is out of scope for this document. The NSDL Collection Development Policy (Draft 7/15/03) outlines the process and criteria by which collections and resources are selected for the NSDL Collection.

Areas of concern about content, in ascending order of severity

  • Irrelevant and/or out of scope resources
  • Poorly designed or poorly presented resources
  • Resources that are out of date
  • Resources designed to express a point of view that is primarily religious or political, rather than scientific
  • Pernicious content (pornography, blatant commercial intent, etc.)

The NSDL Collection Development Policy states:

"Selection of resources for the NSDL Collection typically occurs by selecting collections that are known to be within the NSDL Scope (Section 3), are likely to be of value to the NSDL clientele (Section 2), and that are under the care of an NSDL partner or other reputable organization. A fundamental criterion is the trade-off between the value of the collection to the mission of the NSDL and the effort required to bring it into the NSDL Collection. Additional factors and criteria are detailed below.

In the context of the NSDL Collection, the scientific and/or educational quality of a resource or collection is its suitability for use in some part of science education. Scientific and/or educational quality has many aspects: one resource may be scientifically accurate but not useful for education; another may be suitable for middle school students, but too narrow for undergraduates. Scientific and/or educational quality is also subjective; it depends on the judgment of individuals. Ideally, every aspect of the scientific and/or educational quality of a resource would be carefully assessed from a variety of viewpoints, and that information would be available with the resource. (For example, information that the Eisenhower National Clearinghouse has reviewed an article and judged it suitable for high school education.) In practice, such information is known for only a small fraction of the resources in the NSDL Collection. Commitment to scientific or educational quality, as determined by the CICDT (Core Integration Collection Development Team) in consultation with the Content Standing Committee, is a primary criterion for selecting NSDL partners and, by implication, the associated collections. Nonetheless, the “quality” of individual resources (especially as perceived by various users of the NSDL) will be variable, due largely to differences among partners in their approaches to collection building. Methods will range from individually peer-reviewed resources to collections developed via automated methods, such as natural language processing.

The NSDL is developing a framework for recording information about the scientific and/or educational quality of resources. This framework is likely to include:

  • metadata that can be associated with resources or collections,
  • annotations that allow comments and reviews on resources,
  • information on how the resource came into the NSDL Collection, including descriptions of the algorithms used in automated methods.

When available, this information will be stored in the metadata repository and available to NSDL services. For instance, a browse service might use this information to restrict its scope to resources that have been recommended for high school students.

The NSDL program encourages broad variety in the resources that individual collections manage, but resources are most valuable to the NSDL and integrated most easily if they satisfy the following criteria:

  • 1. The content of the resource is appropriate to fulfilling the mission of the NSDL.
  • 2. The content of the resource matches the subject scope of NSDL Collection. Since almost all scientific materials have the potential to be used in some aspect of education, this scope is very broad.
  • 3. The resource fills a gap identified via user feedback or evaluation studies.
  • 4. Resources that have been developed with funding from the United States Government, and especially from the NSDL program, have special importance. Other government funding is also a criterion, e.g., other NSF educational projects, the Digital Libraries Initiative, the Institute of Museum and Library Services, etc.
  • 5. The resources are available with open access, or there is reasonably full metadata available for ingest to the metadata repository and use in NSDL services. (See Section 6.1.)
  • 6. Quality information is provided, especially for resources intended for K-12 education. It is important that individual collections have clearly stated their criteria for selection in their own collection development policy.
  • 7. The collection or resources satisfy one of the technical methods listed in Section 10.
  • 8. Resources are most useful if they are well managed by a reliable source. This includes: good metadata, reliable delivery, expectations of permanence, etc.

The NSDL Collection will strive not to include resources that are:

  • “bad science,” e.g., inaccurate information or pseudoscience, except when appropriately tagged or embedded in educational materials that raise awareness of these matters.
  • socially undesirable, e.g., pornography.

In a collection of this size, no guarantees can be made that all such items are eliminated.


Though resources that are clearly out-of-scope or that become inaccessible may be removed from the NSDL, de-accessioning is not a primary means for maintaining the quality of the NSDL Collection. Instead, various means are to be provided for marking content as to quality and for presenting views of the NSDL that adhere to (user-specific) quality thresholds. If a more detailed policy is developed on the removal of materials from the NSDL Collection, it will appear here."
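The policy excerpt above mentions a browse service that restricts its scope to resources recommended for high school students. The sketch below shows that idea as a plain metadata filter. It is a minimal illustration only: the record type `ResourceRecord` and its `recommended_for` field are hypothetical names, not the NSDL metadata repository's actual schema or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ResourceRecord:
    """Hypothetical metadata record; field names are illustrative, not NSDL's schema."""
    identifier: str
    title: str
    # Audiences a reviewer has recommended the resource for (assumed field).
    recommended_for: set[str] = field(default_factory=set)


def browse(records: list[ResourceRecord], audience: str) -> list[ResourceRecord]:
    """Restrict a browse view to resources recommended for one audience.

    Records without quality information are simply left out of this view;
    nothing is removed from the underlying collection.
    """
    return [r for r in records if audience in r.recommended_for]


if __name__ == "__main__":
    records = [
        ResourceRecord("r1", "Cell Biology Primer", {"high school"}),
        ResourceRecord("r2", "Topology Lecture Notes", {"undergraduate"}),
        ResourceRecord("r3", "Harvested page with no review"),
    ]
    print([r.identifier for r in browse(records, "high school")])  # ['r1']
```

Because the filter only shapes a view, an unreviewed record such as `r3` stays in the collection and can appear in other views, which matches the policy's preference for marking content rather than de-accessioning it.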

Re-Framing the Quality Discussion in NSDL: a discussion

The discussions regarding the development of quality collections of resources within the NSDL occur on many levels. Oftentimes assumptions about what constitutes quality loom up quickly, regardless of whether the participant in the discussion is familiar with the stated NSDL goals of large scale and automated operation. Some of these assumptions are based on the notion of traditional library selection processes, but others emanate from an even more restrictive notion of peer review, where someone or a group determines according to objectively established criteria what is “in” and what is “out.” This model might generally be characterized as the Gatekeeper Model.

To many people, the need for a gatekeeper of some kind seems a natural extension of NSDL's status as a “national” resource funded by a government agency, and to some extent this is indisputable. Clearly, there is a bottom line that must be maintained. We all agree that spam, pornography and materials completely irrelevant to the STEM focus do not belong in NSDL and should be excluded. It is beyond this point that opinions begin to diverge.

Leaving aside for a moment whether a gatekeeper (a person or a group) is feasible given the funding levels of the NSDL, there’s a more elemental question: is a panel of experts an appropriate way to develop what is intended to be a collaborative, community-based resource available through a variety of “portals”? The Gatekeeper Model, essentially a top-down method of managing quality, is expensive, slow, and difficult to manage. None of these problems is, by itself, sufficient reason to look elsewhere for a model; if the Gatekeeper Model were the correct one, efforts should be made to address those issues specifically.

But given that the NSDL is beginning to focus on scenarios where the CI (Core Integration) functions as a “wholesaler” of sorts, and portals as “retailers” of more specialized, focused information to end users, it seems less obvious that any gatekeeping should occur at the “wholesale” level. Portals will themselves be defining gates around their focus, tailoring their offerings to a subset defined by topic, audience type or level, resource type, or a combination of these factors.

Shifting the discussion a bit away from the loaded term “quality,” we might introduce the idea of “usefulness.” Usefulness is clearly subjective, defined at a much more accessible level by users, whether portals or people. A math portal, for instance, might define usefulness by the subject of a resource, by its level of review or the sources of annotations, or by any other criteria it could reasonably define. A teacher might define as specifically useful any image resource about animals submitted by another teacher.

The key to allowing various definitions of usefulness to reside happily together is the ability to attach characterizations and assertions from other services, agents and users to resources, building these upon a base of information that may be inadequate or of questionable quality by itself, and to expose those characterizations to portals and end users as appropriate. This approach depends on tools that allow assertions to be made at various levels, whether by adding value to data from external services (indexing services, taxonomy services, etc.) or by individual users who wish to share their experience with a resource. It also depends on users having access to the submitted information in ways that meet their needs of the moment.

In this alternative model, which could be called a Collaboration Model, useful resources, no matter their original source, may rise into a person's view or be selected by a portal as “useful” as a result of a variety of activities going on within NSDL. For instance, formal or informal annotations or comments added by other users regarding their experience or impression of the resource (perhaps including ratings) might be a criterion selected to define usefulness. For another user or portal, only formal annotations or comments by a particular category of commentator might be useful, regardless of the source of the resource. Since usage of resources is intended to be tracked and made available as a criterion, a user might define a criterion, or a portal might highlight a resource, based on its overall usage, recent usage, or how many people had contributed reviews or annotations within a recent period.
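One way to picture the Collaboration Model is to treat each definition of usefulness as a predicate over a resource and its accumulated annotations and usage, so that each user or portal simply picks the predicates it cares about. The sketch below is a hypothetical illustration of that idea: the `Annotation` and `Resource` types, their fields, and the criterion functions are invented for this example and are not part of any NSDL service. The dates and counts in the usage section are made-up sample data.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Callable, Optional


@dataclass
class Annotation:
    """Hypothetical annotation attached to a resource."""
    commentator_category: str      # e.g. "teacher", "scientist", "student"
    formal: bool                   # formal review vs. informal comment
    made_on: date
    rating: Optional[int] = None   # optional rating; scale is an assumption


@dataclass
class Resource:
    identifier: str
    annotations: list[Annotation] = field(default_factory=list)
    total_uses: int = 0            # assumed usage count from tracking


# A usefulness criterion is any yes/no test a user or portal applies to a resource.
Criterion = Callable[[Resource], bool]


def formal_reviews_by(category: str) -> Criterion:
    """At least one formal annotation from the given category of commentator."""
    return lambda r: any(
        a.formal and a.commentator_category == category for a in r.annotations
    )


def recently_annotated(min_count: int, within_days: int, today: date) -> Criterion:
    """At least `min_count` annotations contributed in the last `within_days` days."""
    cutoff = today - timedelta(days=within_days)
    return lambda r: sum(a.made_on >= cutoff for a in r.annotations) >= min_count


def heavily_used(min_uses: int) -> Criterion:
    """Overall usage at or above a threshold."""
    return lambda r: r.total_uses >= min_uses


def select_useful(resources: list[Resource], criteria: list[Criterion]) -> list[Resource]:
    """Keep only resources meeting every criterion this user or portal has chosen."""
    return [r for r in resources if all(c(r) for c in criteria)]


if __name__ == "__main__":
    today = date(2004, 3, 1)
    pool = [
        Resource("a", [Annotation("teacher", True, date(2004, 2, 20), 5)], total_uses=120),
        Resource("b", [Annotation("student", False, date(2003, 6, 1))], total_uses=400),
        Resource("c", total_uses=3),
    ]
    print([r.identifier for r in select_useful(pool, [formal_reviews_by("teacher")])])      # ['a']
    print([r.identifier for r in select_useful(pool, [heavily_used(100)])])                 # ['a', 'b']
    print([r.identifier for r in select_useful(pool, [recently_annotated(1, 30, today)])])  # ['a']
```

The same pool yields different "useful" sets depending on which criteria are chosen, and nothing is deleted along the way; a resource that meets no one's criteria simply never surfaces.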

Given the model outlined above, we would expect that resources that are irrelevant, poorly designed or out of date (our first three categories above) would quickly fall to the bottom of the heap of resources available to the NSDL user, whichever portal they use. In fact, as the tools we hope to develop bring the better resources to the top, it's likely that a user would never be motivated to look at the bottom-ranked resources, effectively obviating the need for direct human intervention to remove them. Given that some of the irrelevant resources in particular come into the NSDL via routine harvesting (they may not be sufficiently differentiated from the desired resources to be eliminated from the harvest process), this seems the most efficient method for the large majority of resources that might be deemed generally not useful but do not represent a "bibliographic emergency."

For the resources in the last two categories above, i.e., those with a political or religious slant or those that are blatantly commercial or pornographic, a variety of mechanisms might be established to identify and either mark or eliminate those that fit into the more problematic categories. Routine searches for particular keywords might identify resources that should be removed immediately (a draft process for this already exists), and others might be reported by users of the site. We might determine that our Archive Service is in the best position to identify these kinds of resources, and delegate some responsibility to it. There is clearly much we don't know yet about what the best strategy might be, given our overall level of support and the not-yet-exponential growth of resources in the Metadata Repository. It's likely that learning will be iterative, with a bit of stumbling, but we will learn, and should, without requiring that we hire Chicken Little as a consultant.
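A routine keyword search of the kind mentioned above can be sketched as a whole-word, case-insensitive match over each record's metadata text. This is a minimal sketch under stated assumptions: records are modeled as a plain mapping from a hypothetical record identifier to its title and description text, and the keyword list is a placeholder, since the actual list and the draft removal process are not described here.

```python
from __future__ import annotations

import re


def build_pattern(keywords: list[str]) -> re.Pattern:
    """Compile one case-insensitive pattern matching any keyword as a whole word."""
    alternatives = "|".join(re.escape(k) for k in keywords)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


def flag_records(records: dict[str, str], keywords: list[str]) -> list[str]:
    """Return sorted identifiers of records whose metadata text matches a keyword.

    `records` maps a (hypothetical) record identifier to searchable metadata
    text, such as title plus description. A match only flags the record;
    what happens next is left to the removal or marking process.
    """
    if not keywords:
        return []  # an empty alternation would otherwise match every record
    pattern = build_pattern(keywords)
    return sorted(rid for rid, text in records.items() if pattern.search(text))


if __name__ == "__main__":
    sample = {
        "r1": "Free casino bonus offer",
        "r2": "Photosynthesis lab for grade 7",
        "r3": "The physics of a casino roulette wheel",
    }
    print(flag_records(sample, ["casino"]))  # ['r1', 'r3']
```

Note that `r3` is flagged even though it may be a legitimate probability or physics resource; keyword hits alone cannot tell the two apart, which is why this sketch produces flags rather than deletions.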

It is axiomatic in digital libraries that only some of the successful strategies of the pre-digital world will survive and flourish in this newer world. This understanding leads us to examine our assumptions closely, lest we drag too many unwieldy legacies with us as we move forward.

For a good discussion of the necessity of teaching students and learners to be good information managers, see:
Evaluating Internet Sites

On the skills of "critical literacy," see:

Critical literacy: the WWW's great potential
