Community:Search/Fields And Compound Fields

From NSDLWiki

Fielded Searching

What are the searchable fields for the REST search service?

There's a REST verb, indexFieldInfo, that provides this information (http://ndrsearch.nsdl.org/indexFieldInfo). Its output shows many searchable fields, a few non-searchable fields, and a number of resultFields.

What is the syntax for fielded searches?

http://search.nsdl.org?verb=Search&s=0&n=10&q=fieldname:fieldvalue

Here fieldname is the name of a field in the REST search service index, and fieldvalue is the value you want that field to contain. For fancier queries, you can take advantage of the fact that the REST service uses the Lucene QueryParser to process queries, so Lucene query syntax applies. See the Lucene query parser syntax documentation for more information.
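The sketch below builds such a URL with Python's standard library. It assumes the host and the verb, s, n and q parameters exactly as they appear in the example URL, and the service may no longer be reachable. The helper name build_search_url is hypothetical.

```python
from urllib.parse import urlencode

# Host and parameter names copied from the example URL above.
SEARCH_BASE = "http://search.nsdl.org"


def build_search_url(field: str, value: str, start: int = 0, count: int = 10) -> str:
    """Build a REST Search URL for a single fielded query (field:value).

    urlencode percent-encodes every parameter value, so the ':' in the
    query becomes %3A and spaces become '+'.
    """
    params = {"verb": "Search", "s": start, "n": count, "q": f"{field}:{value}"}
    return f"{SEARCH_BASE}?{urlencode(params)}"


print(build_search_url("compoundTitle", "frogs"))
```

Letting urlencode handle the q value is safer than pasting raw query text into the URL. Lucene queries often contain spaces, quotes or "&", and those characters would otherwise break or truncate the query string.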

Note: The colon separates the field name from the field value. So, when searching a field whose name itself contains a colon (e.g. "dct:title"), escape the embedded colon with a backslash. For example:

dct\:title:frogs
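This escaping rule fits in a small helper. The function names below are hypothetical. The source only mentions escaping the colon; the sketch also escapes backslashes, on the assumption that field names might contain one, since the backslash is Lucene's escape character.

```python
def escape_field_name(name: str) -> str:
    """Escape a field name for Lucene QueryParser syntax.

    Backslashes are escaped first so that the backslashes added for the
    colons are not doubled; then each colon is escaped so it is not read
    as the field/value separator. No other special characters are handled.
    """
    return name.replace("\\", "\\\\").replace(":", "\\:")


def fielded_query(field: str, value: str) -> str:
    """Return a 'field:value' query string with the field name escaped."""
    return f"{escape_field_name(field)}:{value}"


print(fielded_query("dct:title", "frogs"))  # prints dct\:title:frogs
```

When this query is put into a URL, the backslash must be percent-encoded as %5C along with the colons (Python's urllib.parse.urlencode does this automatically).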

REST Compound Fields

What are Compound Fields? Why do we need them?

Compound fields exist to make commonly desired searches easy. The NSDL normalizes metadata to Qualified Dublin Core (QDC) (http://dublincore.org/documents/dcmi-terms/, http://metamanagement.comm.nsdlib.org/outline.html, http://ns.nsdl.org/schemas/nsdl_dc/nsdl_dc1.xml), but most of the commonly desired fielded searches do not map cleanly to a single QDC element. The REST search service therefore provides "compound" fields, so that a single fielded search covers all the fields that should be in scope for, say, a "subject" search. Otherwise, you would need a complicated boolean query to be sure of covering every variety of "subject" field. That means fields with and without "encoding schemes" (i.e., controlled vocabularies, such as MIME types), and fields with and without Qualified DC refinements (e.g., "abstract" is a refinement of "description").
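The difference is easier to see side by side. In this Python sketch, the per-element field names in SUBJECT_LIKE_FIELDS are hypothetical placeholders, not the index's real names (indexFieldInfo lists those). Only compoundSubject comes from this page.

```python
# Hypothetical per-element field names, for illustration only; the real
# index field names are listed by the indexFieldInfo verb.
SUBJECT_LIKE_FIELDS = ["dc:subject", "dc:coverage", "dct:spatial", "dct:temporal"]


def or_query(fields: list[str], value: str) -> str:
    """Search for a single-term value in each field, OR-ing the clauses."""
    clauses = []
    for field in fields:
        # Escape the colon inside names like "dc:subject" (see Fielded Searching).
        escaped = field.replace(":", "\\:")
        clauses.append(f"{escaped}:{value}")
    return " OR ".join(clauses)


def compound_subject_query(value: str) -> str:
    """The same intent expressed as one compound-field search."""
    return f"compoundSubject:{value}"


print(or_query(SUBJECT_LIKE_FIELDS, "frogs"))
print(compound_subject_query("frogs"))
```

Even the four-clause version ignores encoding-scheme variants. Keeping track of those variants is exactly the bookkeeping the compound field is meant to spare you.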

There are currently six searchable compound fields:

  • compoundAgent: think "author", or the person or institution that created the resource. Maps to the Dublin Core fields "creator", "contributor", or "publisher". These fields have no refinements and no encoding schemes, either.
  • compoundAudience: think "audience"; yes, the DC "audience" field has refinements too, such as "educationLevel".
  • compoundDescription: think "description" or its refinements ("abstract", "table of contents").
  • compoundGenre: think resource type or format. Maps to the Dublin Core fields "type" or "format", or their refinements ("extent", "medium"), with and without specified controlled vocabularies (called encoding schemes in DC parlance).
  • compoundSubject: subject, coverage, range, with refinements and encoding schemes such as LCSH, GEM, and DDC (Dewey).
  • compoundTitle: there are some refinements to the DC "title" field, such as "alternative" (alternative title).
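The six fields above can be wrapped in a small lookup table. This is a hypothetical Python helper, not part of the service, and the short names ("agent", "subject", ...) are invented for illustration. Wrapping multi-word values in double quotes (a Lucene phrase query) is a design choice. Without the quotes, a Lucene field prefix applies only to the term immediately after it, so the remaining words would be searched in the default field instead.

```python
# Map short, hypothetical names to the six compound fields listed above.
COMPOUND_FIELDS = {
    "agent": "compoundAgent",
    "audience": "compoundAudience",
    "description": "compoundDescription",
    "genre": "compoundGenre",
    "subject": "compoundSubject",
    "title": "compoundTitle",
}


def compound_query(kind: str, value: str) -> str:
    """Return a Lucene query string searching one compound field.

    Multi-word values are wrapped in double quotes so the whole phrase is
    searched in the compound field. Values that already contain double
    quotes are not handled.
    """
    if kind not in COMPOUND_FIELDS:
        raise ValueError(f"unknown compound field kind: {kind!r}")
    if " " in value:
        value = f'"{value}"'
    return f"{COMPOUND_FIELDS[kind]}:{value}"


print(compound_query("subject", "frogs"))
print(compound_query("title", "leopard frogs"))
```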

These compound fields are provided to make the most commonly desired fielded searches easy. Use them to get more complete search results (to increase your "recall", in information retrieval parlance).

Take advantage of 'em! :-)
