Community:Search/Filtering Queries

From NSDLWiki


Filtering Search Results (e.g. by Collection)

Fielded Searches

Filtering search results is a form of fielded searching. See Fields and Compound Fields.

Filtering Results (by Collection)

Which field?

A record's collection membership is expressed as a link to the collection record. This link is exposed in search results in the collContext field(s).

<collContext>
   <oai-id>oai:nsdl.org:crs:OAI_IDENTIFIER</oai-id>
</collContext> 

To filter by collection, your query string must match on the collection's OAI identifier, using the searchable inCollection field, as in this example. (Note the backslashes; they are explained below.)

inCollection:oai\:nsdl.org\:crs\:316878


Finding the OAI identifiers for collection records

Determining the OAI identifier for a collection record is not straightforward. Until the process is improved, here is one way to do it.

Go to http://nsdl.org/ and choose the menu option to Browse Collections. Find the collection you are interested in. For this example, let's say that the collection is Analytical Sciences Digital Library (ASDL).

Beneath the collection title, you can see the URL that is associated with the collection record. In this case, it is http://www.asdlib.org/

Use the getDocument search request to obtain the information from that collection record in the search index:

http://ndrsearch.nsdl.org/getDocument?id=http://www.asdlib.org/

Look through the result for the <metadataSource> element. Its text content is the OAI identifier of the collection record.

...
      </collContext>
      <metadataSource url="http://www.asdlib.org/">oai:nsdl.org:crs:316878</metadataSource>
    </fields>
  </document>
...
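The lookup step above can be sketched in Python. This is only an illustration: the function name find_collection_oai_id is hypothetical, the sample XML is a cut-down stand-in for a real getDocument response (whose full structure is not shown on this page), and the code assumes the identifier is the text content of the first <metadataSource> element it finds.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def find_collection_oai_id(xml_text: str) -> Optional[str]:
    """Return the text of the first <metadataSource> element, or None.

    Hypothetical helper: the element is matched by local name, so the
    lookup still works if the response turns out to use an XML namespace.
    """
    root = ET.fromstring(xml_text)
    for element in root.iter():
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "metadataSource" and element.text:
            return element.text.strip()
    return None


# Cut-down stand-in for a getDocument response (assumed shape).
sample = """<document><fields>
  <metadataSource url="http://www.asdlib.org/">oai:nsdl.org:crs:316878</metadataSource>
</fields></document>"""

print(find_collection_oai_id(sample))
```

In practice you would pass in the body returned by the getDocument request rather than the sample string.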

Syntax for the filtered search

Given the OAI identifier(s) for filtered collection(s) and the searchable fieldname, all that remains is to add the appropriate terms to our search query.

AND inCollection:oai:nsdl.org:crs:316878

But whoops! The OAI identifier has colons, and colons are special characters to the Lucene QueryParser. So we need to escape them with backslashes, but NOT the colon separating the field name from the value:

AND inCollection:oai\:nsdl.org\:crs\:316878

Oh, but we want the reverse: matches that are NOT in this collection. We can represent this with:

AND NOT inCollection:oai\:nsdl.org\:crs\:316878
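The escaping rule is easy to get wrong by hand, so here is a minimal Python sketch of it. The helper names escape_oai_id and collection_clause are hypothetical. The code escapes only colons, which is all these identifiers need; it is not a general Lucene escaper.

```python
def escape_oai_id(oai_id: str) -> str:
    """Backslash-escape every colon inside an OAI identifier."""
    return oai_id.replace(":", "\\:")


def collection_clause(oai_id: str, exclude: bool = False) -> str:
    """Build an inCollection clause, optionally negated with NOT.

    The colon between the field name and the value is added here,
    after escaping, so it is never escaped itself.
    """
    clause = "inCollection:" + escape_oai_id(oai_id)
    return "NOT " + clause if exclude else clause


print("frogs AND " + collection_clause("oai:nsdl.org:crs:316878", exclude=True))
# prints: frogs AND NOT inCollection:oai\:nsdl.org\:crs\:316878
```

Pass the raw identifier exactly as it appears in <metadataSource>. Escaping an already-escaped identifier would double the backslashes.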

So a full-blown query that filters out a collection's results might look like this:

http://ndrsearch.nsdl.org/search?&s=0&n=10&q=frogs%20AND%20NOT%20inCollection:oai\:nsdl.org\:crs\:00233 

To filter out more than one collection, add one AND NOT inCollection term per collection to the query. If your URL gets too long, do an HTTP POST request instead of an HTTP GET request.
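As a rough sketch of how a client might put this together, the Python below builds a request that excludes several collections and switches to POST when the URL grows long. Several things here are assumptions and not documented behavior of the search service: the build_search_request helper is hypothetical, the 2000-character threshold is arbitrary, every reserved character is percent-encoded (including the colons and backslashes), and the POST body is assumed to carry the same s, n and q parameters in form-encoded style.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

SEARCH_URL = "http://ndrsearch.nsdl.org/search"
MAX_GET_URL_LENGTH = 2000  # arbitrary threshold, an assumption


def build_search_request(terms, excluded_ids, start=0, count=10):
    """Return a urllib Request excluding each collection in excluded_ids."""
    query = terms
    for oai_id in excluded_ids:
        # Escape the colons inside the identifier, not the field separator.
        query += " AND NOT inCollection:" + oai_id.replace(":", "\\:")
    params = urlencode({"s": start, "n": count, "q": query}, quote_via=quote)
    url = SEARCH_URL + "?" + params
    if len(url) <= MAX_GET_URL_LENGTH:
        return Request(url)  # no body, so urllib sends a GET
    # Assumed: the service accepts the same parameters in a POST body.
    return Request(SEARCH_URL, data=params.encode("ascii"))


request = build_search_request("frogs", ["oai:nsdl.org:crs:316878"])
print(request.get_method(), request.full_url)
```

Nothing is sent until the Request is passed to urllib.request.urlopen, so the result can be inspected first.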
