Community:Search API FAQ
Search API Frequently Asked Questions (FAQ)
Q 1: Where is the documentation for the Search API?
See Search Service API documentation.
Q 2: What is the baseURL for the Search API?
The baseURL is http://nsdl.org/dds-search
Q 3: Can I filter the search results by Education Level, Audience, Resource Type, Access Rights, etc.?
Yes. Education Level, Resource Type, Access Rights, Language, Medium, and MimeType are fixed controlled vocabularies that are normalized in the NSDL DC metadata and search index. Subject also has a controlled vocabulary, but the metadata is not filtered, so other subject vocabularies and keywords also appear in that field.
Use the ListTerms API request to fetch the list of terms for each field (to get a list of fields, use ListFields).
For example, these requests list the terms/vocabs for Education Level and Type, respectively:
http://nsdl.org/dds-search?verb=ListTerms&field=/key//nsdl_dc/educationLevel
http://nsdl.org/dds-search?verb=ListTerms&field=/key//nsdl_dc/type
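A minimal Python sketch of how such request URLs might be assembled. It uses only the baseURL, the ListTerms and ListFields verbs, and the field argument named in this FAQ; the helper names list_terms_url and list_fields_url are made up for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "http://nsdl.org/dds-search"  # baseURL from Q 2


def list_terms_url(field: str) -> str:
    """Build a ListTerms request URL for one index field (hypothetical helper)."""
    # safe="/" leaves the slashes in the field name unencoded, matching the
    # example URLs in this FAQ
    return BASE_URL + "?" + urlencode({"verb": "ListTerms", "field": field}, safe="/")


def list_fields_url() -> str:
    """Build a ListFields request URL (hypothetical helper)."""
    return BASE_URL + "?" + urlencode({"verb": "ListFields"})


print(list_terms_url("/key//nsdl_dc/educationLevel"))
print(list_fields_url())
```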
Fields that start with /key//nsdl_dc/ correspond to the metadata elements in the NSDL DC metadata framework, which represents the educational resources in the library. See this NSDL DC metadata guide for details.
To filter the search by a given field/term, add a Boolean clause to the query supplied in the 'q' argument, for example:
(user keywords) AND (/key//nsdl_dc/educationLevel:("Elementary School")) AND xmlFormat:nsdl_dc
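The filter clause above could be generated with a small helper like the following Python sketch. The function names field_filter and filtered_query are illustrative, and the escaping of quotes inside a term is an assumption about what a quoted Lucene phrase needs, not something this FAQ specifies.

```python
from typing import Mapping


def field_filter(field: str, term: str) -> str:
    """Return a clause such as (/key//nsdl_dc/educationLevel:("Elementary School"))."""
    # Assumption: escape backslashes and double quotes so the term stays one quoted phrase
    quoted = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'({field}:("{quoted}"))'


def filtered_query(keywords: str, filters: Mapping[str, str]) -> str:
    """AND the user's keywords with one clause per field/term pair, plus xmlFormat:nsdl_dc."""
    clauses = [f"({keywords})"]
    clauses += [field_filter(field, term) for field, term in filters.items()]
    clauses.append("xmlFormat:nsdl_dc")  # limit results to NSDL DC records
    return " AND ".join(clauses)


print(filtered_query("ocean", {"/key//nsdl_dc/educationLevel": "Elementary School"}))
```

The term passed in should be one of the values returned by ListTerms for that field, since the vocabularies are fixed.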
Q 4: When I do a search, I sometimes get collection and annotation records back, but all I want is NSDL DC records.
The repository contains multiple formats/types of records. Format nsdl_dc describes educational resources, comm_anno contains annotations (comments, tips, reviews, etc.), and dlese_collect describes the collections in the repository.
To limit a search to educational resources in the NSDL DC format, add a Boolean clause to your search query that requires xmlFormat:nsdl_dc. For example:
(user keywords) AND (xmlFormat:nsdl_dc)
Q 5: How does NSDL.org use the API to implement its search pages?
The search and browse pages at NSDL.org are implemented using the Search API. The UI provides keyword search as well as options to search and filter by Grade Level, Resource Type, Subject, and Pathway. For example, see this search for 'ocean' with 'Grade Level: Elementary School' selected.
To see the full API request that is used for a given search in the NSDL.org UI, first perform a search as you normally would, then add the HTTP parameter &showquery=1 to the URL in your browser, for example:
http://nsdl.org/search/index.php?q=ocean&submitButton=Go&n=10&educationLevel%5B%5D=Elementary+School&showquery=1
The q argument in the API request contains the search query, for example from the above:
(((ocean) OR stems:(ocean) OR title:(ocean) OR titlestems:(ocean) OR description:(ocean) OR descriptionstems:(ocean))) AND (/key//nsdl_dc/educationLevel:("Elementary School")) AND xmlFormat:nsdl_dc
Here the term 'ocean' is expanded to search across the default, stems, title, titlestems, description, and descriptionstems fields. This boosts the rank of records that match in the title, titlestems, description, and descriptionstems fields while still including matches in the default and stems fields.
The next clause in the query, AND (/key//nsdl_dc/educationLevel:("Elementary School")), filters the search to education/grade level "Elementary School". The final clause, AND xmlFormat:nsdl_dc, filters the search to NSDL DC records only, which are the educational resources in the library.
Refer to the API documentation for information about what the other request parameters are used for. NSDL.org uses some advanced features from DDS, which may not be needed for many applications. Play around with this UI and compare the request that is being sent to see what is going on.
Q 6: How can I apply my own query expansion and boosting in the Search request?
The Search request allows you to construct Lucene queries that operate over any field in the search index, supplied in the q argument of the request. A text query without any explicit field specifier is applied to the default field, which contains the full text of the resource metadata and the crawled content of the front page of the resource itself. Applying a user's search terms to the default field produces fairly good results. It is often better, however, to apply query expansion techniques such as stemming to widen the scope of the search, and to apply boosting so that more relevant resources rise to the top of the hit list.
The index contains a number of search fields that are useful for query expansion and boosting. These include but are not limited to:
- stems - Contains the same content as the default field but in stemmed form
- title - Contains the title of the resource
- titlestems - Contains the title of the resource in stemmed form
- description - Contains the description of the resource
- descriptionstems - Contains the description of the resource in stemmed form
To apply query expansion, construct a Boolean query using the Lucene syntax to expand the user's query across the desired fields, and then supply this in the q argument in the Search request. For example, if the user has typed a search for the word ocean, the following progression illustrates how the expanded query might be constructed:
- (ocean) - The raw user search terms without any query expansion applied. This will match results that have the exact word ocean in the default field.
- (ocean) OR stems:(ocean) - Expands the query to also match word stems. This will match results that have the exact word ocean in them as well as morphologically similar words, including oceans and oceanic.
- (ocean) OR stems:(ocean) OR title:(ocean) - Adds boosting to give more weight to results that contain the exact word ocean in the title. This will match the same results as the previous example but return them in a different rank order.
- (ocean) OR stems:(ocean) OR title:(ocean) OR titlestems:(ocean) - Adds further boosting: titles that contain words morphologically similar to ocean gain weight, and exact matches for ocean in the title gain still more. This will match the same results as the previous example but return them in a different rank order.
- (ocean) OR stems:(ocean) OR title:(ocean) OR titlestems:(ocean) OR description:(ocean) OR descriptionstems:(ocean) - Adds weight for matches in the description and in the stemmed version of the description as well. This will match the same results as the previous example but return them in a different rank order.
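The progression above can be sketched in Python as a function that ORs the user's text across the default field and a list of named fields. The name expand_query and the EXPANSION_FIELDS constant are illustrative; the field names themselves are the ones listed in this FAQ.

```python
# Fields from the list above, in the order the progression adds them
EXPANSION_FIELDS = ["stems", "title", "titlestems", "description", "descriptionstems"]


def expand_query(user_text: str, fields=EXPANSION_FIELDS) -> str:
    """OR the user's text across the default field and each named field."""
    parts = [f"({user_text})"]  # no field specifier: applies to the default field
    parts += [f"{field}:({user_text})" for field in fields]
    return " OR ".join(parts)


# Reproduce each step of the progression for the word "ocean"
for n in range(len(EXPANSION_FIELDS) + 1):
    print(expand_query("ocean", EXPANSION_FIELDS[:n]))
```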
Note that for the stemmed fields it is not necessary to stem the words supplied in the query. Just supply the user's text exactly as it was typed and the Search API will automatically apply word stemming in the appropriate fields for you. Because the colon and several other characters are reserved in the Lucene syntax, filter them out of the user's text before issuing the query, unless you want to allow users to perform Lucene queries themselves.
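One possible way to filter the user's text is sketched below in Python. The character set is taken from the Lucene classic query parser syntax (+ - && || ! ( ) { } [ ] ^ " ~ * ? : \ /); which Lucene version the service runs is not stated here, so treat the exact set as an assumption. Lowercasing the AND, OR, and NOT keywords is an extra precaution added in this sketch, and strip_reserved is a made-up name.

```python
import re

# Assumption: reserved characters per the Lucene classic query parser syntax
_RESERVED = re.compile(r'[+\-&|!(){}\[\]^"~*?:\\/]')
_OPERATORS = {"AND", "OR", "NOT"}


def strip_reserved(user_text: str) -> str:
    """Replace reserved characters with spaces and neutralize Boolean keywords."""
    cleaned = _RESERVED.sub(" ", user_text)
    # Lowercase AND/OR/NOT so they are treated as ordinary words, not operators
    words = [w.lower() if w in _OPERATORS else w for w in cleaned.split()]
    return " ".join(words)


print(strip_reserved('title:(ocean) AND "deep sea"'))  # title ocean and deep sea
```

Replacing with spaces rather than deleting keeps hyphenated input such as "deep-sea" as two separate words instead of fusing them.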
One more Boolean clause that is often desirable restricts the search to educational resources only, by adding the clause xmlFormat:nsdl_dc to the query:
- ((ocean) OR stems:(ocean) OR title:(ocean) OR titlestems:(ocean) OR description:(ocean) OR descriptionstems:(ocean)) AND xmlFormat:nsdl_dc - Restricts the search to educational resources in the NSDL DC format only.
This final example is the query expansion technique used for user searches at nsdl.org.
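To send such a query, it has to be wrapped with the xmlFormat clause and URL-encoded into the q argument. The Python sketch below shows one way to do that. The verb name Search and the paging arguments s and n are assumptions that this FAQ does not confirm, so check the Search Service API documentation for the actual parameters. The helper names are illustrative.

```python
from urllib.parse import urlencode

BASE_URL = "http://nsdl.org/dds-search"  # baseURL from Q 2


def restrict_to_resources(query: str) -> str:
    """Wrap a query so it only matches records in the nsdl_dc format."""
    return f"({query}) AND xmlFormat:nsdl_dc"


def search_url(query: str, start: int = 0, count: int = 10) -> str:
    """Build a request URL with the query percent-encoded in the q argument."""
    params = {
        "verb": "Search",  # assumed verb name, not confirmed by this FAQ
        "q": restrict_to_resources(query),
        "s": start,  # assumed paging argument: offset of the first result
        "n": count,  # assumed paging argument: number of results
    }
    return BASE_URL + "?" + urlencode(params)


print(search_url("(ocean) OR stems:(ocean)"))
```

urlencode takes care of the parentheses, colons, and spaces in the Lucene query, so the query string can be built in readable form first and encoded once at the end.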