Community:Search/REST Basics
From NSDLWiki
REST Search Service Basics
NSDL Search REST requests can be made via HTTP GET or POST. The usual issues regarding URL length and URI encoding apply.
The base URL is http://ndrsearch.nsdl.org/. To the base URL, you add the request verb as a path. So for search queries, you add the "search" path, like this: http://ndrsearch.nsdl.org/search
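As a rough illustration of the GET and POST options, the following Python sketch builds (but does not send) a request object for either method. The helper name `build_request` is hypothetical, and the form-encoded POST body is an assumption: this page does not say which body format the service expects.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "http://ndrsearch.nsdl.org/"


def build_request(verb, params, use_post=False):
    """Build a request for the given verb without sending it."""
    # quote_via=quote encodes spaces as %20 rather than '+'.
    query = urlencode(params, quote_via=quote)
    if use_post:
        # Assumption: the service reads form-encoded arguments from the body.
        return Request(BASE_URL + verb, data=query.encode("ascii"))
    return Request(BASE_URL + verb + "?" + query)


get_req = build_request("search", {"q": "hyperbolic paraboloid"})
post_req = build_request("search", {"q": "hyperbolic paraboloid"}, use_post=True)
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

Passing either object to `urllib.request.urlopen` would send it. POST is mainly useful when the encoded query would make the URL too long.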
Request Verbs
Note: if an argument to a request contains special characters, they must be URL-encoded. For example, when searching for "hyperbolic paraboloid", the space must be encoded as %20. When retrieving a document whose URL contains a question mark ('?') or an ampersand ('&'), those characters must be encoded as %3F and %26, respectively.
- Examples
http://ndrsearch.nsdl.org/search?q=hyperbolic%20paraboloid
http://ndrsearch.nsdl.org/getDocument?id=http://prisms.mmsa.org/review.php%3Frid=483
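The encoding rule above can be sketched in Python with the standard library's `urllib.parse.quote`. The helper names are hypothetical. The choice to leave ':', '/' and '=' unencoded in the `id` argument is an assumption taken from the example URLs, not a documented requirement.

```python
from urllib.parse import quote

BASE_URL = "http://ndrsearch.nsdl.org/"


def build_search_url(query):
    # safe="" makes quote() encode every reserved character; a space becomes %20.
    return BASE_URL + "search?q=" + quote(query, safe="")


def build_get_document_url(resource_url):
    # ':', '/' and '=' stay readable, as in the examples above. '?' and '&'
    # are not in the safe set, so they become %3F and %26.
    return BASE_URL + "getDocument?id=" + quote(resource_url, safe=":/=")


print(build_search_url("hyperbolic paraboloid"))
print(build_get_document_url("http://prisms.mmsa.org/review.php?rid=483"))
```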
search
search the index for records that satisfy a query.
- Arguments
- q - query term. This can be words, phrases, or complex queries constructed according to the Lucene query syntax. See details here.
- n - number of documents to include in response. If not specified, defaults to 10.
- s - skip this many documents. Useful for repeated query requests. For example, your first request might specify &n=10&s=0 to get the first 10 documents that match the query. If there are more than 10 matches, your second request might use &n=10&s=10 to get the 11th through 20th matches. If not specified, defaults to 0.
- Example
http://ndrsearch.nsdl.org/search?q=hyperbolic%20paraboloid
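The paging behaviour of n and s can be illustrated with a small generator that produces one request URL per page. This is only a sketch: `page_urls` and its parameters are hypothetical names, and it builds the URLs without sending them.

```python
from urllib.parse import quote

SEARCH_URL = "http://ndrsearch.nsdl.org/search"


def page_urls(query, total_wanted, page_size=10):
    """Yield one search URL per page until total_wanted documents are covered."""
    for skip in range(0, total_wanted, page_size):
        # The last page may be shorter than page_size.
        n = min(page_size, total_wanted - skip)
        yield f"{SEARCH_URL}?q={quote(query, safe='')}&n={n}&s={skip}"


for url in page_urls("hyperbolic paraboloid", 25):
    print(url)
```

For 25 documents in pages of 10, this yields requests with s=0, s=10 and s=20, the last one asking for only 5 documents.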
getDocument
get search results for a specific record in the search index.
- Arguments
- id - identifier of the desired record. This is the URL of the resource that the record represents. In the rare case where a record has no URL associated with it, the identifier is the OAI identifier of its metadata in the NDR.
- Examples
http://ndrsearch.nsdl.org/getDocument?id=http://www.nsdl.org/
http://ndrsearch.nsdl.org/getDocument?id=http://prisms.mmsa.org/review.php%3Frid=483
resourceText
get the parsed text from the web page represented by a specific record in the search index.
- Arguments
- id - identifier of the desired record. This is the URL of the resource that the record represents. In the rare case where a record has no URL associated with it, the identifier is the OAI identifier of its metadata in the NDR.
- Examples
http://ndrsearch.nsdl.org/resourceText?id=http://nsdl.org/
http://ndrsearch.nsdl.org/resourceText?id=http://prisms.mmsa.org/review.php%3Frid=483
indexInfo
get base information about the search index, including the date it was last updated.
- Arguments
none
- Example
http://ndrsearch.nsdl.org/indexInfo
indexFieldInfo
get a list of fields in the search index, indicating which fields are searchable, and which fields are returned in the search results.
- Arguments
none
- Example
http://ndrsearch.nsdl.org/indexFieldsInfo
Response Format
All responses to NSDL Search REST requests will be well-formed, schema-valid XML instance documents. The XML is encoded in UTF-8. The namespace URI of the current NSDL REST Search Service is http://ns.nsdl.org/search/rest_v2.00/. The URL of the XML schema for validating the current NSDL Search REST response is http://ns.nsdl.org/schemas/search/rest_v2.00.xsd.
Response Header
- NSDLSearchService - the root element
- responseTime - a UTCdatetime indicating the time and date that the response was sent. This must be expressed in UTC.
- request - indicating the request URL that generated this response.
- result element - this will be SearchResults or IndexInfo or GetDocument, etc., depending on the request verb, or it may be error if the request failed.
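A client might read the header along the following lines. This is a minimal sketch, not the service's reference client: the sample document is hand-written for illustration, and the assumption that responseTime, request and the result element are direct children of the root is inferred from the list above rather than from the schema.

```python
import xml.etree.ElementTree as ET

# Hand-written sample for illustration only, not a captured service response.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<NSDLSearchService xmlns="http://ns.nsdl.org/search/rest_v2.00/">
  <responseTime>2008-01-01T00:00:00Z</responseTime>
  <request>http://ndrsearch.nsdl.org/indexInfo</request>
  <IndexInfo/>
</NSDLSearchService>
"""


def local_name(tag):
    # ElementTree reports namespaced tags as "{uri}name"; keep only the name.
    return tag.rsplit("}", 1)[-1]


def read_header(xml_text):
    """Return responseTime, request and the name of the result element."""
    root = ET.fromstring(xml_text)
    if local_name(root.tag) != "NSDLSearchService":
        raise ValueError("unexpected root element: " + root.tag)
    header = {"responseTime": None, "request": None, "result": None}
    for child in root:
        name = local_name(child.tag)
        if name in ("responseTime", "request"):
            header[name] = (child.text or "").strip()
        else:
            # Any other child is taken to be the result element,
            # e.g. SearchResults, IndexInfo, or error.
            header["result"] = name
    return header


print(read_header(SAMPLE))
```

A caller can then check whether `header["result"]` is `"error"` before looking for results.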
Error Codes
- badArgument - The request includes illegal arguments or is missing required arguments.
- badQuery - The Lucene search engine failed to parse the query string.
- systemProblem - The REST service code produced an unexpected error.
The SearchResults Response
The most frequent request is search, and it has the most complex response. Here is a brief description of the fields in the response.
- SearchResults - containing element for responses to a request with the Search verb.
- resultsInfo
- totalNumResults - the total number of documents in the Lucene index that match the query.
- numSkipped - the number of results skipped for this response. This should be equal to the s parameter on the request.
- numReturned - the number of results returned in this response. This is limited by the n parameter on the request.
- results
- record - a repeatable element. Each record (also referred to as a result, a hit, or a Lucene Document) is described by its contained elements. Note that when there is no value for an element, it may be missing from the results. For example, if there is no content available for a resource, there will be no contentLastModified field in the record header.
- position - this record's position in the entire set of results. The first record that matches the query is at position 0. If the request specified a non-zero s parameter, it will be reflected in the positions. So if the request was s=50, then the position of the first record in the response will be 50.
- score - the rank, or score, assigned to the Lucene Document which corresponds to this displayed record. The results are sorted with the first document in the result having the highest score.
- header - contains some information about the record.
- resourceIdentifier - the NSDL Metadata Repository's OAI identifier for the metadata used to create this record.
- lastIndexed - the last time this record was updated in the search service's Lucene index.
- metadataLastModified - the value of the <datestamp> element the last time this record was harvested from the NSDL Repository.
- contentLastModified - the last time the resource content was updated. This is the Last-Modified date in an HTTP header if it is available; otherwise the date on which the content was fetched.
- fields - various fields perceived to be desirable for NSDL REST Search consumers. Basically, this includes the normalized Qualified Dublin Core harvested as nsdl_dc from the repository, and additional information about category, genre, primary identifier, brands, etc.
- (all normalized nsdl_dc fields present in the most recent harvested metadata) will be included first. The syntax followed is EXACTLY the same as that for nsdl_dc; the XML schema for the NSDL REST Search service even imports the nsdl_dc XML schema to ensure this. Please note: this means these fields are repeatable; they should be the same as the normalized nsdl_dc fields returned by an OAI-PMH GetRecord request for nsdl_dc from the Repository's OAI server.
- bestPassage - a passage from the fetched content that shows highlighted terms matching the search query
- resourceIdentifier - the unique URL for the resource.
- (compound fields: _Agent, _Audience, _Description, _Genre, _Subject, _Title) these fields contain the contents of all relevant normalized dc fields and their refinements. For example, compoundGenre contains the contents of all dc:format fields and their refinements, and also the contents of all dc:type fields and their refinements, from the normalized nsdl_dc supplied from the MR.
- collContext - A repeatable field that describes a collection in which the resource can be found. A resource may reside in more than one collection.
- oai-id - the OAI Identifier for the collection.
- brand - the URL of the brand image associated with this collection, along with the dimensions of the image and the alternate text to use if the image cannot be displayed.
- metadataSource - A repeatable field that describes the OAI record from which the metadata was obtained. This includes the OAI ID and the URL which was listed as the resource identifier. When normalized, this should be the same as the resource identifier for the record.
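The resultsInfo counts described above are enough to decide whether another page of results exists and which s value to request next. The sketch below assumes that totalNumResults, numSkipped and numReturned are child elements holding plain integers. The sample XML is hand-written for illustration, and `next_skip` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written sample for illustration only, not a captured service response.
SAMPLE = """<NSDLSearchService xmlns="http://ns.nsdl.org/search/rest_v2.00/">
  <SearchResults>
    <resultsInfo>
      <totalNumResults>25</totalNumResults>
      <numSkipped>10</numSkipped>
      <numReturned>10</numReturned>
    </resultsInfo>
  </SearchResults>
</NSDLSearchService>
"""


def find_text(root, name):
    """Return the text of the first element whose local name matches."""
    for element in root.iter():
        if element.tag.rsplit("}", 1)[-1] == name:
            return (element.text or "").strip()
    return None


def next_skip(xml_text):
    """Return the s value for the next page, or None if no results remain."""
    root = ET.fromstring(xml_text)
    total = int(find_text(root, "totalNumResults"))
    skipped = int(find_text(root, "numSkipped"))
    returned = int(find_text(root, "numReturned"))
    following = skipped + returned
    # Stop when the service returned nothing or the last match was reached.
    if returned > 0 and following < total:
        return following
    return None


print(next_skip(SAMPLE))
```

Because positions start at 0 and reflect the s parameter, the value returned here is also the position expected for the first record on the next page.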