Community:Search/Lucene Query Syntax

From NSDLWiki

Query Syntax for the Lucene Search Engine

Boolean support

The Boolean operators AND, OR, and NOT are supported, as is grouping with parentheses:

(flying OR spaghetti) AND monster 

The Boolean operator words must be uppercase. The default Boolean operator is AND, so the following are equivalent:

spaghetti monster
spaghetti AND monster

The following queries are also equivalent to the queries above, but not for the reason you might expect:

spaghetti or monster
spaghetti not monster
spaghetti and monster

Lowercase "and", "or", and "not" are stopwords and are ignored, so the default Boolean operator AND is applied between the remaining terms. However, these queries, which use uppercase Boolean operators, are different:

spaghetti OR monster
spaghetti NOT monster
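
The operator handling described above can be mimicked with a small toy model. This is only a sketch in Python, not the Lucene parser: the function name `effective_query` and the three-word stopword set are assumptions made for illustration, and parentheses are not handled.

```python
# Toy model of the operator handling described above; not Lucene's parser.
OPERATORS = {"AND", "OR", "NOT"}   # recognised only in upper case
STOPWORDS = {"and", "or", "not"}   # assumed stopword list (illustrative only)
DEFAULT_OPERATOR = "AND"           # default operator, as stated on this page


def effective_query(query):
    """Drop stopwords and make the default operator explicit."""
    out = []
    for token in query.split():
        if token in STOPWORDS:
            continue               # lowercase operator words are ignored
        if token in OPERATORS:
            out.append(token)
            continue
        # Two terms in a row: insert the default operator between them.
        if out and out[-1] not in OPERATORS:
            out.append(DEFAULT_OPERATOR)
        out.append(token)
    return " ".join(out)


for q in ("spaghetti monster", "spaghetti or monster", "spaghetti OR monster"):
    print(q, "=>", effective_query(q))
```

Under these assumptions the first two queries reduce to the same AND query, while the uppercase OR query keeps its operator.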

Phrase searching

Use quotes to search for multiple words in exact order (e.g. "triumphal arch"). Stopwords occurring within the phrase are ignored, so these phrases all give the same results:

in the air
on the air
air

Fielded searching

You can look for search terms in a particular field, e.g. audience, subject, or title. To find the available fields, use the REST request indexFieldInfo. See Fielded Searching.

Filtered searching

Use "+" to require a query term and "-" to prohibit one. This query requires "nutch" in all result documents and excludes documents that contain "heritrix":

+nutch -heritrix 
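
As a rough illustration of the require/prohibit semantics, the following Python sketch checks a query of "+" and "-" terms against the words of a document. The function name `matches` and the whitespace tokenization are assumptions for this example; a real index analyzes text quite differently.

```python
def matches(query, text):
    """Toy check of +required / -prohibited terms against a document's words."""
    words = set(text.lower().split())
    for token in query.split():
        if token.startswith("+") and token[1:].lower() not in words:
            return False           # a required term is missing
        if token.startswith("-") and token[1:].lower() in words:
            return False           # a prohibited term is present
    return True


docs = [
    "Nutch is a crawler",
    "nutch and heritrix compared",
    "heritrix only",
]
hits = [d for d in docs if matches("+nutch -heritrix", d)]
print(hits)
```

Only the first sample document satisfies both conditions.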

Boosting

Assign importance values to individual query terms by adding a caret "^" followed by a boost factor after the term. This query treats "conductivity" as more relevant than "electrical":

electrical conductivity^3

Wildcard Searches

Use * to match zero or more characters and ? to match a single character. For example, na* matches "naomi", "name", "namibia", and "nat"; na? matches "nat", "nab", and "nap". Lucene does not allow * as the first character of a term.
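
The wildcard rules can be illustrated by translating a pattern into a regular expression. This is a minimal Python sketch of the matching semantics only, not how Lucene evaluates wildcards internally; the helper names `wildcard_to_regex` and `wildcard_match` are made up for this example.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a * / ? wildcard pattern into a compiled regular expression."""
    if pattern.startswith("*"):
        # Mirrors the rule above: * may not be the first character of a term.
        raise ValueError("a term may not start with *")
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")           # zero or more characters
        elif ch == "?":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # literal character
    return re.compile("".join(parts))


def wildcard_match(pattern, term):
    """True if the whole term matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None


print([t for t in ("naomi", "name", "nat", "nab") if wildcard_match("na?", t)])
```

Here na? selects only the three-letter terms, while na* would accept all four.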

Stemmed words

The index has both stemmed and unstemmed versions of most fields. In stemmed fields, for example, sit matches "sits" and "sitting". The REST request indexFieldInfo shows which fields have stemmed versions.

Fuzzy Searches

Find words that are similar in spelling to the words in the query (e.g. roam~ matches "roam", "roams", "foam").
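
Lucene's fuzzy matching is based on edit distance (Levenshtein distance). The Python sketch below computes that distance for the example words; it does not reproduce the similarity threshold Lucene applies, and the function name `edit_distance` is chosen for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits turning a into b."""
    prev = list(range(len(b) + 1))       # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                       # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


for word in ("roam", "roams", "foam"):
    print(word, edit_distance("roam", word))
```

Each of the example matches is at most one edit away from "roam", which is why a fuzzy search can return them.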

Proximity Searches

Specify the allowed distance between words in a document (e.g. "nutch lucene"~10 searches for "nutch" and "lucene" within 10 words of each other).
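
The idea of a proximity search can be sketched as a check on word positions. This Python example is a simplification under stated assumptions: it splits on whitespace and compares position differences, whereas Lucene computes distance on analyzed token positions. The function name `within_distance` is hypothetical.

```python
def within_distance(text, first, second, max_distance):
    """True if some occurrence of `first` lies within max_distance words of `second`."""
    words = text.lower().split()
    pos_first = [i for i, w in enumerate(words) if w == first]
    pos_second = [i for i, w in enumerate(words) if w == second]
    return any(abs(i - j) <= max_distance
               for i in pos_first for j in pos_second)


doc = "nutch is built on top of lucene"
print(within_distance(doc, "nutch", "lucene", 10))
print(within_distance(doc, "nutch", "lucene", 3))
```

In the sample sentence the two words are six positions apart, so the check passes for a distance of 10 but not for 3.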

Range Searches

Match values within a specified range. Use square brackets [ ] to include the bounds and curly brackets { } to exclude them.
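
In Lucene's query syntax the two bounds are written inside the brackets, separated by the keyword TO. The Python sketch below builds such range clauses as strings; the helper name `range_query` is made up, and the field names `mod_date` and `title` are placeholders that may not exist in this index.

```python
def range_query(field, lower, upper, inclusive=True):
    """Build a range clause; [ ] includes the bounds, { } excludes them."""
    open_b, close_b = ("[", "]") if inclusive else ("{", "}")
    return f"{field}:{open_b}{lower} TO {upper}{close_b}"


# Placeholder field names, for illustration only.
print(range_query("mod_date", "20020101", "20030101"))
print(range_query("title", "Aida", "Carmen", inclusive=False))
```

The first clause includes both end dates; the second matches titles strictly between the two bounds.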

Further documentation on these query features is available from Lucene at http://lucene.apache.org/java/docs/queryparsersyntax.html.




Escaping Special Characters

Certain characters in a search query are interpreted as query syntax for the features above. These characters are:

+ - && || ! ( ) { } [ ] ^ " ~ * ? : 

If you do not want these characters to be interpreted as query syntax, escape each one by placing a backslash (\) before it. Examples:

To find "hodge-podge", use

hodge\-podge

To find "(1+1):2", use

\(1\+1\)\:2
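
When queries are built from user input, the escaping can be automated. The following is a minimal Python sketch, assuming the character list above; the function name `escape_query` is hypothetical. Two further assumptions: it escapes every & and | individually (rather than only the pairs && and ||), and it also escapes the backslash itself.

```python
# Special characters from the list above, plus the backslash (an assumption).
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\')


def escape_query(text):
    """Prefix every special character with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


print(escape_query("hodge-podge"))
print(escape_query("(1+1):2"))
```

Applied to the two examples above, the sketch produces the same escaped forms shown there.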