Community:Search/Lucene Query Syntax
From NSDLWiki
Query Syntax for the Lucene Search Engine
Boolean support
The operators AND, OR, and NOT, and grouping with parentheses, are all supported:
(flying OR spaghetti) AND monster
Note that the boolean operator words must be upper case. Note also that the default boolean operator is "AND". So the following are equivalent:
spaghetti monster
spaghetti AND monster
The following clauses are also equivalent to the clauses above, but not for the reason you might expect:
spaghetti or monster
spaghetti not monster
spaghetti and monster
Lowercase "and", "or", and "not" are stopwords and are ignored, so the default boolean operator AND applies. However, these queries, which use uppercase boolean operators, give different results:
spaghetti OR monster
spaghetti NOT monster
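The effect of the stopword rule and the default operator can be illustrated with a small Python sketch. It is a toy model of the behaviour described above, not Lucene's parser: the function name explicit_query and the three-word stopword set are assumptions made for illustration, and parentheses are not handled.

```python
# Toy model only; Lucene's real parser and stopword list are more involved.
OPERATORS = {"AND", "OR", "NOT"}   # recognised only in upper case
STOPWORDS = {"and", "or", "not"}   # assumed subset of the stopword list

def explicit_query(query: str) -> str:
    """Drop stopwords and write out the default AND between adjacent terms."""
    out = []
    for token in query.split():
        if token in OPERATORS:
            out.append(token)              # genuine boolean operator
        elif token.lower() in STOPWORDS:
            continue                       # ignored like any other stopword
        else:
            if out and out[-1] not in OPERATORS:
                out.append("AND")          # default operator between two terms
            out.append(token)
    return " ".join(out)

for q in ("spaghetti monster", "spaghetti or monster",
          "spaghetti OR monster", "spaghetti NOT monster"):
    print(f"{q!r} is read as {explicit_query(q)!r}")
```

In this sketch the lowercase queries all reduce to spaghetti AND monster, while the uppercase OR and NOT queries are left unchanged.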
Phrase searching
Use quotes to search for multiple words in exact order (e.g. "triumphal arch"). Stopwords occurring within the phrase are ignored, so these phrases all give the same results:
"in the air"
"on the air"
"air"
Fielded searching
You can restrict a search term to a particular field, e.g. audience, subject, or title, by writing the field name and a colon before the term (e.g. title:monster). The REST request indexFieldInfo lists the available fields. See Fielded Searching.
Filtered searching
Use “+” to require a query term; “-” to prohibit one. This query requires "nutch" in all result documents, and prohibits documents that contain "heritrix":
+nutch -heritrix
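As a rough illustration of the require/prohibit semantics, the Python sketch below checks a document against the + and - clauses of a query. The function name matches is hypothetical, documents are split on whitespace only, and unprefixed terms are ignored; the real engine works on an analyzed index rather than on raw text.

```python
def matches(query: str, document: str) -> bool:
    """Return True if the document satisfies every + and - clause (toy model)."""
    words = set(document.lower().split())
    for clause in query.lower().split():
        if clause.startswith("+") and clause[1:] not in words:
            return False   # a required term is missing
        if clause.startswith("-") and clause[1:] in words:
            return False   # a prohibited term is present
    return True

docs = ["Nutch is a web crawler",
        "Nutch and Heritrix compared",
        "Heritrix archives the web"]
for doc in docs:
    print(doc, "->", matches("+nutch -heritrix", doc))
```

Only the first document passes: the second contains the prohibited term and the third lacks the required one.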
Boosting
Assign importance values to individual query terms when searching by adding a caret “^” followed by a boost factor after the term. This query will consider "conductivity" more relevant than "electrical":
electrical conductivity^3
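To make the idea of a boost factor concrete, here is a deliberately simplified Python sketch. The names parse_boosts and score are hypothetical, and the scoring rule (term count multiplied by boost) is an assumption chosen for illustration; Lucene's actual relevance scoring takes more factors into account.

```python
def parse_boosts(query: str) -> dict:
    """Map each query term to its boost factor (1.0 when no ^ is given)."""
    boosts = {}
    for clause in query.split():
        term, _, factor = clause.partition("^")
        boosts[term.lower()] = float(factor) if factor else 1.0
    return boosts

def score(query: str, document: str) -> float:
    """Toy score: occurrences of each term, weighted by its boost."""
    words = document.lower().split()
    return sum(boost * words.count(term)
               for term, boost in parse_boosts(query).items())

query = "electrical conductivity^3"
print(parse_boosts(query))
print(score(query, "the conductivity of copper"))
print(score(query, "electrical wiring basics"))
```

Under these assumptions, a document mentioning "conductivity" once outscores one mentioning "electrical" once by a factor of three.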
Wildcard Searches
Use * to match zero or more characters and ? to match a single character. For example, na* matches "naomi", "name", "namibia", and "nat"; na? matches "nat", "nab", and "nap". Lucene does not allow * as the first character of a term.
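The wildcard rules can be mimicked by translating a pattern into a regular expression. The Python sketch below is an illustration only: wildcard_to_regex is a hypothetical helper, and it tests whole words from a small list rather than terms in an index.

```python
import re

def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate * (zero or more chars) and ? (exactly one char) to a regex."""
    if pattern.startswith("*"):
        raise ValueError("* is not allowed as the first character of a term")
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))   # every other character is literal
    return re.compile("".join(parts))

words = ["naomi", "name", "namibia", "nat", "nab", "nap", "nose"]
for pattern in ("na*", "na?"):
    regex = wildcard_to_regex(pattern)
    print(pattern, [w for w in words if regex.fullmatch(w)])
```

With this word list, na* selects every word beginning with "na", while na? selects only the three-letter words "nat", "nab", and "nap".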
Stemmed words
The index has both stemmed and unstemmed versions of most fields. In a stemmed field, sit matches "sits" and "sitting". The REST request indexFieldInfo shows which fields have stemmed versions.
Fuzzy Searches
Append a tilde "~" to a term to find words that are similar in spelling to it (e.g. roam~ matches "roam", "roams", "foam").
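Lucene's fuzzy matching is based on edit (Levenshtein) distance. The Python sketch below computes that distance and uses it to filter a word list; the function names and the cut-off of one edit are assumptions for illustration, since the threshold Lucene applies depends on its version and settings.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]

def fuzzy_matches(term: str, vocabulary: list, max_edits: int = 1) -> list:
    """Words within max_edits of the term (assumed threshold)."""
    return [w for w in vocabulary if edit_distance(term, w) <= max_edits]

print(fuzzy_matches("roam", ["roam", "roams", "foam", "monster"]))
```

Here "roam", "roams", and "foam" are each at most one edit away from the query term, while "monster" is not.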
Proximity Searches
Specify the allowed distance between words in a document by appending a tilde and a number to a quoted phrase (e.g. "nutch lucene"~10 will search for "nutch" and "lucene" within 10 words of each other).
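A rough way to picture a proximity search is to compare word positions. The Python sketch below is an approximation under stated assumptions: the helper name within is hypothetical, words are split on whitespace, and distance is measured as the difference in word positions, which is simpler than the way Lucene itself counts the distance.

```python
def within(document: str, first: str, second: str, distance: int) -> bool:
    """True if some occurrence of each word lies within `distance` positions."""
    words = document.lower().split()
    first_positions = [i for i, w in enumerate(words) if w == first]
    second_positions = [i for i, w in enumerate(words) if w == second]
    return any(abs(i - j) <= distance
               for i in first_positions for j in second_positions)

doc = "nutch builds on top of lucene"
print(within(doc, "nutch", "lucene", 10))  # the words are 5 positions apart
print(within(doc, "nutch", "lucene", 2))
```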
Range Searches
Match values within a specified range, written as two bounds separated by TO. Use square brackets [ ] to include the bounds and curly brackets { } to exclude them (e.g. [a TO c] includes "a" and "c"; {a TO c} does not).
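The difference between inclusive and exclusive bounds can be shown with a short Python sketch. It assumes Lucene's range form of two bounds separated by the keyword TO, compares values as plain strings, and uses a hypothetical helper named in_range; the sample values are made up for illustration.

```python
import re

# Matches "[low TO high]" or "{low TO high}" for bounds without spaces.
RANGE = re.compile(r"^([\[{])\s*(\S+)\s+TO\s+(\S+)\s*([\]}])$")

def in_range(expression: str, value: str) -> bool:
    """Check a value against a range expression (toy string comparison)."""
    match = RANGE.match(expression)
    if match is None:
        raise ValueError(f"not a range expression: {expression!r}")
    opening, low, high, closing = match.groups()
    above_low = low <= value if opening == "[" else low < value
    below_high = value <= high if closing == "]" else value < high
    return above_low and below_high

print(in_range("[20020101 TO 20030101]", "20020101"))  # bound included
print(in_range("{20020101 TO 20030101}", "20020101"))  # bound excluded
print(in_range("{Aida TO Carmen}", "Boheme"))
```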
Further documentation on these query features is available from Lucene at http://lucene.apache.org/java/docs/queryparsersyntax.html.
Escaping Special Characters
Certain characters in a search query will be interpreted as indicating the above features. These characters are:
+ - && || ! ( ) { } [ ] ^ " ~ * ? :
If you do NOT want these characters to be interpreted as query syntax, escape each one by placing a backslash (\) before it. Examples:
to find "hodge-podge", use
hodge\-podge
to find "(1+1):2", use
\(1\+1\)\:2
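When queries are built from user input, the escaping can be done programmatically. The following Python sketch is one possible helper, not part of any NSDL or Lucene API: the name escape_query is hypothetical. It escapes every & and | individually, on the assumption that over-escaping is harmless, and also escapes the backslash itself, which is not in the list above but would otherwise be read as an escape character.

```python
# Characters from the list above, plus the backslash (an added assumption).
SPECIAL_CHARS = set('+-!(){}[]^"~*?:&|\\')

def escape_query(text: str) -> str:
    """Prefix every special character with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)

print(escape_query("hodge-podge"))   # hodge\-podge
print(escape_query("(1+1):2"))       # \(1\+1\)\:2
```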