Searching for a reserved word (OR)


(searchersteve) #1

I am trying to run a filter on a field that contains abbreviations for US states. My index contains documents where state=WA and where state=OR. The search works correctly on state=WA, but if I run the identical search for state=OR, I get no hits. No errors -- just no hits.

Is this because "or" is a reserved word and, if so, then how do I work around this issue?


(Clinton Gormley) #2

On Tue, 2011-01-11 at 19:30 -0800, searchersteve wrote:

> I am trying to run a filter on a field that contains abbreviations for US
> states. My index contains documents where state=WA and where state=OR. The
> search works correctly on state=WA, but if I run the identical search for
> state=OR, I get no hits. No errors -- just no hits.
>
> Is this because "or" is a reserved word and, if so, then how do I work
> around this issue?

Correct. OR is a boolean operator in the Lucene query syntax.
http://lucene.apache.org/java/3_0_0/queryparsersyntax.html

Given that state abbreviations are well-defined values, you probably
want to index them as not_analyzed:
http://www.elasticsearch.com/docs/elasticsearch/mapping/core_types/#String
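For illustration, here is a rough sketch of what such a mapping body could look like, built in Python so it can be printed and passed to curl. The index name "places" and type name "place" are made up for this example, and the PUT /{index}/{type}/_mapping URL is an assumption based on the Elasticsearch releases of that era. Only the "index": "not_analyzed" setting comes from the advice above.

```python
import json

# Hypothetical names, not taken from the thread.
INDEX = "places"
DOC_TYPE = "place"

# Mapping body that asks for the "state" field to be indexed verbatim,
# as a single un-analyzed term.
mapping = {
    DOC_TYPE: {
        "properties": {
            "state": {"type": "string", "index": "not_analyzed"}
        }
    }
}

# Assumed endpoint shape: PUT /{index}/{type}/_mapping
url = "http://127.0.0.1:9200/%s/%s/_mapping" % (INDEX, DOC_TYPE)

# Print the request so it can be sent with curl -XPUT <url> -d '<body>'.
print("PUT", url)
print(json.dumps(mapping, indent=2))
```

The mapping generally has to be in place before the documents are indexed, so an existing index would probably need its documents reindexed for the change to take effect.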

Then, instead of including the state abbreviation in the query string
itself, you probably want to filter your results with a term filter
http://www.elasticsearch.com/docs/elasticsearch/rest_api/query_dsl/term_filter
e.g.:

curl -XGET 'http://127.0.0.1:9200/_all/_search' -d '
{
   "query" : {
      "filtered" : {
         "query" : {
            "field" : {
               "_all" : "keywords to find"
            }
         },
         "filter" : {
            "term" : {
               "state" : "OR"
            }
         }
      }
   }
}
'

Note: the value in a term filter is not analyzed at all, so if you index
the state as 'OR' then you must filter for it as 'OR', not 'Or' or 'or'.
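To make the case-sensitivity point concrete, here is a toy model in Python. It is not Elasticsearch code: analyze(), index_terms() and term_filter() are hypothetical helpers, and the "analyzer" is just lowercasing plus whitespace splitting (a real analyzer may do more, such as dropping stop words).

```python
# Toy model only: an illustration of why exact-term matching is
# case-sensitive. None of these functions exist in Elasticsearch.

def analyze(value):
    """Crude stand-in for an analyzer: lowercase and split on whitespace."""
    return value.lower().split()

def index_terms(value, analyzed):
    """Return the set of terms that would be stored for a field value."""
    return set(analyze(value)) if analyzed else {value}

def term_filter(stored_terms, term):
    """Compare the given term verbatim against the stored terms."""
    return term in stored_terms

not_analyzed = index_terms("OR", analyzed=False)  # stored as {"OR"}
analyzed = index_terms("OR", analyzed=True)       # stored as {"or"}

print(term_filter(not_analyzed, "OR"))  # True
print(term_filter(not_analyzed, "or"))  # False
print(term_filter(analyzed, "OR"))      # False
```

The last line shows the other side of the same coin: if the field is analyzed, the stored term is no longer 'OR', so an exact filter for 'OR' finds nothing.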

clint


(searchersteve) #3
Thanks!!! I was, in fact, searching for state with a filter, but I did not know about the not_analyzed solution. I am realizing that I have a number of fields that are like this (discrete and predictable values), and I suspect that I could improve indexing speed quite a bit if I selected not_analyzed.

I'll try out your suggestion and let you know if problems arise. Thanks again for the schooling.

Steve

