In our search we have configured text with two analyzers, english and
standard, so that we can match phrases using the standard analyzer. We split
the keywords on spaces and build a bool query with one clause per word.
This works fine in all cases except when the query contains standard
word separators such as & (ampersand) or ; (semicolon). Because the analyzer
strips word separators at index time, searching for them returns 0 results.
Gist: https://gist.github.com/ajhalani/3def3ea7caec5cd58490
I don't want to use a whitespace analyzer because we do actually want to
ignore word separators. I was thinking about hacky workarounds such as
removing all standalone non-alphanumeric characters, or moving them into
"should" instead of the default "must" (in case we add whitespace-based
analyzers in the future).
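The first workaround mentioned above (dropping standalone non-alphanumeric tokens before building the bool query) could look roughly like the sketch below. It is only an illustration: the field name `text`, the helper names, and the use of `match` clauses are assumptions, since the thread does not show the actual query from the gist.

```python
# Hypothetical field name; the thread does not say what the field is called.
FIELD = "text"


def has_alphanumeric(token: str) -> bool:
    """Return True if the token contains at least one letter or digit."""
    return any(ch.isalnum() for ch in token)


def build_bool_query(keywords: str, field: str = FIELD) -> dict:
    """Split on whitespace and build one `must` clause per remaining word.

    Tokens made only of punctuation (for example "&" or ";") are skipped,
    because an analyzer that strips separators never indexed them.
    """
    words = [w for w in keywords.split() if has_alphanumeric(w)]
    return {
        "query": {
            "bool": {
                # Assumption: one `match` clause per word.
                "must": [{"match": {field: w}} for w in words]
            }
        }
    }
```

With this filter, `build_bool_query("fish & chips")` produces `must` clauses for `fish` and `chips` only, so the lone `&` can no longer force zero results. Words that merely contain punctuation, such as `AT&T`, are passed through unchanged and left to the analyzer.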
On the other hand, if I use a single query_string instead of a bool of terms,
it works. Does ES/Lucene decide not to use the word separators by looking
at the definition of the fields?
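For comparison, here is a rough sketch of the two query shapes being discussed. As far as I understand, both `query_string` and `match` run the query text through the field's analyzer, so a lone `&` produces no terms; the difference is what happens to a clause that ends up empty. The field name `text` and the helper names are assumptions, and the second helper relies on the `zero_terms_query` option of the `match` query, which I believe defaults to `none` (match nothing).

```python
def query_string_body(keywords: str, field: str = "text") -> dict:
    """Single query_string query; the whole input is parsed and analyzed."""
    return {
        "query": {
            "query_string": {
                "query": keywords,
                "default_field": field,   # hypothetical field name
                "default_operator": "AND",
            }
        }
    }


def tolerant_match_clause(word: str, field: str = "text") -> dict:
    """Per-word match clause that matches everything if analysis yields no terms.

    Assumption: with zero_terms_query set to "all", a word like "&" that the
    analyzer strips entirely no longer makes a `must` clause fail.
    """
    return {"match": {field: {"query": word, "zero_terms_query": "all"}}}
```

If that reading is right, keeping the per-word bool query but building each clause with `tolerant_match_clause` would be an alternative to filtering the separators out on the client side.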
(Follow-up to the original post by Ankush Jhalani, sent Friday, September 19, 2014 11:05:59 AM UTC-4.)
Just checking back to see if anyone has any ideas. Thanks!
A few weeks ago I released an Elasticsearch plugin that lets you
override the default word-boundary properties of Unicode characters as
implemented by the StandardTokenizer algorithm. I had the same issue: I
wanted to use the StandardTokenizer but override the word-boundary
properties for special characters like '#' and '@' (for example, treating
them the same way as '_', which is categorized as an extended
num-letter).
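To illustrate the idea of reclassifying characters (this is not the plugin's code, and it is not the real StandardTokenizer word-break algorithm), here is a toy regex tokenizer in which selected characters are treated as word characters, the way '_' already is. The function name and parameter are made up for the example.

```python
import re


def tokenize(text: str, extra_word_chars: str = "") -> list:
    """Toy tokenizer: runs of word characters become tokens.

    Characters listed in `extra_word_chars` are treated like '_' (part of a
    word) instead of acting as separators. Everything else splits tokens.
    """
    char_class = r"\w" + re.escape(extra_word_chars)
    return re.findall(f"[{char_class}]+", text)
```

For the input `"follow @elastic #search_tips & more"`, the default call yields `follow`, `elastic`, `search_tips`, `more`, while `tokenize(..., extra_word_chars="#@")` keeps `@elastic` and `#search_tips` intact. The lone `&` is dropped in both cases, which matches the behavior the original poster wants to keep.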
On Monday, September 22, 2014 12:19:10 PM UTC-4, Ankush Jhalani wrote:
just checking back if anyone has any ideas.. thanks!