This breaks the field into tokens, splitting on any character that is
neither a letter nor a digit.
But the user is searching for "foo-bar", which contains a non-alphanumeric
character. I assume, but correct me if I'm wrong, that ES will apply the
same analyzer to that string. So it is broken into two tokens, ["foo",
"bar"], and then the default_operator kicks in and essentially turns the
query into "details:foo AND details:bar".
My problem is that it will match documents containing "foo xyz bar" and
"bar xyz foo" -- in the latter case, the tokens are in the reverse order
from the user's search. I'm fine with it matching the former, but it's a
stretch to convince the user that the latter is intended.
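To make the behavior described above concrete, here is a rough Python sketch that imitates it outside of Elasticsearch. The `analyze` and `matches_all_terms` helpers are hypothetical stand-ins, not Elasticsearch APIs, and the tokenizer only handles ASCII letters and digits, so treat this as an illustration of the idea rather than what the real analyzer does.

```python
import re

def analyze(text):
    """Hypothetical stand-in for the analyzer: split on any run of
    characters that are not ASCII letters or digits."""
    return [tok for tok in re.split(r"[^A-Za-z0-9]+", text) if tok]

def matches_all_terms(query, document):
    """Imitate default_operator=AND: every query token must appear
    somewhere in the document, in any order."""
    doc_tokens = set(analyze(document))
    return all(tok in doc_tokens for tok in analyze(query))

print(analyze("foo-bar"))
for doc in ["foo xyz bar", "bar xyz foo", "foo xyz"]:
    print(doc, "->", matches_all_terms("foo-bar", doc))
```

Because the check only asks whether each token is present, token order never enters into it, which is why the reversed document still matches.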
The search string is provided by the user, so I can't really build a
complex query with different query types, hence the basic querystring
search.
Any advice or corrections to my assumptions are appreciated!
Your analysis of what is going on sounds correct. However, Elasticsearch's
results are also correct. When it analyzes the search string, your query
becomes a match query on "foo" AND "bar", which matches any document
containing both of those terms. Most queries against analyzed fields do not
respect the original ordering of the terms.
One thing you could try is the match_phrase query (see "Phrase Matching" in Elasticsearch: The Definitive Guide),
which is aware of the ordering of the terms. Using the base match_phrase
query for "foo bar" will not match either "foo xyz bar" or "bar xyz foo".
If you still need to match things like "foo xyz bar" you may be able to do
that using the slop parameter, depending on what exactly the use case is.
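As a rough illustration of the suggestion above, this Python sketch builds the request body for a match_phrase query, with and without slop. The `match_phrase_body` helper is hypothetical, the field name `details` is taken from the question, and the slop value of 1 is only an example; whether it fits depends on the use case.

```python
import json

def match_phrase_body(field, text, slop=0):
    """Build a search request body for a match_phrase query.
    Hypothetical helper; the dict mirrors the query DSL structure."""
    return {
        "query": {
            "match_phrase": {
                field: {"query": text, "slop": slop}
            }
        }
    }

# Strict phrase: terms must be adjacent and in order.
strict = match_phrase_body("details", "foo bar")

# With slop, terms may sit some positions apart (example value only).
loose = match_phrase_body("details", "foo bar", slop=1)

print(json.dumps(loose, indent=2))
```

The resulting dict can be sent as the body of a search request with whatever client is already in use.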
Thanks, though unless I am misunderstanding it, the docs imply otherwise:
For example, from:
The query string is parsed into a series of terms and operators. A term
can be a single word — quick or brown — or a phrase, surrounded by double
quotes — "quick brown" — which searches for all the words in the phrase,
in the same order.
So what gives?
On Tuesday, April 14, 2015 at 1:15:24 PM UTC-7, James Macdonald wrote:
[...]
On Tue, Apr 14, 2015 at 2:03 PM, Dave Reed <infin...@gmail.com> wrote:
[...]
To perhaps answer my own question, I think I understand the difference.
details:"foo bar"
Would search for the tokens in the same order (implied by the docs I
referenced). But
details:foo-bar
Would not honor the order. The quotes do more than just enclose the
phrase. If that is true, then these two queries are not the same, which is
not what I expected:
details:foo\ bar
!=
details:"foo bar"
Or am I barking up the wrong tree...
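If the quoted form does behave as a phrase, one option that keeps the basic query_string search is to wrap the user's input in double quotes before sending it. The sketch below assumes that approach; `as_phrase` and `query_string_body` are hypothetical helpers, and the escaping covers only backslashes and double quotes, so it is a starting point rather than a complete sanitizer.

```python
def as_phrase(user_input):
    """Wrap raw user input in double quotes so query_string treats it
    as a phrase. Escapes backslashes and embedded double quotes only."""
    escaped = user_input.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def query_string_body(field, user_input, phrase=False):
    """Build a query_string request body (hypothetical helper).
    With phrase=True the input is sent as a quoted phrase."""
    text = as_phrase(user_input) if phrase else user_input
    return {
        "query": {
            "query_string": {
                "default_field": field,
                "query": text,
                "default_operator": "AND",
            }
        }
    }

unordered = query_string_body("details", "foo-bar")
ordered = query_string_body("details", "foo-bar", phrase=True)
print(unordered["query"]["query_string"]["query"])
print(ordered["query"]["query_string"]["query"])
```

The trade-off is that quoting everything turns off the other query_string syntax (operators, wildcards) for the user, which may or may not be acceptable.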
On Tuesday, April 14, 2015 at 1:34:28 PM UTC-7, Dave Reed wrote:
[...]