Query_string search containing a dash has unexpected results

I have a document with a field "message", that contains the following text
(truncated):

Welcome to test.com!

The message field is mapped to have an analyzer that breaks that string
into the following tokens:

welcome
to
test
com

But, when I search with a query like this:

{
  "query": {
    "query_string": {
      "query": "id:3955974 AND message:welcome-doesnotmatchanything"
    }
  }
}

To my surprise, it finds the document (3955974 is the document id). The
dash and everything after it seem to be ignored: no matter what I put
there, it still matches the document.

I've tried escaping it:

{
  "query": {
    "query_string": {
      "query": "id:3955974 AND message:welcome\\-doesnotmatchanything"
    }
  }
}
(note the double escape since it has to be escaped for the JSON too)

But that makes no difference. I still get 1 matching document. If I put it
in quotes it works:

{
  "query": {
    "query_string": {
      "query": "id:3955974 AND message:\"welcome-doesnotmatchanything\""
    }
  }
}

It works, meaning it matches 0 documents, since that document does not
contain the "doesnotmatchanything" token. That's great, but I don't
understand why the unquoted version does not work. This query is being
generated so I can't easily just decide to start quoting it, and I can't
always do that anyway since the user is sometimes going to use wildcards,
which can't be quoted if I want them to function. I was under the
assumption that an escaped unquoted string is the same as a quoted unescaped
string (in other words, foo:a\b\c === foo:"abc", assuming all special
characters are escaped in the unquoted version).
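
The JSON-level escaping described above is easy to get wrong by hand. As a minimal sketch (Python and the helper name `build_body` are my own choices for illustration, not something from the original post), a JSON serializer can add the second layer of escaping for both the backslash and the inner quotes:

```python
import json

# What the query_string parser should receive: one backslash before the dash.
escaped_query = r"id:3955974 AND message:welcome\-doesnotmatchanything"
# The quoted variant: the inner double quotes are part of the query text.
quoted_query = 'id:3955974 AND message:"welcome-doesnotmatchanything"'

def build_body(query_text):
    """Wrap a query_string expression in a search request body (hypothetical helper)."""
    return {"query": {"query_string": {"query": query_text}}}

# json.dumps doubles the backslash and escapes the inner quotes for JSON.
escaped_json = json.dumps(build_body(escaped_query))
quoted_json = json.dumps(build_body(quoted_query))

print(escaped_json)
print(quoted_json)
```

Parsing either string back with `json.loads` yields the original query text, so the extra escaping exists only at the JSON layer.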

I'm only on ES 1.0.1, but I don't see any changes in later versions that
would have affected this behavior.

Any insights would be helpful! :)

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/1dbfa1d5-7301-460b-ae9c-3665cfa79c96%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Hi Dave,

I think the reason is that your "message" field uses the standard analyzer.
The standard analyzer splits text on "-".
If you change the analyzer to the whitespace analyzer, the query matches 0
documents.

The _validate API is useful for checking exactly how a query is parsed.
Example request:

curl -XGET "http://localhost:9200/YOUR_INDEX/_validate/query?explain" -d '
{
  "query": {
    "query_string": {
      "query": "id:3955974 AND message:welcome-doesnotmatchanything"
    }
  }
}'

You will get a response like the following. In this example, the "message"
field is mapped with "index": "not_analyzed".

{
  "valid": true,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "explanations": [
    {
      "index": "YOUR_INDEX",
      "valid": true,
      "explanation": "+id:3955974 +message:welcome-doesnotmatchanything"
    }
  ]
}

I hope that helps.

Regards,
Jun


--

Jun Ohtani
blog : http://blog.johtani.info


I'm not using the standard analyzer. I'm using a pattern analyzer that
breaks the text on all non-word characters, like this:

"analyzer": {
  "letterordigit": {
    "type": "pattern",
    "pattern": "[^\\p{L}\\p{N}]+"
  }
}

I have verified that the message field is being broken up into the tokens I
expect (example in my first post).
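
The tokenization can be approximated outside Elasticsearch. The sketch below is only a rough stand-in for the analyzer: Python's built-in `re` module has no `\p{L}` or `\p{N}` classes, so `[\W_]+` is used as an approximation, and lowercasing is assumed because the token list in the first post is lowercase. The function name `analyze` is hypothetical.

```python
import re

# Approximation of "[^\p{L}\p{N}]+": runs of characters that are neither
# letters nor digits. [\W_]+ is a stand-in, not an exact equivalent.
SEPARATORS = re.compile(r"[\W_]+")

def analyze(text):
    """Split on separator runs, lowercase, and drop empty tokens."""
    return [tok.lower() for tok in SEPARATORS.split(text) if tok]

print(analyze("Welcome to test.com!"))   # ['welcome', 'to', 'test', 'com']
print(analyze("welcome-doesnotmatch"))   # ['welcome', 'doesnotmatch']
```

Both the indexed text and the search term end up as plain word tokens, which is why the dash itself never reaches the index.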

So when I run a search for message:welcome-doesnotmatch, I'm expecting that
string to be broken into tokens like so:

welcome
doesnotmatch

And for the search to therefore find 0 documents. But it doesn't -- it
finds 1 document, the document that contains my sample message, which does
not include the token "doesnotmatch".

So why on Earth would this search match that document? It is behaving as if
everything after the "-" is completely ignored. It does not matter what I
put there, it will still match the document.

This is coming up because an end user is searching for a hyphenated word,
like "battle-axe", and it's matching a document that does not contain the
word "axe" at all.


Can you run the validate query and post the output? That would be helpful.
amish


Yes, of course :) Here we go:

{
  "valid": true,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "explanations": [
    {
      "index": "index_v1",
      "valid": true,
      "explanation": "message:welcome message:doesnotmatch"
    }
  ]
}


Also interesting... if I run the query with explain=true, I see information
in the details about the "welcome" token, but there's no mention at all
about the "doesnotmatch" token. I guess it wouldn't mention it though,
since if it did, the document shouldn't match in the first place.


I created a test index using your pattern and I am seeing the appropriate
behaviour.
I am assuming you are using the same analyzer for search/query, and that
you have made sure your default operator is AND.
Note that the analyzer will break welcome-doesnotmatchanything into two
tokens combined with OR, so your document will match unless you use AND.
amish
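
To illustrate the point about OR versus AND, here is a small simulation. It does not call Elasticsearch; `analyze` is a rough approximation of the pattern analyzer from this thread, and `matches` is a hypothetical helper that combines the tokens of one analyzed term using the chosen operator.

```python
import re

def analyze(text):
    """Rough stand-in for the thread's pattern analyzer (an approximation)."""
    return [tok.lower() for tok in re.split(r"[\W_]+", text) if tok]

def matches(document, term, default_operator="OR"):
    """Analyze one query term, then combine its tokens with the operator."""
    doc_tokens = set(analyze(document))
    hits = [tok in doc_tokens for tok in analyze(term)]
    if not hits:
        return False
    return all(hits) if default_operator == "AND" else any(hits)

doc = "Welcome to test.com!"
print(matches(doc, "welcome-doesnotmatchanything"))         # True: OR needs one token
print(matches(doc, "welcome-doesnotmatchanything", "AND"))  # False: AND needs both
```

With OR, the single matching token "welcome" is enough, which reproduces the unexpected hit described earlier in the thread.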


My default operator doesn't matter, if I understand it correctly, because
I'm specifying the operator explicitly. Also, I can reproduce this behavior
using a single search term, so there's no operator to speak of. Unless
you're saying that the default operator applies to a single-term query if
it is broken into tokens?

You wrote: "Note that the analyzer will break welcome-doesnotmatchanything
into two tokens with OR and your document will match unless you use AND."

This concerns me... my search looks like:

message:welcome-doesnotmatchanything

I cannot break that into an AND. The entire thing is a value provided by
the end user. You're saying I should on the app side break the string they
entered into tokens and join them with ANDs? That doesn't seem viable...

Let me back up and say what I'm expecting the user to be able to do.
There's a single text box where they can enter a search query, with the
following rules:

  1. The user may use a trailing wildcard, e.g. foo*
  2. The user may enter multiple terms separated by a space. Only documents
    containing all of the terms will match.
  3. The user might enter special characters, such as in "battle-axe", simply
    because that is what they think they should search for, which should match
    documents containing "battle" and "axe" (the same as a search for "battle
    axe").

To that end, I am taking their search string and forming a search like this:

message:<term> AND message:<term> AND ...

Where the string is split on spaces and joined with the AND clauses. For
each individual part of the search phrase, I take care of escaping special
characters (except "*" since I am allowing them to use wildcards). For
example, if they entered "foo bar!", I would generate this query:

message:foo AND message:bar!
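
The query generation described above could look roughly like the following sketch. The function names and the exact set of escaped characters are assumptions made for illustration (the list mirrors the characters Lucene's query syntax treats as special, minus "*"); this is not the poster's actual code.

```python
import re

# Characters escaped by this sketch. '*' is deliberately left out so that
# trailing wildcards keep working. The exact list is an assumption.
SPECIAL = re.compile(r'([\\+\-!(){}\[\]^"~?:/&|])')

def escape_term(term):
    """Backslash-escape query_string special characters, except '*'."""
    return SPECIAL.sub(r"\\\1", term)

def build_query(user_input, field="message"):
    """Split on whitespace, escape each part, and join with AND."""
    parts = user_input.split()
    return " AND ".join(f"{field}:{escape_term(p)}" for p in parts)

print(build_query("foo bar!"))    # message:foo AND message:bar\!
print(build_query("battle-axe"))  # message:battle\-axe
print(build_query("foo*"))        # message:foo*
```

Note that escaping the dash only stops the query parser from treating it as an operator. The analyzer still splits the term on it afterwards.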

The problem is they are entering "battle-axe", causing me to generate this:

message:battle-axe

But that ends up being the same as:

(message:battle OR message:axe)

I guess that is what I was not expecting. Because of this behavior, I have
to know from my app point of view what tokens I should be splitting the
original string on, so that I can join them back together with ANDs. But
that means basically reimplementing the tokenizer on my end, does it not?
There must be a better way? Like specifying I want those terms to be joined
with ANDs instead?


Ok... specifying default_operator: AND worked!!!!

In that case, I'd like to say that the docs on that option are incomplete
or confusing. It says:

The default operator used if no explicit operator is specified. For example,
with a default operator of OR, the query capital of Hungary is translated
to capital OR of OR Hungary, and with default operator of AND, the same
query is translated to capital AND of AND Hungary. The default value is OR.

That's all well and good, but my query does not have multiple terms like
that. I have a single term for a single field. The default operator is
applying to the resulting tokens of that, after they are generated by the
analyzer. I assumed that the default operator applied at the level of the
query being parsed and that had nothing at all to do with the analyzer.
Making that clearer could have saved me a lot of time :)


No, I am not saying that. I am saying this:

GET my_index_v1/mytype/_search
{
  "query": {
    "query_string": {
      "default_field": "name",
      "query": "welcome-doesnotmatchanything",
      "default_operator": "AND"
    }
  }
}

Here I will not get a match, as expected. If I do not specify it, then OR is
the default operator and it will match.
amish


Yes, and this was the key, thank you so much. But see my reply above about
the docs on that param being confusing. That was really the source of the
problem for me.


If you want to translate battle-axe into "battle axe", note that the
correct method would be to use a phrase search with slop 0. The AND
operator may also work in most cases, but the word positions will be lost:
you get a less precise search that matches docs containing "battle" and
"axe" anywhere in the field.

Jörg
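
The difference between AND and a phrase search with slop 0 can be shown with a small simulation. Nothing here talks to Elasticsearch: `analyze` is a rough approximation of the pattern analyzer from this thread, and the two matching helpers are hypothetical. The request body at the end uses the quoted query_string form that was reported to work earlier in the thread.

```python
import re

def analyze(text):
    """Rough stand-in for the thread's pattern analyzer (an approximation)."""
    return [tok.lower() for tok in re.split(r"[\W_]+", text) if tok]

def matches_all(document, query):
    """AND semantics: every query token occurs somewhere in the document."""
    doc_tokens = set(analyze(document))
    return all(tok in doc_tokens for tok in analyze(query))

def matches_phrase(document, query):
    """Phrase with slop 0: query tokens occur consecutively and in order."""
    doc, q = analyze(document), analyze(query)
    if not q:
        return False
    return any(doc[i:i + len(q)] == q for i in range(len(doc) - len(q) + 1))

# Quoting the term turns it into a phrase query inside query_string.
phrase_body = {"query": {"query_string": {"query": 'message:"battle axe"'}}}

print(matches_all("An axe made for battle", "battle-axe"))     # True
print(matches_phrase("An axe made for battle", "battle-axe"))  # False
print(matches_phrase("A rusty battle axe", "battle-axe"))      # True
```

The first document satisfies AND but not the phrase, which is the loss of position information mentioned above.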
