Query string syntax for exact match against array field?


(Nikita Tovstoles) #1

Suppose we have a type Recipe with a field ingredients that stores a JSON
string array. A couple of Recipe docs' ingredients values may therefore be:

  1. [ "apples", "oranges" ]
  2. [ "apples" ]

What query would return only docs whose ingredients contain solely "apples"
(thus returning only #2 from the above set)? I thought a query string / field
query (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#query-string-syntax)
would work, but using ingredients:"apples" also returns #1.

Thank you,

-nikita

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/7ad78322-803f-4f61-8428-82046dfd8bfb%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Brian Yoder) #2

One thought occurred to me. Perhaps:

  1. Build the token count into your ingredients field. Here's how:
    http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#token_count

2a. Pre-analyze your query arguments and remove the duplicates. For
example, ingredients:"apples" ingredients:"apples" becomes
ingredients:"apples".

2b. Add the token_count to your query. Based on the ES link and on the
example in 2a, you would add ingredients.word_count:1 to your query. Now,
in your set of 2 documents, document #1 would be filtered out of the
response and only document #2 would be returned.
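
A rough sketch of steps 1, 2a and 2b, written as Python that only builds the mapping and the query string (it does not talk to a cluster). The type name "recipe", the sub-field name word_count, the "fields" mapping syntax and the helper exact_ingredients_query are assumptions for illustration, and how token_count behaves on multi-valued fields should be checked against your ES version:

```python
import json

# Assumed mapping for a hypothetical "recipe" type: the ingredients field
# gets a token_count sub-field named word_count (step 1).
recipe_mapping = {
    "recipe": {
        "properties": {
            "ingredients": {
                "type": "string",
                "fields": {
                    "word_count": {
                        "type": "token_count",
                        "analyzer": "standard",
                    }
                },
            }
        }
    }
}


def exact_ingredients_query(ingredients):
    """Build a query string that requires each ingredient plus a token count."""
    # Step 2a: remove duplicate query arguments.
    unique = sorted(set(ingredients))
    clauses = ['ingredients:"%s"' % name for name in unique]
    # Step 2b: require the token count to equal the number of ingredients.
    # This assumes every ingredient is a single word, as in this thread.
    clauses.append("ingredients.word_count:%d" % len(unique))
    return " AND ".join(clauses)


print(json.dumps(recipe_mapping, indent=2))
print(exact_ingredients_query(["apples", "apples"]))
```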

Hope this helps!

Brian



(Nikita Tovstoles) #3

Thanks, token_count worked. Also, without modifying the mapping, using a script
filter (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-script-filter.html)
query worked (query below). However, in both cases, ES seems to treat a
multi-element array and an array element with multiple words as having the
same token count, i.e.:

  1. [ "apples", "oranges" ] //length=2
  2. [ "apples" ] //length=1
  3. [ "apples and oranges" ] //length=3, even with filter query (why not 1)?

...not an issue in my particular use case since all ingredients are 1-word,
but I would like to understand how to address case #3 above. Write a custom
Tokenizer?

Script Filter query:

{
  "query": {
    "field": {
      "ingredients": "apples"
    }
  },
  "filter": {
    "script": {
      "script": "doc['ingredients'].values.length == 1"
    }
  }
}

On Wed, Jan 15, 2014 at 8:05 AM, InquiringMind brian.from.fl@gmail.com wrote:

One thought occurred to me. Perhaps:

  1. Build the token count into your ingredients field. Here's how:
    http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#token_count

2a. Pre-analyze your query arguments and remove the duplicates. For
example, ingredients:"apples" ingredients:"apples" becomes
ingredients:"apples".

2b. Add the token_count to your query. Based on the ES link and on the
example in 2a, you would add ingredients.word_count:1 to your query. Now,
in your set of 2 documents, document #1 would be filtered out of the
response and only document #2 would be returned.

Hope this helps!

Brian



(Brian Yoder) #4
  3. [ "apples and oranges" ] //length=3, even with filter query (why not 1)?

...not an issue in my particular use case since all ingredients are
1-word, but I would like to understand how to address case #3 above.

From my experience with querying, ES slurps all values of an array into one
large set of values and then indexes them. So from a normal matching
perspective, including phrase matching, the following are equivalent:

"apples and oranges"
[ "apples and oranges" ]
[ "apples", "and", "oranges" ]

The big difference is in phrase queries with a position gap between
individual values. When a position gap is added to a field's mapping, a
phrase query for a phrase that spans across values requires a slop value
that is sufficient to cross the gap.

So, for example, consider that a position gap of 4 was added to your
ingredients field's mapping. Then:

ingredients:"apples and oranges" would match the first two but not the last
one.
ingredients:"apples and oranges"~4 would match all of them, including the
last one.

This implies to me that after the source has been analyzed and indexed, ES
loses knowledge of the multiple values and knows only of tokens and their
word positions. Therefore, I am guessing that the mvel length function
counts the number of tokens, not the number of values in the source JSON.
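
To make that guess concrete, here is a small Python simulation of the flattening described above. The analyze function is only a rough stand-in for an analyzer (lowercase, split on non-word characters), and both function names are made up for illustration; this is not how ES is implemented, just a model of the observed counts:

```python
import re


def analyze(value):
    """Rough stand-in for an analyzer: lowercase and split on non-word characters."""
    return re.findall(r"\w+", value.lower())


def indexed_tokens(values):
    """Flatten every value of the field into one token list."""
    if isinstance(values, str):
        values = [values]  # a single string is treated like a one-element array
    return [token for value in values for token in analyze(value)]


docs = {
    1: ["apples", "oranges"],
    2: ["apples"],
    3: ["apples and oranges"],
}

for doc_id, ingredients in docs.items():
    tokens = indexed_tokens(ingredients)
    # The number of source values and the number of tokens diverge for doc 3.
    print(doc_id, tokens, "values:", len(ingredients), "tokens:", len(tokens))
```

In this model, doc 3 has one source value but three tokens, and the three forms listed earlier all produce the same token list.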

Brian



(Brian Yoder) #5

I wonder if this issue (https://github.com/elasticsearch/elasticsearch/issues/4492),
when fixed, will help you by allowing your script to pull values from the
_source?
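
Until then, one workaround would be to count the values from _source on the client side after the search returns. A minimal Python sketch, assuming hits shaped like the "hits" array of a search response; the helper has_exactly and the sample hits are hypothetical:

```python
def has_exactly(hit, field, expected):
    """True when the _source array holds exactly the expected values, ignoring order."""
    values = hit.get("_source", {}).get(field, [])
    if isinstance(values, str):
        values = [values]  # a single value may be stored without the array wrapper
    return sorted(values) == sorted(expected)


# Hypothetical hits, mirroring the three cases discussed in this thread.
hits = [
    {"_id": "1", "_source": {"ingredients": ["apples", "oranges"]}},
    {"_id": "2", "_source": {"ingredients": ["apples"]}},
    {"_id": "3", "_source": {"ingredients": ["apples and oranges"]}},
]

exact = [hit["_id"] for hit in hits if has_exactly(hit, "ingredients", ["apples"])]
print(exact)
```

Because this compares the original JSON values rather than indexed tokens, case #3 counts as one value here.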

Please feel free to correct me if I'm wrong! Thanks!

Brian

On Wednesday, January 15, 2014 2:32:13 PM UTC-5, Nikita Tovstoles wrote:

... but I would like to understand how to address case #3 above. Write a
custom Tokenizer?


