Filter order inside nested bool expressions

For my requirements, I need a filter like the one below:
{
  "filter": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              { "term": { "field1": "value1x" } }
            ]
          }
        },
        {
          "bool": {
            "should": [
              { "term": { "field2": "value2x" } }
            ]
          }
        }
      ]
    }
  }
}

This is a simplified example. There can be multiple values for field1
and/or field2, hence the inner should filters. I know for a fact that the
should filter for field2 will usually match far fewer documents than the
should filter for field1. After reading
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_filter_order.html,
it is clear that, had I put the field1 and field2 filters directly under the
same must clause, I should have listed field2 before field1.
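For reference, this is a sketch of the reordered filter the question has in mind: the field2 clause comes first, and each should clause carries several values. The extra values value1y and value2y are hypothetical placeholders. Whether this reordering actually pays off inside nested bool filters is exactly what question 1 asks, so treat it as the candidate layout, not a confirmed optimization.

```json
{
  "filter": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              { "term": { "field2": "value2x" } },
              { "term": { "field2": "value2y" } }
            ]
          }
        },
        {
          "bool": {
            "should": [
              { "term": { "field1": "value1x" } },
              { "term": { "field1": "value1y" } }
            ]
          }
        }
      ]
    }
  }
}
```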

Questions:

  1. Does the filter ordering theory hold in general for nested filters like
    mine? Will I gain performance if I put the should filter for field2
    before the should filter for field1? Assume that these filters are not
    cached.
  2. Does the reverse of the theory hold for should filters, i.e. should the
    filters matching the most documents appear before those matching fewer
    documents in a should filter?
  3. Does the same theory hold for and/or filters, just as it does for
    must/should filters?
  4. Just for my information: for this theory to work, the filters must be
    "evaluated" in sequence and not in parallel. Correct?
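Question 3 compares bool with the and/or filters. As a sketch, assuming the Elasticsearch 1.x filter DSL (where "and" and "or" each accept an array of filters; both were later deprecated in favor of bool), the same logic would read as follows. value1y and value2y are again hypothetical placeholders.

```json
{
  "filter": {
    "and": [
      {
        "or": [
          { "term": { "field2": "value2x" } },
          { "term": { "field2": "value2y" } }
        ]
      },
      {
        "or": [
          { "term": { "field1": "value1x" } },
          { "term": { "field1": "value1y" } }
        ]
      }
    ]
  }
}
```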

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/84829783-d67c-464a-9f19-f6aa5309136f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Basically, you should put the most expensive filters at the end.
For example, a geo filter should be computed at the very end.

Non-cached filters should also be placed at the end, for example a date range filter using "now", which is not cached.
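Applied to a single bool filter, that advice would put a cheap, cacheable term filter first, a date range using "now" (not cached) after it, and the geo filter last. This is a sketch assuming the 1.x filter DSL used elsewhere in this thread; the field names timestamp and location, the distance and the coordinates are all hypothetical.

```json
{
  "filter": {
    "bool": {
      "must": [
        { "term": { "field1": "value1x" } },
        { "range": { "timestamp": { "gte": "now-1d" } } },
        {
          "geo_distance": {
            "distance": "10km",
            "location": { "lat": 48.85, "lon": 2.35 }
          }
        }
      ]
    }
  }
}
```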

It's not really related to how many documents match the filter.

That said, Elasticsearch tries to optimize this behind the scenes.

Note also that caching plays a really important role, even though Lucene is really fast.
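On the caching point: in the 1.x filter DSL, as far as I recall, term filters were cached by default while the bool filter's own result was not, and the _cache parameter could be set to request it. A sketch under that assumption, caching the whole bool filter from the question:

```json
{
  "filter": {
    "bool": {
      "must": [
        { "term": { "field2": "value2x" } },
        { "term": { "field1": "value1x" } }
      ],
      "_cache": true
    }
  }
}
```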

Hope this helps.

--
David Pilato | Technical Advocate | elasticsearch.com
david.pilato@elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

On 21 September 2014 at 07:39:41, Mouzer (bittusrk@gmail.com) wrote the question at the top of this thread.


Thank you for the quick response. I get what you are saying, but my question
was about the case where all the filters I am using are equally "heavy" and
some match significantly fewer documents than the others. It would be really
great if you could answer the four questions I asked, if they make sense.

(This was in reply to David Pilato's answer above, sent on Sunday, September 21, 2014, 3:38:02 PM UTC+5:30.)