Help needed optimizing the performance of a filter query with bucketing


(Sergey Podgurskiy) #1

Hello.

We are running an ES cluster with 4 nodes, 2 shards, and 3 indexes. Each index contains about 60 GB of data.
I need to run a query with a complex filter but an empty query part (scoring is not needed at all).

The idea behind this query is to filter out documents that are not relevant. To do that, we apply different rules to the 'content' field depending on the length of the content: documents with more content should contain more occurrences of the search term than documents with less content.

The problem is that these queries execute pretty slowly.
Are there any performance tips or known limitations for range queries or slop?
Or do you see any other problems with this query?

Is there any way to determine what ES is actually doing when it runs the query?
How can I find the speed bottleneck?

A few more things to note:
1 - this is an example of 1 entry, but we typically have 5-10 blocks that look like this all together
2 - when we have 5-10 blocks, queries can take 3-5 seconds to execute
3 - looking at explain, the range filters appear to be cached
4 - running the queries multiple times does not seem to improve speed at all
5 - no extra IO load or CPU load appears in our Marvel metrics.


(Sergey Podgurskiy) #2

Here is an example of the query:

{
 "query": {
  "filtered": {
   "filter": {
    "bool": {
     "must": [
      {
       "bool": {
        "should": [
         {
          "query": {
           "match": {
            "title": {
             "query": "fruit",
             "type": "phrase"
            }
           }
          }
         },
         {
          "bool": {
           "must": [
            {
             "query": {
              "match": {
               "content": {
                "query": "fruit",
                "type": "phrase"
               }
              }
             }
            },
            {
             "bool": {
              "should": [
               {
                "bool": {
                 "must": [
                  {
                   "range": {
                    "contentSize": {
                     "from": 0,
                     "to": 400,
                     "include_lower": false,
                     "include_upper": true
                    }
                   }
                  },
                  {
                   "query": {
                    "match": {
                     "content": {
                      "query": "fruit",
                      "type": "phrase",
                      "slop": 1000
                     }
                    }
                   }
                  }
                 ]
                }
               },
               {
                "bool": {
                 "must": [
                  {
                   "range": {
                    "contentSize": {
                     "from": 400,
                     "to": 1500,
                     "include_lower": false,
                     "include_upper": true
                    }
                   }
                  },
                  {
                   "query": {
                    "match": {
                     "content": {
                      "query": "fruit fruit",
                      "type": "phrase",
                      "slop": 1500
                     }
                    }
                   }
                  }
                 ]
                }
               },
               {
                "bool": {
                 "must": [
                  {
                   "range": {
                    "contentSize": {
                     "from": 1500,
                     "to": null,
                     "include_lower": false,
                     "include_upper": true
                    }
                   }
                  },
                  {
                   "query": {
                    "match": {
                     "content": {
                      "query": "fruit fruit fruit",
                      "type": "phrase",
                      "slop": 10000
                     }
                    }
                   }
                  }
                 ]
                }
               }
              ]
             }
            }
           ]
          }
         }
        ]
       }
      }
     ]
    }
   }
  }
 }
}
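
Since a real request typically repeats this block 5-10 times, it may help to generate the blocks from a small table of buckets so that individual pieces can be tweaked or timed in isolation. The following is only a minimal sketch in Python, not the code actually used by the poster: the `BUCKETS` table and the helper names (`phrase_filter`, `bucket_filter`, `build_block`, `build_query`) are hypothetical, and the sizes, repeat counts, and slop values are simply copied from the example above.

```python
import json

# Hypothetical bucket table mirroring the example query:
# (contentSize lower bound, exclusive; upper bound, inclusive; occurrences; slop)
BUCKETS = [
    (0, 400, 1, 1000),
    (400, 1500, 2, 1500),
    (1500, None, 3, 10000),
]


def phrase_filter(field, text, slop=None):
    """Wrap a phrase-type match query in a 'query' filter, as in the example."""
    params = {"query": text, "type": "phrase"}
    if slop is not None:
        params["slop"] = slop
    return {"query": {"match": {field: params}}}


def bucket_filter(term, low, high, occurrences, slop):
    """One size bucket: a contentSize range AND the term repeated N times."""
    size_range = {"range": {"contentSize": {
        "from": low, "to": high,
        "include_lower": False, "include_upper": True,
    }}}
    repeated = " ".join([term] * occurrences)
    return {"bool": {"must": [size_range,
                              phrase_filter("content", repeated, slop)]}}


def build_block(term, buckets=BUCKETS):
    """Title match OR (content match AND one of the size-dependent rules)."""
    per_size = {"bool": {"should": [bucket_filter(term, *b) for b in buckets]}}
    content_rule = {"bool": {"must": [phrase_filter("content", term), per_size]}}
    return {"bool": {"should": [phrase_filter("title", term), content_rule]}}


def build_query(terms):
    """Combine one block per term under a top-level bool/must filter."""
    blocks = [build_block(t) for t in terms]
    return {"query": {"filtered": {"filter": {"bool": {"must": blocks}}}}}


if __name__ == "__main__":
    print(json.dumps(build_query(["fruit"]), indent=1))
```

Calling `build_query(["fruit"])` should reproduce the structure of the query shown above; passing several terms yields the multi-block form described in the first post.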

(ddorian43) #3

See the profile API: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-profile.html (I don't understand it yet myself).

Can you replace "match" with a "term" filter? Assuming it's analyzed, it should be roughly the same?


(Sergey Podgurskiy) #4

Thank you for your response.
I forgot to mention that we are using ES 1.3.1, which doesn't have the profile API yet.

Unfortunately, I can't replace "match" with "term" because the input can be either a phrase or a keyword.


(ddorian43) #5

Can you make it a term filter when it's only 1 word? At least that should speed up those cases.
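
A rough sketch of that idea in Python, purely for illustration: the function name `keyword_filter` is hypothetical, and the lowercasing step assumes the field's analyzer does nothing more than split on whitespace and lowercase. If the index uses stemming, synonyms, or other token filters, the term filter would not match the same documents as the phrase match.

```python
def keyword_filter(field, text):
    """Use a term filter for single-word input, else a phrase match.

    ASSUMPTION: the field's analyzer only lowercases tokens, so lowercasing
    here is enough to reproduce the indexed term. This does not hold for
    analyzers that stem or otherwise rewrite tokens.
    """
    words = text.split()
    if len(words) == 1:
        # Term filters are not analyzed, so the value must equal the indexed token.
        return {"term": {field: words[0].lower()}}
    # Multi-word input keeps the original phrase match wrapped in a query filter.
    return {"query": {"match": {field: {"query": text, "type": "phrase"}}}}
```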

What about setting up a separate 2.2 installation and using it just to get the profile information? Assuming it hasn't changed a lot?


(Sergey Podgurskiy) #6

Thank you, ddorian.
But the query structures in 2.2 and 1.3 are really different.
I don't think 2.2 profile info can help me with 1.3.

Switching from "match" to a "term" filter is not an option for me either.
We don't want to run the analyzer in our ES client application for queries.


(Sergey Podgurskiy) #7

I didn't mean to sound as if your suggestions were not useful.

Unfortunately, there is no way for us to upgrade to 2.2 in the near future,
and the profile would describe the 2.2 query structures, not the 1.3 ones.
I'm hoping someone has an idea as to why these filters would be slow in 1.3,
or whether there is any way to backport the profile API to 1.3.

