Filter query with bucketing performance optimization help needed

spodgurskiy · March 1, 2016, 11:49pm

Hello.

We are running an ES cluster with 4 nodes, 2 shards and 3 indexes. Each index contains about 60 GB of data.
I need to run a query with a complex filter but an empty query part (scoring is not needed at all).

The idea behind this query is to filter out documents that are not relevant. To do that, we apply different rules to the 'content' field depending on the length of the content: documents with more content must contain more occurrences of the term than documents with less content.
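To make the rule concrete, here is a minimal client-side Python sketch of the intended logic. It assumes the thresholds from the example query in this thread (sizes up to 400 need 1 occurrence, 400 to 1500 need 2, above 1500 need 3). The names BUCKETS, required_occurrences and passes are hypothetical. It only illustrates the relevance rule and is not what Elasticsearch executes; counting raw occurrences only approximates what a phrase match with a large slop checks.

```python
from typing import Optional

# Hypothetical bucket table mirroring the example query in this thread:
# (lower bound, exclusive; upper bound, inclusive or None; required occurrences)
BUCKETS = [
    (0, 400, 1),
    (400, 1500, 2),
    (1500, None, 3),
]


def required_occurrences(content_size: int) -> Optional[int]:
    """Return how many occurrences a document of this size must contain,
    or None if the size falls outside every bucket (e.g. size 0)."""
    for lower, upper, needed in BUCKETS:
        if content_size > lower and (upper is None or content_size <= upper):
            return needed
    return None


def passes(content_size: int, occurrences: int) -> bool:
    """Illustration of the rule only; ES evaluates sloppy phrase matches instead."""
    needed = required_occurrences(content_size)
    return needed is not None and occurrences >= needed


print(required_occurrences(300))   # 1
print(required_occurrences(1000))  # 2
print(passes(2000, 2))             # False
```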

The problem is that these queries execute pretty slowly.
I'm wondering whether there are any performance tips or known limitations for range queries or slop.
Or do you see any other problems with this query?

Is there any way to determine what ES is actually doing to run the query?
How can I find the bottleneck?

A few more things to note:
1 - this is an example of one entry, but we typically have 5-10 blocks like this combined
2 - when we have 5-10 blocks, queries take between 3 and 5 seconds to execute
3 - when looking at the explain output, the range filters appear to be cached
4 - running the queries multiple times does not seem to improve speed at all
5 - no extra IO or CPU load appears in our Marvel metrics

spodgurskiy · March 1, 2016, 11:54pm

Here is an example of the query

{
 "query": {
  "filtered": {
   "filter": {
    "bool": {
     "must": [
      {
       "bool": {
        "should": [
         {
          "query": {
           "match": {
            "title": {
             "query": "fruit",
             "type": "phrase"
            }
           }
          }
         },
         {
          "bool": {
           "must": [
            {
             "query": {
              "match": {
               "content": {
                "query": "fruit",
                "type": "phrase"
               }
              }
             }
            },
            {
             "bool": {
              "should": [
               {
                "bool": {
                 "must": [
                  {
                   "range": {
                    "contentSize": {
                     "from": 0,
                     "to": 400,
                     "include_lower": false,
                     "include_upper": true
                    }
                   }
                  },
                  {
                   "query": {
                    "match": {
                     "content": {
                      "query": "fruit",
                      "type": "phrase",
                      "slop": 1000
                     }
                    }
                   }
                  }
                 ]
                }
               },
               {
                "bool": {
                 "must": [
                  {
                   "range": {
                    "contentSize": {
                     "from": 400,
                     "to": 1500,
                     "include_lower": false,
                     "include_upper": true
                    }
                   }
                  },
                  {
                   "query": {
                    "match": {
                     "content": {
                      "query": "fruit fruit",
                      "type": "phrase",
                      "slop": 1500
                     }
                    }
                   }
                  }
                 ]
                }
               },
               {
                "bool": {
                 "must": [
                  {
                   "range": {
                    "contentSize": {
                     "from": 1500,
                     "to": null,
                     "include_lower": false,
                     "include_upper": true
                    }
                   }
                  },
                  {
                   "query": {
                    "match": {
                     "content": {
                      "query": "fruit fruit fruit",
                      "type": "phrase",
                      "slop": 10000
                     }
                    }
                   }
                  }
                 ]
                }
               }
              ]
             }
            }
           ]
          }
         }
        ]
       }
      }
     ]
    }
   }
  }
 }
}
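Since the post mentions 5-10 of these blocks, writing them by hand is error-prone. Below is a minimal Python sketch that regenerates the block posted above from a bucket table. The helper names (phrase_filter, bucket_filter, build_block, build_query) are hypothetical; the JSON shape follows the ES 1.x "filtered" / "query"-filter syntax used in the posted query. The thread doesn't say how the 5-10 blocks are combined, so build_query wraps a single block only.

```python
import json
from typing import Any, Dict, List, Optional, Tuple

# (lower exclusive, upper inclusive or None, repetitions of the term, slop)
Bucket = Tuple[int, Optional[int], int, int]

# Values copied from the example query above
EXAMPLE_BUCKETS: List[Bucket] = [
    (0, 400, 1, 1000),
    (400, 1500, 2, 1500),
    (1500, None, 3, 10000),
]


def phrase_filter(field: str, text: str, slop: Optional[int] = None) -> Dict[str, Any]:
    """Wrap a match-phrase query in a 1.x "query" filter."""
    match: Dict[str, Any] = {"query": text, "type": "phrase"}
    if slop is not None:
        match["slop"] = slop
    return {"query": {"match": {field: match}}}


def bucket_filter(term: str, buckets: List[Bucket]) -> Dict[str, Any]:
    """One should-clause per contentSize bucket: range AND repeated-term phrase."""
    should = []
    for lower, upper, reps, slop in buckets:
        should.append({
            "bool": {
                "must": [
                    {"range": {"contentSize": {
                        "from": lower, "to": upper,
                        "include_lower": False, "include_upper": True}}},
                    phrase_filter("content", " ".join([term] * reps), slop),
                ]
            }
        })
    return {"bool": {"should": should}}


def build_block(term: str, buckets: List[Bucket]) -> Dict[str, Any]:
    """Title phrase OR (content phrase AND bucketed content rule)."""
    content = {"bool": {"must": [phrase_filter("content", term),
                                 bucket_filter(term, buckets)]}}
    return {"bool": {"should": [phrase_filter("title", term), content]}}


def build_query(term: str, buckets: List[Bucket]) -> Dict[str, Any]:
    return {"query": {"filtered": {"filter": {"bool": {"must": [build_block(term, buckets)]}}}}}


print(json.dumps(build_query("fruit", EXAMPLE_BUCKETS), indent=1))
```

Keeping the thresholds and slops in one table also makes it easy to try variants (for example fewer buckets or smaller slops) and compare the "took" times ES reports.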

ddorian43 · March 2, 2016, 1:07am

See the Profile API: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-profile.html (I don't fully understand it yet myself).

Can you replace "match" with a "term" filter? Assuming the field is analyzed, it should be roughly equivalent.

spodgurskiy · March 2, 2016, 3:36am

Thank you for your response.
I forgot to mention that we are using ES 1.3.1, which doesn't have the Profile API yet.

Unfortunately, I can't replace "match" with "term" because the input can be either a phrase or a single keyword.

ddorian43 · March 2, 2016, 10:27am

Can you use a term filter in the cases where it's only one word? That should at least speed up those cases.

What about setting up a separate 2.2 installation and using it just to get the profiling information, assuming query execution hasn't changed a lot?
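A minimal Python sketch of that suggestion, assuming the field's analyzer only lowercases and splits on whitespace, so lowercasing a single word yields the indexed token. The helper name text_filter is hypothetical. With stemming, ASCII folding or synonyms this assumption breaks, which is exactly the concern about running the analyzer client-side.

```python
from typing import Any, Dict


def text_filter(field: str, text: str) -> Dict[str, Any]:
    """Use a term filter for single-word input, else keep the phrase match.

    ASSUMPTION: the analyzer for `field` only lowercases and splits on
    whitespace, so text.lower() reproduces the indexed token.
    """
    words = text.split()
    if len(words) == 1:
        # term filters are not analyzed, so the value must equal the indexed token
        return {"term": {field: words[0].lower()}}
    # multi-word input: keep the 1.x query-wrapped match phrase
    return {"query": {"match": {field: {"query": text, "type": "phrase"}}}}


print(text_filter("content", "Fruit"))
print(text_filter("content", "fresh fruit"))
```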

spodgurskiy · March 3, 2016, 12:33am

Thank you, ddorian.
But the query structures in 2.2 and 1.3 are really different,
so I don't think 2.2 profile info can help me with 1.3.

Switching from "match" to "term" filters is not an option for me either.
We don't want to run the analyzer in our ES client application for queries.

spodgurskiy · March 3, 2016, 6:51pm

I didn't mean to sound like your suggestions are not useful.

Unfortunately, there is no way for us to upgrade to 2.2 in the near future,
and the profiler would be telling us about 2.2 query structures, not 1.3 ones.
I'm hoping someone has an idea why these filters would be slow in 1.3,
or whether there is any way to backport the profiler to 1.3.
