Query optimization!

Mr.Flakes · August 4, 2017, 3:01pm

Hello Elastic gurus, I am pretty new to the whole stack, so any help would be much appreciated. I have this query that runs pretty often and eats A LOT of resources. The results are as intended, but I'd like to know if you have any advice on how to make it better!

GET cdr/_search
{
  "query":{
    "constant_score":{
      "filter":{
        "bool":{
          "should":[
            {
              "range":{
                "startTime":{
                  "gte":"2017-01-21T23:00+02:00",
                  "lte":"2017-01-21T23:50+02:00"
                }
              }
            },
            {
              "range":{
                "stopTime":{
                  "gte":"2017-01-21T23:00+02:00",
                  "lte":"2017-01-21T23:50+02:00"
                }
              }
            },
            {
              "bool":{
                "must":[
                  {
                    "range":{
                      "startTime":{
                        "lte":"2017-01-21T23:00+02:00"
                      }
                    }
                  },
                  {
                    "range":{
                      "stopTime":{
                        "gte":"2017-01-21T23:50+02:00"
                      }
                    }
                  }
                ]
              }
            }
          ],
          "must":[
            {
              "term":{
                "sourceIp":{
                  "value":"XX.XX.X.XX"
                }
              }
            }
          ]
        }
      }
    }
  }
}

Sorry for the bad formatting. Anyway, the first part of the query is pretty simple: it gets all the documents whose startTime or stopTime falls within the provided date range. The inner bool gets all the documents that cross the provided range without falling inside it (for example, documents with 2017-01-21T22:59+02:00 as startTime and 2017-01-21T23:59+02:00 as stopTime). The last step keeps only the documents associated with a particular client (IP address).

The data types are: startTime DATE, stopTime DATE, sourceIp KEYWORD. Any ideas how I could make this better? The total data size on disk is around 50 TB, if that plays any role.
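To make the intended matching logic concrete, here is a minimal Python sketch of what the posted filter selects, evaluated in memory rather than in Elasticsearch. The `CallRecord` type and `matches_posted_filter` helper are hypothetical names for illustration, and the sketch assumes at least one `should` clause must match, which is how the post describes the query behaving.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class CallRecord:
    """Hypothetical in-memory stand-in for one document in the cdr index."""
    start_time: datetime
    stop_time: datetime
    source_ip: str

def matches_posted_filter(doc, q_start, q_stop, ip):
    """Mirror the posted filter, assuming at least one should clause must match."""
    if doc.source_ip != ip:                                   # must: term on sourceIp
        return False
    starts_inside = q_start <= doc.start_time <= q_stop       # should: range on startTime
    stops_inside = q_start <= doc.stop_time <= q_stop         # should: range on stopTime
    spans_window = doc.start_time <= q_start and doc.stop_time >= q_stop  # should: inner bool
    return starts_inside or stops_inside or spans_window

tz = timezone(timedelta(hours=2))
q_start = datetime(2017, 1, 21, 23, 0, tzinfo=tz)
q_stop = datetime(2017, 1, 21, 23, 50, tzinfo=tz)

# The example from the post: starts before the window and ends after it.
spanning = CallRecord(datetime(2017, 1, 21, 22, 59, tzinfo=tz),
                      datetime(2017, 1, 21, 23, 59, tzinfo=tz), "XX.XX.X.XX")
print(matches_posted_filter(spanning, q_start, q_stop, "XX.XX.X.XX"))  # True
```

Each `should` branch corresponds to one clause of the posted query, so the three ways a call can overlap the window (starts inside, stops inside, spans it) are visible side by side.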

jpountz · August 11, 2017, 7:50am

I could be wrong, but it seems to me that the first two range queries are not necessary, since they are covered by the inner bool query?

This will require reindexing, but I suspect you might want to look into indexing your startTime and stopTime fields as a single date_range field rather than two separate date fields. You could then find the intersecting documents directly with a single range query on that field, which is usually faster than using two range queries. https://www.elastic.co/guide/en/elasticsearch/reference/current/range.html
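A rough sketch of what the date_range approach could look like, with the request bodies built as Python dicts. The field name `callTime` and the helpers `to_range_doc` and `overlap_query` are hypothetical. The mapping is written in the typeless style of Elasticsearch 7+; on a 5.x cluster like the one in this thread, the `properties` block would sit under a mapping type name instead. In practice the document conversion would more likely run server-side (for example with the Reindex API and a script) than in client code.

```python
import json

# Hypothetical field name: one date_range replaces startTime/stopTime.
FIELD = "callTime"

# Mapping sketch (typeless, Elasticsearch 7+ style).
mapping = {
    "mappings": {
        "properties": {
            FIELD: {"type": "date_range"},
            "sourceIp": {"type": "keyword"},
        }
    }
}

def to_range_doc(doc):
    """Move startTime/stopTime into one date_range value, keeping other fields."""
    new_doc = dict(doc)
    new_doc[FIELD] = {"gte": new_doc.pop("startTime"), "lte": new_doc.pop("stopTime")}
    return new_doc

def overlap_query(q_start, q_stop, ip):
    """A single range query with relation=intersects plus the sourceIp term."""
    return {
        "query": {
            "constant_score": {
                "filter": {
                    "bool": {
                        "filter": [
                            {"range": {FIELD: {"gte": q_start, "lte": q_stop,
                                               "relation": "intersects"}}},
                            {"term": {"sourceIp": {"value": ip}}},
                        ]
                    }
                }
            }
        }
    }

print(json.dumps(overlap_query("2017-01-21T23:00+02:00",
                               "2017-01-21T23:50+02:00", "XX.XX.X.XX"), indent=2))
```

`intersects` is the relation that corresponds to "any overlap with the window", which is what the three `should` clauses were expressing.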

Mr.Flakes · August 11, 2017, 10:00am

Nah, the inner bool query only covers records whose startTime is before the start of the range and whose stopTime is after its end, so it does not include, for example, records that lie entirely inside the given range; it only takes records that begin outside the range and span across it. I was actually thinking about the date_range type, but I'm kind of afraid of reindexing because I have around 10 TB of data running on one node, and I guess that could take some time... but I guess I have no choice. Thanks for the help.

jpountz · August 11, 2017, 10:28am

I had indeed misread your query. But then, could you do the range matching with just this query:

GET cdr/_search
{
  "query":{
    "constant_score":{
      "filter":{
        "bool":{
          "must":[
            {
              "range":{
                "startTime":{
                  "lte":"2017-01-21T23:50+02:00"
                }
              }
            },
            {
              "range":{
                "stopTime":{
                  "gte":"2017-01-21T23:00+02:00"
                }
              }
            },
            {
              "term":{
                "sourceIp":{
                  "value":"XX.XX.X.XX"
                }
              }
            }
          ]
        }
      }
    }
  }
}

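It looks similar to your inner bool query, except that I swapped the start/stop bounds: startTime is compared against the query's stop time (lte), and stopTime against the query's start time (gte). If all your indexed documents have a stopTime that is gte their startTime, then this should match all intersecting ranges?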
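One way to convince yourself of the claim is a brute-force check on small integer intervals standing in for timestamps. This Python sketch compares the original three-clause `should` logic with the suggested two-clause test; the function names are hypothetical, and the check is exhaustive only over the small grid it enumerates, not a proof.

```python
from itertools import product

def three_clause(start, stop, qs, qe):
    # Original should list: starts inside, stops inside, or spans the window.
    return (qs <= start <= qe) or (qs <= stop <= qe) or (start <= qs and stop >= qe)

def two_clause(start, stop, qs, qe):
    # Suggested filter: startTime lte query stop AND stopTime gte query start.
    return start <= qe and stop >= qs

def check(grid=range(0, 8)):
    """Compare both predicates on every well-formed pair of intervals in the grid."""
    for start, stop, qs, qe in product(grid, repeat=4):
        if start > stop or qs > qe:
            continue  # only documents with stopTime gte startTime, as the reply assumes
        if three_clause(start, stop, qs, qe) != two_clause(start, stop, qs, qe):
            return (start, stop, qs, qe)  # first counterexample found
    return None

print(check())  # None
```

The well-formedness condition matters: a malformed document with start=3 and stop=0 against a window of 2..4 is matched by the three-clause version (its start lies inside the window) but not by the two-clause version.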

system · September 8, 2017, 10:28am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
