Performance issues in large bool filter query


(Luis) #1

I am trying to obtain a list of up to two million results through the following scrolled query. For each hit I only need a couple of fields:

GET alerts-*/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "timestamp": {
              "gte": 1539648000,
              "lte": 1539907199,
              "format": "epoch_second"
            }
          }
        },
        {
          "term": {
            "sensor_name.keyword": "enp3s0"
          }
        },
        {
          "term": {
            "source_ip": "192.168.0.1"
          }
        },
        {
          "term": {
            "destination_ip": "192.168.0.2"
          }
        },
        {
          "term": {
            "destination_port": "135"
          }
        },
        {
          "term": {
            "sid": 2102251
          }
        }
      ]
    }
  },
  "size": 10000,
  "_source": [
    "source_port",
    "timestamp"
  ]
}

Each scroll request takes about 2 seconds initially and gets a little slower each time, so by the end of the query a single scroll can take up to 7 seconds. Since the whole query can take several minutes to complete, I would like to know if there is a more efficient way to get this large list of values. I am using ES 6.3 and Python to store the returned values. Thank you!


(Martijn Van Groningen) #2

Normal pagination (the from parameter) should not be used when fetching many documents. Instead, search_after should be used, which works differently: rather than specifying an offset in subsequent search requests, you specify the sort values of the last hit from the previous response.

Search after: https://www.elastic.co/guide/en/elasticsearch/reference/6.4/search-request-search-after.html
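
As a rough illustration of the search_after loop described above, here is a minimal Python sketch. It is not tied to a particular client: `search` is a hypothetical callable that takes a request body and returns the parsed response, and the helper names `build_body` and `iterate_hits` are made up for this example. It assumes the request sorts on a field combination that is unique per document (for example the timestamp plus a tiebreaker such as `_id`), which is an assumption about the index, not something stated in the question.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional


def build_body(
    query: Dict[str, Any],
    sort: List[Dict[str, str]],
    size: int,
    source_fields: List[str],
    search_after: Optional[List[Any]] = None,
) -> Dict[str, Any]:
    """Build one search request body; search_after is omitted on the first page."""
    body: Dict[str, Any] = {
        "query": query,
        "sort": sort,  # assumed to end with a unique tiebreaker field
        "size": size,
        "_source": source_fields,
    }
    if search_after is not None:
        body["search_after"] = search_after
    return body


def iterate_hits(
    search: Callable[[Dict[str, Any]], Dict[str, Any]],
    query: Dict[str, Any],
    sort: List[Dict[str, str]],
    size: int,
    source_fields: List[str],
) -> Iterator[Dict[str, Any]]:
    """Yield every hit, page by page, using the last hit's sort values as the cursor."""
    search_after: Optional[List[Any]] = None
    while True:
        response = search(build_body(query, sort, size, source_fields, search_after))
        hits = response["hits"]["hits"]
        if not hits:
            return  # an empty page means there is nothing left to fetch
        for hit in hits:
            yield hit
        # Each hit of a sorted search carries its sort values in "sort".
        search_after = hits[-1]["sort"]
```

With the official elasticsearch-py client, `search` could be something like `lambda body: es.search(index="alerts-*", body=body)`, and `query` would be the bool filter from the original post. Whether `_id` is a suitable tiebreaker across several indices depends on the data, so treat that part as something to verify.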

