Performance issues in large bool filter query

I am trying to obtain a list of up to two million results through the following scrolled query. For each hit I only need a couple of fields:

GET alerts-*/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "timestamp": {
              "gte": 1539648000,
              "lte": 1539907199,
              "format": "epoch_second"
            }
          }
        },
        {
          "term": {
            "sensor_name.keyword": "enp3s0"
          }
        },
        {
          "term": {
            "source_ip": "192.168.0.1"
          }
        },
        {
          "term": {
            "destination_ip": "192.168.0.2"
          }
        },
        {
          "term": {
            "destination_port": "135"
          }
        },
        {
          "term": {
            "sid": 2102251
          }
        }
      ]
    }
  },
  "size": 10000,
  "_source": [
    "source_port",
    "timestamp"
  ]
}

Each scroll request takes about 2 seconds initially and gets a little slower each time, so by the end of the query a single scroll request can take up to 7 seconds. Since the whole query can take several minutes to complete, I would like to know if there is a more efficient way to get this large list of values. I am using ES 6.3 and Python to store the returned values. Thank you!

Normal pagination (the from parameter) should not be used when fetching many documents. Use search_after instead. It works differently: instead of specifying an offset in subsequent search requests, you pass the sort values of the last hit from the previous page.

Search after: https://www.elastic.co/guide/en/elasticsearch/reference/6.4/search-request-search-after.html
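
To illustrate the idea, here is a minimal Python sketch of a search_after loop. It is not tied to a particular client: `search` is a hypothetical callable that takes a request body and returns the parsed response, and `build_body` and `iter_hits` are names made up for this example. The sort used in the usage note (timestamp plus `_id` as a unique tiebreaker) is an assumption about your mapping, so adjust it to whatever unique field you have.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional


def build_body(
    query: Dict[str, Any],
    sort: List[Dict[str, str]],
    size: int,
    source_fields: Optional[List[str]] = None,
    search_after: Optional[List[Any]] = None,
) -> Dict[str, Any]:
    """Build one search request body; search_after is omitted on the first page."""
    body: Dict[str, Any] = {"query": query, "sort": sort, "size": size}
    if source_fields is not None:
        body["_source"] = source_fields
    if search_after is not None:
        body["search_after"] = search_after
    return body


def iter_hits(
    search: Callable[[Dict[str, Any]], Dict[str, Any]],
    query: Dict[str, Any],
    sort: List[Dict[str, str]],
    size: int = 10000,
    source_fields: Optional[List[str]] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield every hit, paging with the sort values of the last hit seen."""
    last_sort: Optional[List[Any]] = None
    while True:
        body = build_body(query, sort, size, source_fields, last_sort)
        hits = search(body)["hits"]["hits"]
        if not hits:
            return  # an empty page means there is nothing left to fetch
        yield from hits
        # Each hit of a sorted search carries its sort values under "sort".
        last_sort = hits[-1]["sort"]
```

With the official Python client this could be wired up roughly as `iter_hits(lambda body: es.search(index="alerts-*", body=body), query, [{"timestamp": "asc"}, {"_id": "asc"}], source_fields=["source_port", "timestamp"])`, where `es` is an `Elasticsearch` instance and `query` is the bool filter from the question. The sort needs a tiebreaker with one unique value per document, otherwise hits that share a timestamp may be skipped or repeated across pages.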
