ES query slows down under high concurrency

  1. We have a large volume of NetFlow records that need to be looked up, arriving at 200,000 records/s. For each record, the lookup queries ES to associate the flow with a user based on time, aIp, the start and end ports (beginport, endport), and bIp; the exact query is given in item 5. The problem is that query throughput cannot keep up with 200,000 records/s, so a large NetFlow backlog builds up and keeps snowballing.
  2. At the same time, new data is being written to the test index at 1,000 records/s. The test index is split into daily indices (for example test_07_17), each holding about 45 GB, or roughly 86.4 million records, per day. Each lookup queries today's and yesterday's indices, and data is retained for 30 days.
  3. The cluster currently consists of 3 virtual machines, each with 16 CPU cores, 32 GB of RAM, and a 1 TB spinning (mechanical) disk.
  4. Hardware resources are limited: the machines cannot be upgraded, and moving to SSDs is not an option.
  5. Query logic:
POST test_07_17/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "time": {
              "lte": 1721208610478
            }
          }
        }
      ],
      "should": [
        {
          "match": {
            "bIp": "10.22.102.203"
          }
        },
        {
          "bool": {
            "must": [
              {
                "match": {
                  "aIp": "102.13.203.209"
                }
              },
              {
                "range": {
                  "beginport": {
                    "lte": 203
                  }
                }
              },
              {
                "range": {
                  "endport": {
                    "gte": 203
                  }
                }
              }
            ]
          }
        }
      ]
    }
  },
  "sort": [
    {
      "time": {
        "order": "desc"
      }
    }
  ],
  "_source": ["username", "nasip", "mac"]
}
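
As written, the query has a `must` clause, and in that case Elasticsearch's default `minimum_should_match` for the `should` clauses is 0. The `should` clauses then only influence scoring, and because the results are sorted by `time`, the request effectively matches every document with `time` up to the given timestamp. If the intent is that either the `bIp` condition or the `aIp`/port condition must hold, a sketch like the one below makes that explicit. It also moves all clauses into filter context, which skips scoring (not needed when sorting by `time`) and makes the clauses eligible for filter caching. The helper names `daily_indices` and `build_lookup_query`, the `test_MM_dd` naming pattern inferred from test_07_17, the example date, and the use of `term` queries (which assumes `aIp` and `bIp` are mapped as `keyword` or `ip`) are all assumptions for illustration. The script only builds and prints the request; it does not contact a cluster.

```python
import json
from datetime import date, timedelta


def daily_indices(day: date, prefix: str = "test") -> list[str]:
    """Hypothetical helper: today's and yesterday's index names.

    ASSUMPTION: indices follow the test_MM_dd pattern seen in test_07_17.
    """
    return [f"{prefix}_{d:%m_%d}" for d in (day, day - timedelta(days=1))]


def build_lookup_query(time_ms: int, a_ip: str, b_ip: str, port: int) -> dict:
    """Hypothetical helper: the same conditions as the original query, but in
    filter context and with at least one alternative required to match."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"range": {"time": {"lte": time_ms}}},
                    {
                        "bool": {
                            # Require bIp OR (aIp AND beginport <= port <= endport).
                            "minimum_should_match": 1,
                            "should": [
                                # ASSUMPTION: bIp/aIp are keyword or ip fields.
                                {"term": {"bIp": b_ip}},
                                {
                                    "bool": {
                                        "filter": [
                                            {"term": {"aIp": a_ip}},
                                            {"range": {"beginport": {"lte": port}}},
                                            {"range": {"endport": {"gte": port}}},
                                        ]
                                    }
                                },
                            ],
                        }
                    },
                ]
            }
        },
        "sort": [{"time": {"order": "desc"}}],
        "_source": ["username", "nasip", "mac"],
    }


if __name__ == "__main__":
    # Values taken from the original query; the date 2024-07-17 is an assumption.
    body = build_lookup_query(1721208610478, "102.13.203.209", "10.22.102.203", 203)
    print("POST " + ",".join(daily_indices(date(2024, 7, 17))) + "/_search")
    print(json.dumps(body, indent=2))
```

Listing both daily indices in one request path keeps the "today and yesterday" lookup to a single search call.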

Spinning disks support only a limited number of IOPS, so I am not surprised they are struggling with indexing combined with highly concurrent querying. This is why both the guide on tuning for indexing speed and the guide on tuning for search speed recommend using local SSDs.
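
A rough back-of-envelope calculation shows why IOPS is the likely bottleneck. The lookup rate, the two daily indices per lookup, the 45 GB per day and the node specs come from the question. The per-disk IOPS figure, the optimistic assumption of a single uncached random read per index per lookup, and the 16 GB JVM heap per node are assumptions, so treat the result as an order-of-magnitude estimate only.

```python
def iops_estimate(
    lookups_per_s: int = 200_000,   # from the question
    indices_per_lookup: int = 2,    # today's and yesterday's index
    reads_per_index: float = 1.0,   # ASSUMPTION: optimistic, one uncached random read per index
    nodes: int = 3,                 # from the question, one spinning disk each
    iops_per_disk: int = 150,       # ASSUMPTION: rough random IOPS of one spinning disk
) -> dict:
    """Compare the random reads the lookups need with what the disks can serve."""
    needed = lookups_per_s * indices_per_lookup * reads_per_index
    available = nodes * iops_per_disk
    return {"needed_iops": needed, "available_iops": available, "ratio": needed / available}


def cache_estimate(
    gb_per_day: int = 45,   # from the question
    days_queried: int = 2,  # today and yesterday
    nodes: int = 3,
    ram_gb: int = 32,
    heap_gb: int = 16,      # ASSUMPTION: JVM heap set to half of RAM
) -> tuple[int, int]:
    """Return (GB of primary data queried, upper bound on GB of page cache)."""
    working_set = gb_per_day * days_queried
    page_cache = nodes * (ram_gb - heap_gb)  # ignores the OS and other processes
    return working_set, page_cache


if __name__ == "__main__":
    est = iops_estimate()
    print(f"needed ~{est['needed_iops']:,.0f} random reads/s, "
          f"available ~{est['available_iops']:,} IOPS, ratio ~{est['ratio']:.0f}x")
    data_gb, cache_gb = cache_estimate()
    print(f"queried data ~{data_gb} GB vs at most ~{cache_gb} GB of page cache")
```

Even with these optimistic figures, the required random-read rate is several hundred times what three spinning disks can sustain, and the roughly 90 GB of primary data being queried does not fit in the page cache left on the three nodes, so most lookups have to go to disk.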

I suspect you are simply constrained by a lack of IOPS, and I would recommend monitoring IOPS, await, and disk utilisation to verify this. If that is the case, I do not think there are any workarounds or magic solutions, so I would recommend upgrading to SSDs.
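
The `iostat -x` tool from the sysstat package reports per-device reads and writes per second, await and %util. The sketch below derives roughly the same figures directly from two snapshots of Linux's `/proc/diskstats`, using the field layout described in the kernel's I/O statistics documentation. It is Linux-only. The function names and the device name `sda` are hypothetical; the device should be the disk that holds the Elasticsearch data path. Sustained %util close to 100% together with a high await while the backlog grows would support the IOPS explanation.

```python
import time


def read_diskstats(device: str, path: str = "/proc/diskstats") -> tuple:
    """Return (reads, ms_reading, writes, ms_writing, ms_doing_io) for one device."""
    with open(path) as f:
        for line in f:
            p = line.split()
            # After major, minor, name: reads completed, reads merged, sectors read,
            # ms reading, writes completed, writes merged, sectors written,
            # ms writing, I/Os in progress, ms doing I/O, weighted ms, ...
            if len(p) >= 14 and p[2] == device:
                return int(p[3]), int(p[6]), int(p[7]), int(p[10]), int(p[12])
    raise ValueError(f"device {device!r} not found in {path}")


def disk_metrics(before: tuple, after: tuple, interval_s: float) -> dict:
    """Reads/s, writes/s, average await (ms) and %util between two snapshots."""
    reads = after[0] - before[0]
    writes = after[2] - before[2]
    wait_ms = (after[1] - before[1]) + (after[3] - before[3])
    ios = reads + writes
    return {
        "r_per_s": reads / interval_s,
        "w_per_s": writes / interval_s,
        "await_ms": wait_ms / ios if ios else 0.0,
        # Share of wall-clock time during which the device had I/O in flight.
        "util_pct": 100.0 * (after[4] - before[4]) / (interval_s * 1000.0),
    }


if __name__ == "__main__":
    device = "sda"  # hypothetical: the disk backing the Elasticsearch data path
    try:
        t0, s0 = time.monotonic(), read_diskstats(device)
        time.sleep(1.0)
        t1, s1 = time.monotonic(), read_diskstats(device)
        print(disk_metrics(s0, s1, t1 - t0))
    except (OSError, ValueError) as exc:
        print(f"cannot read disk stats: {exc}")
```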