Monthly query performance on too many shards


I have an Elasticsearch cluster on 5 nodes.
I use Elasticsearch for log management and 3 indexes are created every hour (1 shard, 0 replica)
Every log line have 20-70 fields, like IP, port, hostname, user agent etc.
Some indexes are 30GB in size, some indexes are 2-3GB. Size depends on time of day.

When i make a query for top source IP addresses in last 3 months,
all elastic nodes io utililization (iostat -x 3) are %100 and my query getting time out response.

How can i improve the performance of the query?


Elasticsearch version: 7.16.2
OS version: centos 8

Hardware specs for each node:

8 CPU:
model name      : Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz
cpu MHz         : 2095.077
cache size      : 36608 KB

Disk speed (SSD):

# hdparm -tT /dev/sdc
 Timing cached reads:   13696 MB in  2.00 seconds = 6855.42 MB/sec
 Timing buffered disk reads: 2526 MB in  3.00 seconds = 841.65 MB/sec

Index template:

  "index_patterns": [
  "settings": {
    "index": {
      "max_result_window": "1000000",
      "routing": {
        "allocation": {
          "total_shards_per_node": "1"
      "refresh_interval": "120s",
      "number_of_shards": "1",
      "translog": {
        "flush_threshold_size": "64m",
        "durability": "async"
      "number_of_replicas": "0"
  "aliases": {},
  "mappings": {
      "numeric_detection": false,
      "date_detection": false,
      "properties": {
        "date": {
          "type": "date"

query example:
{"size":0, "query": {"bool":{"must":[{"query_string":{"query":"+date:[1635724800000 TO 1644278399000] +action.keyword:drop"}}]}}, "aggs":{"aggrsNameName": {"terms": {"field":"source_ip.keyword", "size": 10000}}} }

Welcome to our community! :smiley:

What is the output from the _cluster/stats?pretty&human API?
Can you check hot threads while this is running?
What do your Elasticsearch logs show?

If your average data is 30Gb/2 per hour=15GB x 24 hours x 90 days = 32400Gb of data.
A search of that magnitude will kill any system. You run out of ram to cache anything in any index, which will cause it to have to do physical reads of data from your drive. That creates a massive slowness. Since each node generally recommended to only have 32G for ES. You have only 8 G (1000/5)=200G per node for ram, that means 20+ times the data vs the ram availability.

Lets assume you want the top 100 ip addresses
Instead of doing a 3 month query. Can you do a every 6 hour or daily query and insert into a new index. Say top 200 ip addresses. Then consolidate the report from this new index?

This should reduce your burst IO requirements. It will reduce searching billions of records down to maybe 1-2m records for 3 months to generate your report.

1 Like

Except that's not how Elasticsearch works. It doesn't need to load everything into memory to be able to search or aggregate on it.

I didn't say that it cannot do aggregations or search. All I said was without enough cache, it will have to do physical reads to get the data to do search and aggregations. It will still work, just much slower. Remember he is talking about timeout in his search, not talking if it can or cannot do it. I'm pretty sure a smaller time frame it will work nicely.

You obviously don't know how cache works. If any type of data is not used frequently in a cache, it will be removed. Now, with that much data, there is no way, it can even store all the index (not including the data) in memory. Which means ES has to drop caching older indexes not in use to make space for current indexes. When he queries older indexes, ES has to physically read the older index, rebuild the cache with older index to start generating a report.. and keep going, dropping some cache for the next data it needs.

With the massive amount of data, the cache will always be swapped out with new reads during report generation, that also impacts insertion of data. The data he is reading is 3 months, which means none of the data is remotely going to be in the current memory, so ES has to physically read all the index just to do the search. That takes time, no matter how fast your drives are. He has to search 32Tb of data, that is not fast. Unless you want to tell me that is not going take any time?

I'm proposing a more constant generation of interim reports at a smaller time frame. This may allow some or all of the data to be in cache so his interim reports are fast without high IO. Then his final 3 month reports will have a much smaller index to search and aggregate. 8G of ram cannot be totally allocated to ES.. so me probably has only 4-6G of ram allocated. That very small relative to the data size.

Had a ES consultant recommend 64G systems with 5 nodes, for data 1/8 of what data the original poster said.
The customer I had went with 3 nodes.. now when fully loaded, its slower for long time queries as expected.. so now he's getting 2 more nodes.

1 Like

Hi warkolm, thanks.

We have increase the ram on all nodes:

Master node: 96GB RAM
Other 4 nodes: 64GB RAM

And we still have same problem.
Cluster stats and hot threads output is here:

Based on the stats it looks like you have just shy of 3TB per node and that your average shard size is just below 3GB, which is a bit small. This seems to correspond to roughly 70 days worth of data. Does this sound correct.

As you only have primary shards in the cluster all shards matching the time range need to be accessed when you run a query. As the data volume is much higher than what can fit in the page cache, this will result in a lot of disk I/O. As you at the same time are indexing data it sounds like the cluster does not have enough resources to handle the full load in a timely fashion. If you are able to temporarily pause indexing it would be interesting to see how the query/dashboard performs when there is no indexing going on and all resources can be allocated to querying.

As storage I/O seems to be the bottleneck it would be interesting to know what type of storage you are using. Is it locally attached SSDs?

Consolidating the report is good idea, but it will have to do a lot of queries.
Is there any automated way to do this.
Or there may be a configuration for limit me during the query,
because when i/o is %100, write operations are also affected.

Two ways to try.

  1. you would use a watcher script and insert into another index
    So you would run lets say the watcher once a day(say after midnight) with a time index of start-end of previous day, get the top say 200 IPs (of course you can run it more frequent with shorter start-end time)
    Then insert it into a new index.
    Watcher index action | Elasticsearch Guide [7.17] | Elastic

  2. The other way is to run transforms.. where the transform consolidate your data for you into another table. Same situation, transforms can run on a scheduled time too.
    Transform examples | Elasticsearch Guide [7.17] | Elastic

I've not done either of these methods extensively Tried transform once.. but my customer data didn't lend itself to a clean transform. Even the ES consultant couldn't make it work for my customer data. So we went for a watcher script to generate the report and alerting.

Watcher script for targeting another index is something like this (only listing the action phase)

  "actions": {
    "index_payload": {
      "transform": {
        "script": {
          "source": "<some script to consolidate data or add some conversions like string to numbers from your aggregations you might skip this part if you don't need it> ",
          "lang": "painless"
      "index": {
        "index": "IP_index"   <= this is your new target index.

So for the watch filter, it could be like this

                      "filter": [
                           "range": {
                             "logdate": {
                               "to": "{{ctx.metadata.time_end}}",
                               "from": "{{ctx.metadata.time_start}}"

For the timing you can use things like (assuming you are running at midnight, if not running at midnight you have to offset differently)

  "metadata": {
    "time_start": "now-1d",
    "time_end": "now"

This will translate the variables time_start and time_end to (current time-1 day) and (current time).
If you want different offsets you can use this if you run the watcher at 2am.

  "metadata": {
    "time_start": "now-26h",
    "time_end": "now-2h" 

Remember to create your ILM policy for your new index as well.
Then your reports are created query the IP_index instead of your primary source index.

There could be some more options.

  1. Roll up old indices for certain query.
  2. Async serach.

Yes, there are ~70 days of data. But heavy data in the last 50 days.

I'll try this on first available time, and share the results.

These are virtual machines and SSD disks are locally attached on vm host.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.