Disk usage per node is not balanced across the nodes in an ES cluster

I have an ES cluster with a large number of nodes. There are thousands of shards.

The indices are time-based. Data is ingested during the work week, almost 24x7.

The problem I am facing is that some of the nodes in the cluster have high disk usage. These are also the same nodes that hold replica (secondary) shards and don't contain any primary shards.

I have read about the cluster allocation settings that can be used to rebalance disk usage.

What I am not clear about is why the shards are being distributed so unevenly by size.

Any links to documentation explaining similar problems and their solutions would help.

You should look to decrease the shard count; it's pretty high and would be adding a lot of unnecessary heap pressure.
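To put "pretty high" into numbers, one commonly cited rule of thumb from Elastic's 7.x shard-sizing guidance is to keep each node at or below roughly 20 shards per GB of configured heap (later releases revised this guidance, so treat it as a rough check only). The sketch below applies that rule; the cluster figures in the example are hypothetical, since the actual node count and heap size aren't given here.

```python
# Rough check against the "about 20 shards per GB of heap" rule of thumb
# from Elastic's 7.x shard-sizing guidance. All example inputs are hypothetical.

SHARDS_PER_GB_HEAP_LIMIT = 20


def shards_per_node(total_shards: int, node_count: int) -> float:
    """Average number of shard copies (primaries + replicas) per data node."""
    return total_shards / node_count


def max_shards_per_node(heap_gb: float) -> int:
    """Upper bound the rule of thumb suggests for a node with this heap size."""
    return int(heap_gb * SHARDS_PER_GB_HEAP_LIMIT)


def over_guideline(total_shards: int, node_count: int, heap_gb: float) -> bool:
    """True if the average node holds more shards than the rule of thumb allows."""
    return shards_per_node(total_shards, node_count) > max_shards_per_node(heap_gb)


if __name__ == "__main__":
    # Hypothetical cluster: 6,000 shard copies on 10 nodes, 16 GB heap each.
    print(shards_per_node(6000, 10))     # 600.0
    print(max_shards_per_node(16))       # 320
    print(over_guideline(6000, 10, 16))  # True
```

Counting replicas matters here: every replica copy is a full shard that consumes heap on the node hosting it.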


While this is difficult to troubleshoot without more information, the first things that occur to me are:

  1. Are the overloaded nodes running anything besides Elasticsearch? For example, some users run other software alongside Elasticsearch on the same servers, such as Kibana. This could cause high resource usage on those nodes.
  2. Do all the nodes have the same hardware configuration?

These may seem obvious, but just in case.

As mentioned, that is a very high number of shards, especially if the indices are configured with replicas. This is likely causing high memory usage, which may in turn cause high CPU usage because garbage collection has to run more often.

It looks like you have about 3GB per index from the numbers you gave. I recommend using weekly indices instead of daily ones; this should give you an index size of ~21GB (7 × 3GB). Further, after you roll over each index and are finished writing data to it, you should likely shrink it to one shard.
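One way to automate "roll over weekly, then shrink to one shard" is an index lifecycle management (ILM) policy, assuming the cluster runs a version with ILM (6.6 or later). The sketch below only builds the JSON body you would send with `PUT _ilm/policy/<name>`; the 7-day age is taken from the weekly-index suggestion, and everything else (the policy name, attaching it through an index template with `index.lifecycle.name` and a rollover alias) is left to your setup and is an assumption here.

```python
import json


def weekly_rollover_shrink_policy(rollover_age: str = "7d") -> dict:
    """Build an ILM policy body: roll over weekly, then shrink to one shard.

    Meant as the body of `PUT _ilm/policy/<name>`; the rollover age is an
    illustrative choice matching the weekly-index suggestion.
    """
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Start a new write index once the current one is a week old.
                        "rollover": {"max_age": rollover_age}
                    }
                },
                "warm": {
                    # With rollover, min_age is measured from the rollover time,
                    # so "0d" enters the warm phase right after rolling over.
                    "min_age": "0d",
                    "actions": {
                        # Shrink the no-longer-written index to a single primary shard.
                        "shrink": {"number_of_shards": 1}
                    },
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(weekly_rollover_shrink_policy(), indent=2))
```

Letting ILM run the shrink avoids doing the manual steps (blocking writes and moving a copy of every shard onto one node) by hand for each index.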

To answer the other part of your question, Elasticsearch will already take into account disk space (although not CPU or memory usage, as far as I am aware) when allocating shards. You can tune the parameters it uses via the cluster.routing.allocation.disk.* settings.
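As a starting point for tuning those settings, the sketch below builds a `PUT _cluster/settings` body for the disk watermarks. The percentages shown are the documented defaults; lowering them makes Elasticsearch react to full disks sooner. Comparing the per-node figures from `GET _cat/allocation?v` before and after is a reasonable way to see whether the change helps. The helper function itself is hypothetical, only the setting names are Elasticsearch's.

```python
import json


def disk_watermark_settings(low: str = "85%", high: str = "90%",
                            flood_stage: str = "95%") -> dict:
    """Build a `PUT _cluster/settings` body for the disk-based allocation decider.

    Defaults match Elasticsearch's documented defaults; use either all
    percentages or all absolute byte values, not a mix.
    """
    return {
        "persistent": {
            # Disk-based allocation is enabled by default; set explicitly for clarity.
            "cluster.routing.allocation.disk.threshold_enabled": True,
            # Above "low", shards stop being allocated to the node
            # (primaries of newly created indices are the exception).
            "cluster.routing.allocation.disk.watermark.low": low,
            # Above "high", Elasticsearch tries to relocate shards off the node.
            "cluster.routing.allocation.disk.watermark.high": high,
            # Above "flood_stage", indices with a shard on the node get a read-only block.
            "cluster.routing.allocation.disk.watermark.flood_stage": flood_stage,
        }
    }


if __name__ == "__main__":
    print(json.dumps(disk_watermark_settings(low="80%", high="85%"), indent=2))
```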

> 1. Are the overloaded nodes running anything besides Elasticsearch? For example, some users run other software alongside Elasticsearch on the same servers, such as Kibana. This could cause high resource usage on those nodes.

Yes, Kibana is running on these servers.

> 2. Do all the nodes have the same hardware configuration?

They all have the same configuration (same AWS instance type).

Thanks for the suggestions. I will switch to weekly indices rather than daily ones and also shrink the indices once they are no longer being written to.
