Disk space per node in ES cluster is not balanced across the nodes

Akshay_Deshpande · November 5, 2018, 9:27pm

I have an ES cluster with a large number of nodes. There are thousands of shards.

The indices are time based. The data is ingested during the work week, almost 24x7.

The problem I am facing is that some of the nodes in the cluster have high disk usage. These are also the same nodes that hold only secondary (replica) shards and no primary shards.

I have read about cluster allocation settings which we can set to rebalance the disk usage.

What I am not clear about is why shards are being unevenly distributed by size.

Any links to documentation that explains similar problems and their solutions would help.

warkolm · November 5, 2018, 9:49pm

You should look to decrease the shard count. It's pretty high and would be adding a lot of unnecessary heap pressure.

gbrown · November 5, 2018, 10:26pm

While this is difficult to troubleshoot without more information, the first things that occur to me are:

Are the overloaded nodes running anything besides Elasticsearch? For example, some users run other software alongside Elasticsearch on the same servers, such as Kibana. This could cause high resource usage on those nodes.
Do all the nodes have the same hardware configuration?

These may seem obvious, but just in case.

As mentioned, that is a very high number of shards - especially if the indexes are configured with replicas. This is likely causing high memory usage, which may cause high CPU usage because garbage collection needs to happen more often.

It looks like you have about 3GB/index from the numbers you give. I recommend using weekly indices instead of daily ones - seven days at ~3GB each should give you an index size of ~21GB. Further, after you roll over each index and are finished writing data to it, you should likely shrink each index to one shard.
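As a rough sketch of the shrink step mentioned above: the shrink API expects the source index to be write-blocked with a copy of every shard on a single node, and then a second request creates the smaller index. The snippet below only builds and prints the two request bodies; the index names and the node name are made up for illustration, and you would send the requests with whatever client you normally use.

```python
import json


def shrink_requests(source_index, target_index, node_name):
    """Build (method, path, body) tuples for shrinking an index to one shard.

    All three arguments are hypothetical example values supplied by the caller.
    """
    prepare = (
        "PUT",
        f"/{source_index}/_settings",
        {
            "settings": {
                # Relocate a copy of every shard onto a single node.
                "index.routing.allocation.require._name": node_name,
                # Block writes; the source must not be written to during the shrink.
                "index.blocks.write": True,
            }
        },
    )
    shrink = (
        "POST",
        f"/{source_index}/_shrink/{target_index}",
        # One primary shard in the target index.
        {"settings": {"index.number_of_shards": 1}},
    )
    return [prepare, shrink]


requests_to_send = shrink_requests(
    "logs-2018-w45", "logs-2018-w45-shrunk", "shrink-node-1"
)
for method, path, body in requests_to_send:
    print(method, path)
    print(json.dumps(body, indent=2))
```

Wait for the shards to finish relocating after the first request before sending the second one. The settings copied to the target index (including the write block) may need to be adjusted afterwards, depending on your version.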

To answer the other part of your question, Elasticsearch will already take into account disk space (although not CPU or memory usage, as far as I am aware) when allocating shards. You can tune the parameters it uses via the cluster.routing.allocation.disk.* settings.
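For reference, here is a minimal sketch of what tuning those disk settings could look like. It only builds and prints the body for a cluster settings update; the 80%/85% values are example numbers I picked for illustration, not a recommendation (the defaults are 85% for the low watermark and 90% for the high one).

```python
import json


def disk_allocation_settings(low="85%", high="90%"):
    """Build a cluster settings body for the disk-based allocation thresholds.

    low:  above this usage, no new shards are allocated to the node.
    high: above this usage, Elasticsearch tries to move shards off the node.
    """
    return {
        "transient": {
            "cluster.routing.allocation.disk.threshold_enabled": True,
            "cluster.routing.allocation.disk.watermark.low": low,
            "cluster.routing.allocation.disk.watermark.high": high,
        }
    }


# Example values only; pick thresholds that suit your own disks.
body = disk_allocation_settings(low="80%", high="85%")
print("PUT /_cluster/settings")
print(json.dumps(body, indent=2))
```

Using "transient" means the change is lost on a full cluster restart; use "persistent" instead if you want it to stick.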

Akshay_Deshpande · November 5, 2018, 10:35pm

Are the overloaded nodes running anything besides Elasticsearch? For example, some users run other software alongside Elasticsearch on the same servers, such as Kibana. This could cause high resource usage on those nodes.

Yes, Kibana is running on these servers.

Do all the nodes have the same hardware configuration?

They all have the same configuration (the same AWS instance type).

Thanks for the suggestions, I will convert the indices to weekly-based rather than daily and also shrink the indices once they are not being written to.

