Hi all,
I have a small cluster with 37 nodes (3 master and 34 data). The cluster had an unusually high number of shards per node (~1200), which is above what the Elasticsearch documentation recommends.
Following the recommended best practices, I changed the number of shards per index so that each node holds no more than 600 shards.
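For reference, this is roughly how I applied the new shard count to the daily indices, via an index template (the template name, index pattern and shard count below are placeholders, not my exact config):

```
curl -s -X PUT "http://localhost:9200/_index_template/daily-logs" \
  -H 'Content-Type: application/json' -d '
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "index.number_of_shards": 2,
      "index.number_of_replicas": 1
    }
  }
}'
```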
This setting only applies to new indices (the cluster creates new ones every day and deletes those older than 30 days), so the change is still rolling out; right now I have ~700 shards per node.
The problem is that one of the nodes is using much more disk space than the others, even though it holds the same number of shards. Yesterday I took this node out of the cluster, restarted it and joined it back so that the shards were rebalanced. That worked for about 24 hours, but today I noticed I only have 300 GB of free disk on it, while the other nodes have around 600 GB free.
The outcome is that when this node reaches the disk watermark, the cluster starts to suffer: the queues grow and ingestion starts to fail.
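In case it helps, this is how I'm comparing disk usage and shard sizes across nodes (the host below is just a placeholder for one of my nodes):

```
# Disk used/free and shard count per data node, worst first
curl -s "http://localhost:9200/_cat/allocation?v&s=disk.percent:desc"

# Largest shards first, with the node each one lives on
curl -s "http://localhost:9200/_cat/shards?v&h=index,shard,prirep,store,node&s=store:desc" | head -30
```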
Is there any way to find out why there is such a big difference in disk usage, or why it always happens on this node? Any other advice?
Thanks in advance