Hi all,
I have a small cluster with 37 nodes (3 master and 34 data). The cluster had an unusually high number of shards per node (~1200), which is above what the Elasticsearch documentation recommends.
Following the recommended best practices, I changed the number of shards per index so that each node holds no more than 600 shards.
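For reference, this is roughly how I applied the new shard count to the daily indices, via an index template (the template name, index pattern and shard count below are placeholders, not my exact config):

```
curl -s -X PUT "http://localhost:9200/_index_template/daily-logs" \
  -H 'Content-Type: application/json' -d '
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "index.number_of_shards": 2,
      "index.number_of_replicas": 1
    }
  }
}'
```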
This setting only applies to new indices (the cluster creates new ones every day and deletes those older than 30 days), so the change is still rolling out; right now I have ~700 shards per node.
The problem is that one of the nodes is using much more disk space than the others, even though it holds the same number of shards. Yesterday I took this node out of the cluster, restarted it and joined it back so that the shards were rebalanced. That worked for about 24 hours, but today I noticed I only have 300 GB of free disk on it, while the other nodes have around 600 GB free.
The outcome is that when this node reaches the disk watermark, the cluster starts to suffer: the queues grow and ingestion starts to fail.
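In case it helps, this is how I'm comparing disk usage and shard sizes across nodes (the host below is just a placeholder for one of my nodes):

```
# Disk used/free and shard count per data node, worst first
curl -s "http://localhost:9200/_cat/allocation?v&s=disk.percent:desc"

# Largest shards first, with the node each one lives on
curl -s "http://localhost:9200/_cat/shards?v&h=index,shard,prirep,store,node&s=store:desc" | head -30
```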
Is there any way to find out why there is such a big difference in disk usage, or why it always happens on this node? Any other advice?
Thanks in advance