Originally we had 4 data nodes: two were designated as ILM hot nodes and the other two as warm nodes. Newly created indices live on the hot nodes during the hot phase and, after 4 weeks of retention, move to the warm nodes for the warm phase.
Recently we added two more warm nodes to the cluster, but we have noticed that indices are predominantly landing on these new warm nodes instead of being evenly distributed across all 4 warm nodes. How can I stop the indices from populating only the new warm nodes?
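(For reference, per-node shard counts and free disk can be compared quickly with the cat allocation API; nothing here is specific to our cluster:)

GET _cat/allocation?v&s=node
# shows shards, disk.used, disk.avail and disk.percent for each data node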
We don't have an ILM policy set up as of now, because the naming of the majority of our indices is static rather than dynamic. So we manually run the command
PUT *2021.<week_number>/_settings
{
  "index.routing.allocation.require.data": "warm"
}
every week to move indices from the hot phase to the warm phase.
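(As an aside, the same weekly move could in principle be automated with an ILM policy; a minimal sketch, assuming a 28-day min_age and the same data: warm node attribute, with a made-up policy name:)

PUT _ilm/policy/weekly-hot-warm
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {}
      },
      "warm": {
        "min_age": "28d",
        "actions": {
          "allocate": {
            "require": { "data": "warm" }
          }
        }
      }
    }
  }
}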
It looks like you have a license above Basic, based on your use of Watcher and AD in Security, so I would encourage you to reach out to your Support contact about this as well.
So all of those nodes have the same shard count on them. It does look like the shard sizes are different though, which would account for what you are seeing.
However, ES balances by shard count first; then, if it starts to hit the disk watermarks, it will move things around as needed.
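The watermarks in play here can be inspected with the cluster settings API; the defaults are percentage-based (85% low, 90% high, 95% flood stage) unless they have been overridden:

GET _cluster/settings?include_defaults=true
# look for cluster.routing.allocation.disk.watermark.low / .high / .flood_stage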
But despite this, the free disk space on two of the 4 warm nodes has dropped as low as 14gb and 65gb. Why is this happening? Why didn't Elasticsearch stop pushing data to these two nodes once they reached the 100gb threshold?
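(Note that there is no 100gb threshold by default; the watermarks are percentages unless absolute values are configured explicitly. If a fixed free-space floor is wanted, it would have to be set, e.g. with these illustrative numbers:)

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "150gb",
    "cluster.routing.allocation.disk.watermark.high": "100gb",
    "cluster.routing.allocation.disk.watermark.flood_stage": "50gb"
  }
}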
The GET /_cat/allocation output that I posted is a couple of weeks old; by today it had dropped to 1gb and 65gb. We have done some cleanup work now and the free disk space has risen to around 100gb, so there's no point providing the current allocation output.
It would be good to see diagnostic info from when the problem was occurring, but if the problem is no longer occurring then there's not a lot we can do.
If it happens again, capture the output of GET _cat/allocation and also run the cluster allocation explain API on one or more shards on the node that's too full. Logs from the master at the same time might be useful too.
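A sketch of the allocation explain call for one such shard (the index name and shard number are placeholders):

GET _cluster/allocation/explain
{
  "index": "my-index-2021.30",
  "shard": 0,
  "primary": true
}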
The problem still exists, David. The free disk space has only increased because we manually deleted indices; otherwise it would have reached 0gb by now.
Sure, when the free disk space of the nodes falls below 100gb, I will post the allocation output and provide the logs from the master.
Is there a way to temporarily stop Elasticsearch from allocating shards to a particular node that is space-constrained, so that it concentrates on the other nodes?
Yes, that happens by default when the free space reaches the high watermark. I think it's already doing its thing, given that this node has fewer shards than the others. Would you supply the other diagnostic info I asked for above (allocation explain for a shard on the overfull node, plus master logs)?
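For completeness, if you did want to keep shards off one specific node manually rather than waiting for the watermark, a node-level allocation filter is the usual approach; a sketch with a made-up node name:

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.exclude._name": "warm-node-3"
  }
}
# existing shards will also be moved off the excluded node;
# set the value to null later to remove the exclusion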