I see that after closing indices the shards are still reported as active (see also Elasticsearch 7.x closed indices retain shards), and that this was done to implement the replicated closed indices feature.
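A quick way to see this behaviour (the index name is just an example):

```
POST logs-2020.01.01/_close

# The closed index's shards are still assigned and still count towards the limit
GET _cat/shards/logs-2020.01.01?v
```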
Although I now understand what to do technically (increase max_shards_per_node), I was still wondering whether it was ever debated to base the feature on another statistic, such as the frozen status, which also seems to be attached to closed indices.
Since Elasticsearch 7 the max_shards_per_node setting is mandatory, and I thought its purpose was to put a tunable limit on the active shards. But frozen and closed indices now report their shards as active, even though those shards have a small footprint and don't compare to the resource usage of the active shards in open indices.
So max_shards_per_node doesn't really say anything about resource usage and has to be increased artificially to make room for the shards of closed and frozen indices.
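For completeness, this is roughly how we raise that limit today (the value is purely illustrative; the setting is multiplied by the number of data nodes in the cluster):

```
PUT _cluster/settings
{
  "persistent": {
    "cluster.max_shards_per_node": 2000
  }
}
```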
I am glad that we created monitoring for the percentage of the allowed active shards that is in use.
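(For reference, something like the request below is enough to derive that percentage; this is a sketch that assumes the default limit of 1000 shards per data node.)

```
GET _cluster/health?filter_path=active_shards,number_of_data_nodes

# fraction of the limit in use ~= active_shards / (number_of_data_nodes * cluster.max_shards_per_node)
```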
Just wondering whether any changes to this behaviour are on the roadmap, or what the deliberation on the pros and cons was.
There were indeed discussions on whether to include closed and frozen indices in the count used by max_shards_per_node and it was a deliberate decision to include them, although there are good arguments in both directions. Closed and frozen indices are not completely free (e.g. they must be tracked by the master and recovered on failures) and the default max_shards_per_node limit is a very coarse safety feature to protect the cluster from really bad cases of oversharding rather than a recommended target.
Maintaining a large number of closed indices in your cluster is something of an antipattern. If you don't ever want to search them then it's better to offload them into a snapshot; if you do want to support occasional searches then it's better to freeze them.
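For example (the repository name and index patterns are made up, and this assumes a snapshot repository has already been registered):

```
# Archive indices you never expect to search again, then delete them
PUT _snapshot/my_backup/archive-2019-12?wait_for_completion=true
{
  "indices": "logs-2019.12.*"
}
DELETE logs-2019.12.*

# Keep occasionally-searched indices around as frozen rather than closed (7.x API)
POST logs-2020.01.01/_freeze
```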
For frozen indices the limit makes a bit more sense since searching a large number of small shards is relatively inefficient, so it's recommended to do some extra work before freezing to get the most out of your system. For instance, you can consolidate the data into fewer larger indices via reindex, shrink the indices to fewer larger shards, and force-merge them to a single segment.
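As a sketch of that kind of preparation (index names and shard counts are illustrative, and on 7.x the freeze API is still available):

```
# Consolidate many small daily indices into one larger index with a single shard
PUT logs-2020.01
{
  "settings": { "index.number_of_shards": 1 }
}

POST _reindex
{
  "source": { "index": "logs-2020.01.*" },
  "dest":   { "index": "logs-2020.01" }
}

# Merge down to a single segment, then freeze
POST logs-2020.01/_forcemerge?max_num_segments=1
POST logs-2020.01/_freeze
```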
Thanks for your reply! OK, at the moment we are freezing them after a week and closing them a while later (the timing depends on whether it is our dev or prod environment). Maybe we shouldn't close indices before deleting them, since it adds little performance-wise, and maybe we should do some extra work before freezing. For now it works well enough, but data is growing, so it is best to keep that in mind.
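If that lifecycle were expressed as an ILM policy it would look roughly like the sketch below (just an illustration, not necessarily how we drive it; the ages are examples, and as far as I know ILM only offers freeze and delete here, with no action for closing an index):

```
PUT _ilm/policy/logs-retention
{
  "policy": {
    "phases": {
      "cold": {
        "min_age": "7d",
        "actions": { "freeze": {} }
      },
      "delete": {
        "min_age": "60d",
        "actions": { "delete": {} }
      }
    }
  }
}
```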
Apparently I thought of max_shards_per_node as a less coarse feature than it was intended to be.
Thanks again for the clear explanation!
All of that would be subject to some internal discussion too; we have to be careful that the reference docs don't miss any subtleties, but a PR to make some or all of those changes would be welcome.