Are there any limits or recommendations on how large the cluster state can or should be in Elasticsearch 7.x? E.g., how many nodes are recommended? How many shards and indices?
The context for this question is that we're in the middle of migrating a cluster from version 1.7 to 7.x. In the current 1.7 cluster we've had issues with slow cluster state updates, and the masters struggle to keep up with pushing out the cluster state.
Our current cluster has about 1000 nodes with 90k-ish shards over 16k indices, containing some 2PB of data.
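For a rough sense of scale, the figures above can be turned into per-node and per-shard averages. This is only a back-of-the-envelope sketch: it assumes 2PB means 2 * 10^15 bytes and that shards are spread evenly across nodes and indices, neither of which is stated above.

```python
# Back-of-the-envelope averages from the approximate figures quoted above.
nodes = 1000
shards = 90_000
indices = 16_000
data_bytes = 2 * 10**15  # assumption: decimal petabytes

shards_per_node = shards / nodes           # average, assuming an even spread
shards_per_index = shards / indices        # average, assuming an even spread
avg_shard_gb = data_bytes / shards / 10**9

print(f"shards per node:  {shards_per_node:.1f}")
print(f"shards per index: {shards_per_index:.2f}")
print(f"avg shard size:   {avg_shard_gb:.1f} GB")
```

The cluster state tracks every index and shard, so these counts say more about its size than the raw data volume does.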
Our experience with the current cluster is that we've really been pushing the limits, but we also know that newer versions of ES mostly push only deltas. So is a large cluster state still a problem?
In more recent versions, now that the cross-cluster search feature is available, it might be easier to split your cluster into several smaller ones.
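As a minimal sketch of what that looks like in 7.x: a remote cluster is registered under the `cluster.remote.<alias>.seeds` setting, and its indices are then addressed as `<alias>:<index>` in a search request. The alias `cluster_two`, the seed address, and the index pattern below are hypothetical placeholders, and the helper functions are illustrative, not part of any client library. The snippet only builds the request body and index expression; it does not talk to a cluster.

```python
import json


def remote_cluster_settings(alias, seeds):
    """Build a body for PUT _cluster/settings that registers a remote cluster.

    `alias` is the name the local cluster will use for the remote one;
    `seeds` are transport addresses (host:port) of nodes in the remote cluster.
    """
    return {"persistent": {"cluster": {"remote": {alias: {"seeds": list(seeds)}}}}}


def cross_cluster_index(alias, index):
    """Index expression for searching `index` on the remote cluster `alias`."""
    return f"{alias}:{index}"


# Hypothetical values, for illustration only.
body = remote_cluster_settings("cluster_two", ["10.0.0.2:9300"])
print(json.dumps(body, indent=2))

# Would be used as: GET /cluster_two:logs-*/_search
print(cross_cluster_index("cluster_two", "logs-*"))
```

Each cluster then keeps its own, smaller cluster state, while searches can still span all of them from one coordinating cluster.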
That said, cluster state management has improved a lot since 1.7, so upgrading may by itself help with those cluster state updates.
What would be the benefit of having separate clusters compared to one monolith?
Can the cluster state still be an issue in recent versions, or are the benefits mainly elsewhere?
I believe cluster state handling is a lot better in newer versions, but I am not sure whether updating it is still single-threaded. If it is, you will still run into limitations and are probably better off with a number of smaller clusters, e.g. a few hundred nodes each or so.