We're running an ES cluster that keeps daily indexes per customer with a one-year retention period. With around 100 customers and replica shards enabled, this puts us in the ballpark of 50k shards.
The behavior we're seeing is that while query and indexing performance is fine, operations that go through the master node, like index creation and deletion, are getting gradually worse. Deleting an index can currently take up to a minute!
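For example, timing a single delete with curl (the index name here is just illustrative of our per-customer daily naming):

    time curl -XDELETE 'http://localhost:9200/customer42-2016.03.01'
    # {"acknowledged":true} only comes back once the master has published
    # the updated cluster state -- this is the call that takes up to 60s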
I assume the issue has to do with the cluster state objects kept on the masters; however, we don't see any indication of stress on them: JVM heap usage peaks cyclically at 75% and CPU stays around 50%.
How is one supposed to scale the masters in this situation? Would doubling the allocated Xmx help? I couldn't find any other relevant config parameters to change.
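The only knob we've identified so far is the heap itself; a sketch of what we'd change, assuming heap is set the standard way via the ES_HEAP_SIZE environment variable (which sets both -Xms and -Xmx):

    # /etc/default/elasticsearch (or wherever your init script reads env vars)
    # on the master nodes; the value shown is an example, not our current setting
    ES_HEAP_SIZE=8g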
Our shards are not huge. The reason we don't consolidate them into, say, monthly indexes is that ES doesn't perform well on document deletes (deleted docs are only reclaimed later by segment merges). The daily-index setup lets us always write to fresh indexes and simply drop the old ones, without ever updating or deleting individual documents. Until we ran into this master-scaling issue, that strategy reduced load on our cluster considerably.
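Concretely (index names and the timestamp field are illustrative), expiring a day of data is one cheap call instead of a delete-by-query that would tombstone thousands of documents and leave the merge work for later:

    # what we do today: drop the whole daily index in one call
    curl -XDELETE 'http://localhost:9200/customer42-2016.03.01'

    # what we'd have to do with monthly indexes: per-document deletes
    # (delete-by-query; built into 1.x, a separate plugin on 2.x)
    curl -XDELETE 'http://localhost:9200/customer42-2016.03/_query' -d '{
      "query": { "range": { "@timestamp": { "lt": "2016-03-01" } } }
    }'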
Any advice or ideas are welcome! Here are our specs:
20 data nodes, 3 client nodes, 3 master nodes; 50k shards, 42TB of data.
Master nodes currently have 16GB of memory each.
BTW, just wondering: why don't the master nodes share the cluster state instead of electing a single leader? If I'm running several master nodes anyway to avoid a single point of failure, surely it would be much more scalable to distribute the cluster state among them?
Another point: why does the cluster state have to keep the mappings/settings for every index in memory? In our use case all the indexes need identical mappings, and we'd prefer to maintain one mapping for all of them rather than 50k. Is there any way to do this?
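If it helps clarify what we mean: a single index template already lets us define the mapping once (the template name, pattern, and fields below are illustrative), but as far as I can tell the cluster state still stores a full per-index copy of it:

    curl -XPUT 'http://localhost:9200/_template/customer_daily' -d '{
      "template": "customer*",
      "settings": { "number_of_shards": 1, "number_of_replicas": 1 },
      "mappings": {
        "event": {
          "properties": {
            "@timestamp": { "type": "date" },
            "message":    { "type": "string" }
          }
        }
      }
    }'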