We are about to learn (the hard way) that the index-per-user model is the wrong way to go.
Initially we planned for one index per user, but with the announcement that types are going to be removed in ES6, we have ended up with 7 indices per user.
Overall we now have a 4-node cluster with 30k indices and 66k shards in total, and additional indices are being created all the time.
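To put those numbers in perspective, here is a rough back-of-the-envelope calculation. It assumes the shards are spread evenly across the nodes and that every user has exactly 7 indices, neither of which is exact; the variable names are just for illustration.

```python
# Rounded figures from the post; real counts keep growing.
nodes = 4
indices = 30_000
shards = 66_000
indices_per_user = 7

# Assumption: shards are balanced evenly across all nodes.
shards_per_node = shards / nodes

# Average number of shards per index.
shards_per_index = shards / indices

# Assumption: every user has exactly 7 indices.
approx_users = indices / indices_per_user

print(f"shards per node:  {shards_per_node:.0f}")
print(f"shards per index: {shards_per_index:.1f}")
print(f"approx. users:    {approx_users:.0f}")
```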
The only reason this cluster is still alive is that we have some good SSD RAID 0 setups in the nodes.
We now know that this is absolutely wrong and we are about to fix this by re-indexing the data to weekly indices.
But the cluster is about to reach the critical limit for cluster state updates to be acknowledged in time by all nodes.
Even though only diffs of the cluster state have been communicated since https://github.com/elastic/elasticsearch/issues/6295, state updates always seem to include the full routing table (which is massive with 66k shards).
We are looking for a short-term solution to get rid of some of this load without actually losing the data while we re-index it into the new schema.
We only need to survive a few more days to prepare for the data model change.
Can anyone help us figure out how to reduce the cluster load caused by state updates and routing table communication?
Would it, for example, help if we closed a portion of the indices? Does that exclude them from the routing table?
Or is there any parameter that we could tweak?
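
To make the closing idea concrete, this is roughly what we have in mind: a minimal sketch that closes indices in small batches through the close index API (`POST /<index>/_close`). The host, the batch size, and the function names are placeholders rather than our real setup, and it assumes index names without characters that need URL encoding. Whether this actually shrinks the routing table is exactly what we are unsure about.

```python
import urllib.request

ES_HOST = "http://localhost:9200"  # assumption: placeholder address


def close_paths(index_names, batch_size=50):
    """Group index names into comma-separated close-index request paths.

    Batches are kept small so the request line stays short.
    """
    paths = []
    for start in range(0, len(index_names), batch_size):
        batch = index_names[start:start + batch_size]
        paths.append("/" + ",".join(batch) + "/_close")
    return paths


def close_indices(index_names, batch_size=50):
    """Send one POST per batch; each one triggers a cluster state update."""
    for path in close_paths(index_names, batch_size):
        req = urllib.request.Request(ES_HOST + path, method="POST")
        with urllib.request.urlopen(req) as resp:
            print(path, resp.status)
```

Calling `close_indices` with the names of the least recently used indices would be the idea; `close_paths` on its own only builds the request paths and sends nothing.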
Thanks and regards