I recently had some discussions about auto-resharding capabilities in Elasticsearch: basically, at some point in time, without any human intervention, the system would "detect" that a given index needs resharding in order to improve performance.
I know that this mechanism is not currently supported, and that Elasticsearch provides APIs to shrink or split an existing index. Even though ES does not support incremental resharding, the split capability could be used as a basis for building an auto-resharding capability.
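For reference, the `_split` API only allows target shard counts that are integer multiples of the source index's `number_of_shards`, which constrains what an automated resharder could choose. A tiny sketch of that constraint (the helper name is mine, not an Elasticsearch API):

```python
# Hypothetical helper: enumerate the target shard counts the _split API
# would accept, given the source index's primary shard count.
# Elasticsearch requires the target count to be a multiple of the source's.
def valid_split_targets(source_shards: int, max_shards: int) -> list[int]:
    return [n for n in range(source_shards * 2, max_shards + 1)
            if n % source_shards == 0]

print(valid_split_targets(3, 12))  # -> [6, 9, 12]
```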
I actually have 2 questions:
Do you think this resharding capability could be safely implemented? Resharding usually requires reindexing the data, so it's a pretty expensive action that has a performance impact on the whole cluster. I suppose the split operation is much less expensive. Additionally, I suppose it can be hard to detect when a resharding operation would yield a significant performance improvement. So is it an operation that would be a good candidate for automation?
Since it's not provided by any Elasticsearch SaaS provider, are you aware of any teams who have successfully implemented it?
The fundamental mechanism (either split/shrink or reindex) is already safely implemented, but the vital missing piece is how one would automatically decide to trigger this process. Can you explain how you would make these decisions even manually? What do you mean by "improve performance"?
As a rule I'd expect fewer, larger shards to handle searches more efficiently, but they take longer to recover or relocate and are harder to balance evenly across the cluster. Can you describe the circumstances under which you would want to split an index to improve performance?
For time-based data (logs, metrics, etc.) the usual approach is already automated using ILM: use size-based rollover to target a consistent index size, and use shrink and force-merge to optimise older indices. This is not available from every SaaS provider (notably, AWS's managed service doesn't support ILM) but it is available on Elastic Cloud.
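A policy of that shape looks roughly like this (the sizes, ages, and policy name are illustrative, not recommendations):

```
PUT _ilm/policy/logs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_primary_shard_size": "50gb" }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "shrink": { "number_of_shards": 1 },
          "forcemerge": { "max_num_segments": 1 }
        }
      }
    }
  }
}
```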
For non-time-based data it seems much trickier because the benefits of this kind of resharding operation are very context-dependent.
The split/shrink capability requires the index to be read-only.
What we have implemented is this: whenever we want to split/shrink an index, we create a new index and trigger a reindex. While reindexing is in progress, every write request goes to both the new and the old index, but all read/search requests are served from the old index. When reindexing finishes, we switch the alias to point to the new index and delete the old one.
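The dual-write and alias-cutover flow described above can be sketched with a toy in-memory model (all names here are illustrative; this is not the poster's actual code, which presumably talks to a real cluster):

```python
# Toy in-memory sketch of the dual-write + alias-cutover approach.
class FakeCluster:
    def __init__(self):
        self.indexes = {}   # index name -> {doc_id: doc}
        self.aliases = {}   # alias -> index name

    def write(self, index, doc_id, doc):
        self.indexes.setdefault(index, {})[doc_id] = doc

    def read(self, alias, doc_id):
        # Reads always resolve through the alias.
        return self.indexes[self.aliases[alias]].get(doc_id)

def dual_write(cluster, doc_id, doc, old_index, new_index, reindexing):
    # Every write goes to the old index; while reindexing is in
    # progress it is mirrored to the new index as well.
    cluster.write(old_index, doc_id, doc)
    if reindexing:
        cluster.write(new_index, doc_id, doc)

def cutover(cluster, alias, old_index, new_index):
    # Repoint the alias at the new index, then drop the old one.
    cluster.aliases[alias] = new_index
    del cluster.indexes[old_index]

cluster = FakeCluster()
cluster.indexes["items-v1"] = {}
cluster.indexes["items-v2"] = {}
cluster.aliases["items"] = "items-v1"

dual_write(cluster, "1", {"n": 1}, "items-v1", "items-v2", reindexing=True)
cutover(cluster, "items", "items-v1", "items-v2")
print(cluster.read("items", "1"))  # -> {'n': 1}
```

In the real system the alias switch would be done atomically with the aliases API (one `remove` and one `add` action in a single request), so readers never see a moment with no index behind the alias.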
This is exactly what I'm trying to figure out. I think whatever sharding strategy you choose at the beginning (say, #shards = #nodes per index) is inevitably going to become inadequate as you scale out your cluster (new nodes, new indexes) and as the index size increases. So in order to take better advantage of these newly available resources, being able to split and rebalance into smaller shards sounds compelling, am I right?
So these are the "circumstances": scaling out, new indexes, actual production usage, increasing CPU, increasing search latency... all of which lead us to (and allow us to) make better, more adapted sharding choices.
There are definitely counterexamples: if you mainly have big aggregation-oriented queries, having more, smaller shards will, I think, have a negative impact. So smaller shards can improve performance, but not in all cases.
As a side effect, performance is also impacted when your big shards have to be relocated.
It looks difficult to automatically detect which index should be resharded to improve that index's performance as well as the overall cluster's performance. And my comment about "safety" was because a resharding operation can negatively impact the cluster's performance and would have to be monitored closely.
So I was torn between the pain of having to manually reshard (incrementally) a growing cluster with hundreds of indexes, and the risk/difficulty of doing it automatically and figuring out when to do it.
This is why I'm trying to find out how people who don't have time-based data, but do have a considerable number of indexes, are addressing this concern to save ops resources.
Yes, we do it in advance. Typically we trigger it when a shard of an index grows beyond a particular size limit.
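That trigger condition can be sketched as a simple threshold check on average primary shard size (the threshold and function name here are made up for illustration; the poster doesn't say what limit they use):

```python
# Hypothetical trigger: split when the average primary shard size
# exceeds a threshold. 50 GiB is illustrative, not a recommendation.
MAX_SHARD_BYTES = 50 * 1024**3

def needs_split(primary_size_bytes: int, primary_shards: int) -> bool:
    return primary_size_bytes / primary_shards > MAX_SHARD_BYTES

print(needs_split(200 * 1024**3, 3))  # -> True  (~66.7 GiB per shard)
```

In practice the inputs would come from something like the index stats API, polled periodically.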
There is a minor performance impact: since we are writing to two indexes instead of one, write latencies are slightly increased. There are other implications as well: what if the write to the old index succeeds but the write to the new index fails? We have built app-level logic to handle all these cases.
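One common way to handle that partial-failure case is to queue the failed write to the new index for replay, so the two indexes converge before the alias switch. A minimal sketch, assuming that strategy (the poster's actual logic is not shown):

```python
# Illustrative handling of a partial dual-write failure: the old-index
# write must succeed; a failed new-index write is queued for retry.
import collections

retry_queue = collections.deque()

def mirror_write(write_old, write_new, doc):
    write_old(doc)               # must succeed; let any error propagate
    try:
        write_new(doc)
    except Exception:
        retry_queue.append(doc)  # replay later, before the alias switch

def flush_retries(write_new):
    # Drain the queue; call this (repeatedly if needed) before cutover.
    while retry_queue:
        write_new(retry_queue.popleft())
```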
The operation is usually quite fast, but it takes longer if the index has a lot of documents.