I have a few questions about the internal workings of the split API:
- In the documentation it is mentioned that the split process "Hashes all documents again, after low level files are created, to delete documents that belong to a different shard."
  What is actually happening here? When monitoring the split process, I see the number of segments increase, and then segment merging removes the extra documents from each shard, but the `_id` field of the documents stays the same. So what does "Hashes all documents again" mean here?
- There is a section on "Why Incremental resharding is not supported?", but it doesn't explain what advantage we get by only allowing splits into a multiple of the source index's number of shards. How is adding a single shard different from adding shards in multiples of the source index's shard count?
- Segment merging is faster when the number of shards is higher (say 16) but takes longer when it is lower (say 2 or 3). Also, when the number of shards is low (say 2 or 4), some documents are still marked as deleted even after merging has finished, whereas with a higher number of shards the deleted-document count reported by the `_cat/indices` API always comes down to 0.
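To make the first question concrete, here is a toy model of what "Hashes all documents again" could mean. It assumes the routing formula described in the `_routing` field documentation, `shard_num = (hash(_routing) % num_routing_shards) / routing_factor` with `routing_factor = num_routing_shards / num_primary_shards`. `NUM_ROUTING_SHARDS = 8` is a hypothetical value, and CRC32 stands in for the Murmur3 hash Elasticsearch actually uses, so the shard numbers will not match a real cluster. This is a sketch of the effect, not Elasticsearch's implementation.

```python
import zlib

NUM_ROUTING_SHARDS = 8  # hypothetical index.number_of_routing_shards for this toy


def routing_hash(routing: str) -> int:
    # Stand-in hash (assumption): Elasticsearch uses Murmur3 on the routing
    # value; CRC32 is used here only so the sketch runs with the stdlib.
    return zlib.crc32(routing.encode("utf-8"))


def shard_for(routing: str, num_shards: int) -> int:
    # Assumed formula from the _routing docs:
    # shard_num = (hash(_routing) % num_routing_shards) / routing_factor
    routing_factor = NUM_ROUTING_SHARDS // num_shards
    return (routing_hash(routing) % NUM_ROUTING_SHARDS) // routing_factor


def split(source: dict[int, list[str]], target_shards: int):
    """Return (target shard -> kept _ids, number of copies deleted)."""
    factor = target_shards // len(source)
    targets, deleted = {}, 0
    for s, docs in source.items():
        for t in range(s * factor, (s + 1) * factor):
            # Each target starts with every doc of its source shard (modelling
            # the hard-linked segments), then recomputes each doc's shard with
            # the new shard count and drops the ones that belong elsewhere.
            kept = [d for d in docs if shard_for(d, target_shards) == t]
            deleted += len(docs) - len(kept)
            targets[t] = kept
    return targets, deleted


# _routing defaults to _id, so the _id strings are what gets hashed.
ids = [f"doc-{i}" for i in range(1000)]
source = {s: [d for d in ids if shard_for(d, 2) == s] for s in range(2)}
targets, deleted = split(source, 4)
print(deleted, {t: len(kept) for t, kept in targets.items()})
```

In this model no `_id` is ever rewritten: only the shard each `_id` maps to is recomputed, and each document survives in exactly one target shard. Splitting 2 shards into 4 therefore leaves one deleted copy per document (1000 in total here).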
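For the second question, a bucket-level sketch under the same assumed `_routing` formula shows one property of splitting by a multiple. `num_routing_shards = 12` is a hypothetical value chosen only because it divides evenly by 2, 3 and 4. It is not an Elasticsearch default. The 2 to 3 case is included purely for contrast.

```python
def source_shards_feeding(target: int, source_shards: int, target_shards: int,
                          num_routing_shards: int = 12) -> set[int]:
    """Return the source shards that own the hash buckets mapped to `target`.

    Bucket b = hash(_routing) % num_routing_shards lands on shard
    b // (num_routing_shards // num_shards), per the assumed _routing formula.
    """
    src_width = num_routing_shards // source_shards
    tgt_width = num_routing_shards // target_shards
    buckets = range(target * tgt_width, (target + 1) * tgt_width)
    return {b // src_width for b in buckets}


for n in (4, 3):
    for t in range(n):
        feeding = sorted(source_shards_feeding(t, 2, n))
        print(f"2 -> {n}: target {t} draws from source shards {feeding}")
```

With 2 to 4, every target shard's hash range lies inside a single source shard, so in this model it could be built from that one shard's segments plus deletes. With 2 to 3, target shard 1 needs buckets from both source shards, which would mean moving documents between shards rather than just deleting them.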