Reindex vs Split Speed and Storage Requirements

I have seen a previous topic discussing the difference in speed between reindexing and splitting an index: "Reindex vs Split index speeds".

  • If splitting is much faster than reindexing because it hard-links the underlying files of the index, why does splitting need 3-4 times the storage when it is using hard links?

I appreciate your time and support.

The hard-linking phase makes starting up the split shards fast, but each resulting shard initially contains all the docs from all the other shards too, marked as deleted. Those deleted docs make the shards a little inefficient at first. Once the shards have been started, a background merge addresses this by rewriting the shard contents in a more efficient form, removing all the deleted docs. While the merge is ongoing, the disk must hold both the old unmerged (hard-linked) data and the new (rewritten) data, plus various temporary files, and in extreme cases this can temporarily need some multiple of the original shard size. Often it doesn't.
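To make the disk accounting above a bit more concrete, here is a minimal Python sketch of the filesystem behaviour only. It is not Elasticsearch or Lucene code, and it assumes a filesystem that supports hard links. The function name hard_link_demo and the file names segment_source, segment_linked and segment_merged are made up for illustration, and "keeping half the bytes" is just a stand-in for a merge dropping deleted docs.

```python
import os
import tempfile


def hard_link_demo(payload: bytes = b"x" * 4096) -> dict:
    """Hard-link a file, then write a smaller rewritten copy next to it.

    Hypothetical illustration: the file names do not correspond to real
    Lucene segment files, and the "merge" is just a truncated copy.
    """
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "segment_source")
        linked = os.path.join(tmp, "segment_linked")
        merged = os.path.join(tmp, "segment_merged")

        with open(source, "wb") as f:
            f.write(payload)

        # A hard link is a second name for the same inode: no data is copied.
        os.link(source, linked)
        src_stat = os.stat(source)
        link_stat = os.stat(linked)

        # Stand-in for a merge: write the surviving data to a brand-new file.
        # Until the old file is deleted, both occupy disk space.
        with open(merged, "wb") as f:
            f.write(payload[: len(payload) // 2])
        merged_stat = os.stat(merged)

        return {
            "same_inode": src_stat.st_ino == link_stat.st_ino,
            "link_count": src_stat.st_nlink,
            "merged_is_new_file": merged_stat.st_ino != src_stat.st_ino,
            "unique_bytes_before_merge": src_stat.st_size,
            "unique_bytes_during_merge": src_stat.st_size + merged_stat.st_size,
        }


if __name__ == "__main__":
    for key, value in hard_link_demo().items():
        print(f"{key}: {value}")
```

The hard link itself adds no data, which is why the split starts quickly. The extra space only shows up once the rewritten file exists alongside the old one, and it goes away again when the old file is no longer referenced and gets removed.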
