Shrink and split APIs

Hi,

I tried the shrink API on a 30GB index and shrank it from 15 shards to 5 shards.
What puzzles me is that this seemed to happen instantaneously.
I checked the _cat/recovery endpoint and saw that there are ongoing tasks there, but when I was using _count or _search it looked like the count was correct and that the search found what I was looking for.

In the shrink doc it is stated: "Hard-links segments from the source index into the target index". So I thought that maybe both indices point to the same segments behind the scenes, and that this is why it appeared so quick. Yet when I made an update to a document in the target index, it was not reflected in the source index (which is what I hoped for).

Can someone explain what actually happens behind the scenes? I read here that "The shrink index API combines the existing segments without reprocessing". So there is some kind of copy-merge of the existing segments onto the new index (which was apparently not complete, as I saw in _cat/recovery). I am just wondering how _count, _search and _update were already working for the new index while the process was still ongoing.

And one more question: here it is mentioned that "Once it's all done, you probably want to remove the original index". What happens if you delete the older index while _cat/recovery still shows progress? Is Elasticsearch smart enough NOT to delete the source index segments while shrinking is still ongoing?

Thanks.

Yes, it typically uses hard-linking, which is very fast. This works because the segment files on disk are immutable: a change in one index is written as new files rather than by modifying the shared ones, so it won't be reflected in the other index.

Similarly, it's fine to delete the source index as soon as the target index is reported as healthy. With hard-linking, deleting one link to the underlying data won't affect any other links.
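To illustrate the hard-link behaviour described above, here is a minimal sketch using plain files rather than a real Elasticsearch data directory. The directory and file names (source_index, target_index, segment_0.dat) are made up for the demo, and the "update" is modelled as writing a new file, on the assumption that existing segment files are never rewritten in place.

```python
import os
import tempfile


def hard_link_demo():
    """Return (same_inode, source_files, survived) for a small hard-link experiment."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical stand-ins for the two indices' directories.
        source_dir = os.path.join(root, "source_index")
        target_dir = os.path.join(root, "target_index")
        os.mkdir(source_dir)
        os.mkdir(target_dir)

        # A stand-in for an immutable segment file.
        source_seg = os.path.join(source_dir, "segment_0.dat")
        with open(source_seg, "wb") as f:
            f.write(b"segment data")

        # Hard-link it into the target: no bytes are copied, so this is fast.
        target_seg = os.path.join(target_dir, "segment_0.dat")
        os.link(source_seg, target_seg)
        same_inode = os.stat(source_seg).st_ino == os.stat(target_seg).st_ino

        # An "update" in the target is written as a new file, so the
        # source directory never sees it.
        with open(os.path.join(target_dir, "segment_1.dat"), "wb") as f:
            f.write(b"newer data")
        source_files = sorted(os.listdir(source_dir))

        # Deleting the source's link leaves the target's link readable.
        os.remove(source_seg)
        with open(target_seg, "rb") as f:
            survived = f.read() == b"segment data"

        return same_inode, source_files, survived


if __name__ == "__main__":
    print(hard_link_demo())
```

Both paths share one inode, the source directory still lists only its original file after the target gains a new one, and the data stays readable through the target after the source link is removed.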

"Similarly, it's fine to delete the source index as soon as the target index is reported as healthy"

Do you mean when the new index becomes yellow (due to replicas) or green? Or is there some other way to know for sure that the older index can be deleted without risk of losing data?

I'd recommend waiting for green health.
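One way to wait for green before deleting the source is the cluster health API with wait_for_status=green. The sketch below is only an illustration: the base URL http://localhost:9200, the index name and the helper names are assumptions, and it presumes an unsecured cluster with no authentication. Depending on the Elasticsearch version, a wait that times out may come back as HTTP 408 or as a normal response with "timed_out": true, so both cases are treated as "not safe yet".

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def health_url(base_url, index, timeout="60s"):
    """Build a cluster health URL that blocks until the index is green or the timeout expires."""
    query = urllib.parse.urlencode({"wait_for_status": "green", "timeout": timeout})
    index_part = urllib.parse.quote(index, safe="")
    return f"{base_url.rstrip('/')}/_cluster/health/{index_part}?{query}"


def is_safe_to_delete_source(health):
    """True only if the response reports green and the wait did not time out."""
    return health.get("status") == "green" and not health.get("timed_out", True)


def wait_for_green(base_url, index, timeout="60s"):
    """Ask the cluster to wait for green health on the (hypothetical) target index."""
    try:
        with urllib.request.urlopen(health_url(base_url, index, timeout)) as resp:
            return is_safe_to_delete_source(json.load(resp))
    except urllib.error.HTTPError as err:
        if err.code == 408:  # some versions signal a timed-out wait this way
            return False
        raise


# Example (needs a running cluster; names are placeholders):
# if wait_for_green("http://localhost:9200", "my-shrunk-index"):
#     print("target is green; the source index can be deleted")
```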

Thanks very much.
