Shrink and split APIs

ewolfman · July 3, 2022, 4:31pm

Hi,

I tried the shrink API on a 30GB index and shrank it from 15 shards to 5 shards.
What I am not sure about is that this seemed to happen instantaneously.
I checked the _cat/recovery endpoint and saw that there were ongoing tasks there, but when I used _count or _search, the count looked correct and the search found what I was looking for.

In the shrink doc it is stated: "Hard-links segments from the source index into the target index". So I thought that maybe both indices point to the same segments behind the scenes, and that this is why it appeared so quick. Yet when I updated a document in the target index, the change was not reflected in the source index (which is what I hoped for).

Can someone explain what actually happens behind the scenes? I read here that "The shrink index API combines the existing segments without reprocessing". So there is some kind of copy-merge of the existing segments onto the new index (which was apparently not complete, as I saw in _cat/recovery). I am just wondering how _count, _search and _update were already working on the new index while the process was still ongoing.

And one more question: here it is mentioned that "Once it's all done, you probably want to remove the original index". What happens if you delete the older index while _cat/recovery still shows progress? Is Elasticsearch smart enough NOT to delete the source index segments while shrinking is still ongoing?

Thanks.

DavidTurner · July 4, 2022, 7:58am

Yes, it typically uses hard-linking, which is very fast. This works because the files on disk are immutable, so any changes in one index won't be reflected in the other.

Similarly it's fine to delete the source index as soon as the target index is reported as healthy. With hard-linking, deleting one link to the underlying data won't affect any other links.
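
To make the hard-link behaviour concrete, here is a minimal sketch at the filesystem level. It does not touch Elasticsearch at all: the file names are hypothetical stand-ins for a segment file, and it assumes a filesystem that supports hard links. It shows that two names can point at the same data, and that removing one name leaves the data reachable through the other.

```python
import os
import tempfile


def hard_link_demo() -> dict:
    """Create a file, hard-link it, delete the original name, read the link."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical names standing in for one segment file seen from two indices.
        source = os.path.join(tmp, "source_segment.bin")
        target = os.path.join(tmp, "target_segment.bin")

        with open(source, "wb") as f:
            f.write(b"immutable segment data")

        # A hard link is a second directory entry for the same data; nothing is copied.
        os.link(source, target)
        links_before = os.stat(target).st_nlink
        same_inode = os.stat(source).st_ino == os.stat(target).st_ino

        # Removing one name only drops one link; the data stays until the last link goes.
        os.remove(source)
        links_after = os.stat(target).st_nlink
        with open(target, "rb") as f:
            surviving = f.read()

        return {
            "links_before": links_before,
            "same_inode": same_inode,
            "links_after": links_after,
            "surviving": surviving,
        }


if __name__ == "__main__":
    print(hard_link_demo())
```

Since the files are never modified in place, a later change in either index has to be written to new files, which is why the other index does not see it.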

ewolfman · July 4, 2022, 3:32pm

"Similarly it's fine to delete the source index as soon as the target index is reported as healthy"

Do you mean when the new index becomes yellow (due to replicas) or green? Or is there some other way to know for sure that the older index can be deleted without risk of losing data?

DavidTurner · July 4, 2022, 4:11pm

I'd recommend waiting for green health.
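
One way to wait for green before deleting the source is the cluster health API with `wait_for_status=green`, scoped to the target index. The following is only a sketch under assumptions: the base URL and index name are placeholders, the helper names are made up for illustration, and there is no authentication or TLS handling. Depending on the Elasticsearch version, a timed-out wait may also come back as an HTTP error, which `urllib` would raise as an exception rather than return.

```python
import json
import urllib.parse
import urllib.request


def health_url(base_url: str, index: str, timeout: str = "60s") -> str:
    """Build the cluster health URL that blocks until the index is green or the timeout passes."""
    query = urllib.parse.urlencode({"wait_for_status": "green", "timeout": timeout})
    return f"{base_url.rstrip('/')}/_cluster/health/{urllib.parse.quote(index, safe='')}?{query}"


def is_green(health: dict) -> bool:
    """Treat the index as safe only if it is green and the wait did not time out."""
    return health.get("status") == "green" and not health.get("timed_out", False)


def wait_for_green(base_url: str, index: str, timeout: str = "60s") -> bool:
    """Call the health API and report whether the index reached green in time."""
    with urllib.request.urlopen(health_url(base_url, index, timeout)) as resp:
        return is_green(json.load(resp))


if __name__ == "__main__":
    # Placeholder cluster address and target index name; adjust for a real setup.
    if wait_for_green("http://localhost:9200", "my-target-index"):
        print("Target index is green; the source index can be deleted.")
    else:
        print("Target index is not green yet; keep the source index for now.")
```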

ewolfman · July 4, 2022, 9:42pm

Thanks very much.

