Reindex API does not complete the reindexing

Using ES version 7.17. After triggering the reindex API from a query node, it stops reindexing after some time and does not complete reindexing all the docs in the source index. For example, we have ~110 million docs in the source index, but it stops randomly after ~20 million; if we trigger it again, it stops after ~35 million, and so on.

We are slicing the reindex request; below is a sample request:

curl -u user:pass -k -X POST "https://localhost:9200/_reindex?slices=100" -H "Content-Type: application/json" -d '{"source": {"index": "sourceIndex"}, "dest": {"index": "targetIndex"}}'

Is there some kind of timeout, or anything else I am missing here? We are not seeing any errors on the console.

Hi @abhadauria,

Looking at the reindex API query parameters, there is indeed a timeout parameter, which defaults to 1 minute, so you could tweak that.
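For example, something along these lines (just a sketch of your request above with the timeout raised; the 30m value is arbitrary):

curl -u user:pass -k -X POST "https://localhost:9200/_reindex?slices=100&timeout=30m" -H "Content-Type: application/json" -d '{"source": {"index": "sourceIndex"}, "dest": {"index": "targetIndex"}}'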

Have you tried tweaking the number of slices as per the documentation guidance, or perhaps the automatic slicing option?
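Automatic slicing would look something like this (slices=auto lets Elasticsearch pick the number of slices, generally one per shard):

curl -u user:pass -k -X POST "https://localhost:9200/_reindex?slices=auto" -H "Content-Type: application/json" -d '{"source": {"index": "sourceIndex"}, "dest": {"index": "targetIndex"}}'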

Let us know!


Thanks for responding, @carly.richmond. We have set the number of slices equal to the number of shards in the index, i.e. 100.

I will try setting the timeout and see if it helps.

We are getting this error after increasing the timeout to 2h:

{"type":"es_rejected_execution_exception","reason":"rejected execution of coordinating operation [coordinating_and_primary_bytes=3319047037, replica_bytes=0, all_bytes=3319047037, coordinating_operation_bytes=76231083, max_coordinating_and_primary_bytes=3328599654]"}

We have allocated a 30 GB heap to the JVM running Elasticsearch.

The size of the index we are trying to reindex is ~7 TB.
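Looking at the numbers in the rejection, max_coordinating_and_primary_bytes (~3.3 GB) is roughly 10% of the heap, which we understand is the default indexing pressure limit (indexing_pressure.memory.limit), so the node appears to be rejecting the combined payload of the 100 concurrent slices rather than timing out. We plan to retry with a lower slice count to shrink the concurrent bulk payload; a sketch of what we intend to run (the slices=10 value is just a guess):

curl -u user:pass -k -X POST "https://localhost:9200/_reindex?slices=10" -H "Content-Type: application/json" -d '{"source": {"index": "sourceIndex"}, "dest": {"index": "targetIndex"}}'

(Raising indexing_pressure.memory.limit in elasticsearch.yml would be the other lever, but as far as we know it is a static setting and needs a restart on each node.)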

Thanks for confirming. That's a pretty chunky index to reindex. Do your documents have any particular attribute that you could use to split them? That way you could specify a query as part of the reindex request and run it in smaller chunks.
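For example, if your documents had a timestamp field (the @timestamp name and dates below are purely illustrative; use whatever attribute fits your data), you could reindex one window at a time and repeat with the next range until the whole index is covered:

curl -u user:pass -k -X POST "https://localhost:9200/_reindex?slices=auto" -H "Content-Type: application/json" -d '{"source": {"index": "sourceIndex", "query": {"range": {"@timestamp": {"gte": "2023-01-01", "lt": "2023-02-01"}}}}, "dest": {"index": "targetIndex"}}'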

There is an option to run the reindex asynchronously as well, which could help; in your case I'm inclined to suggest applying it on top of the smaller requests.
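The asynchronous variant is just a matter of adding wait_for_completion=false, which returns a task ID you can then poll via the task management API (the <task_id> below is a placeholder for the ID in the response):

curl -u user:pass -k -X POST "https://localhost:9200/_reindex?wait_for_completion=false" -H "Content-Type: application/json" -d '{"source": {"index": "sourceIndex"}, "dest": {"index": "targetIndex"}}'

curl -u user:pass -k -X GET "https://localhost:9200/_tasks/<task_id>"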

Hope that helps!