Reindex API does not complete the reindexing

Using ES version 7.17. After triggering the reindex API from a query node, it stops reindexing after some time and does not complete reindexing all the docs in the source index. For example, we have ~110 million docs in the source index, but it stops randomly after ~20 million; if we trigger it again, it stops after ~35 million, and so on.

We are slicing the reindex request; below is a sample request:

curl -u user:pass -k -X POST "https://localhost:9200/_reindex?slices=100" -H "Content-Type: application/json" -d '{"source": {"index": "sourceIndex"}, "dest": {"index": "targetIndex"}}'

Is there some kind of timeout, or anything else I am missing here? We are not seeing any errors on the console.

Hi @abhadauria,

Looking at the reindex API query parameters, there is indeed a timeout parameter, which defaults to 1 minute, so you could tweak that.
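For example, something along these lines (just a sketch of your request above with the timeout raised; the 30m value is arbitrary):

curl -u user:pass -k -X POST "https://localhost:9200/_reindex?slices=100&timeout=30m" -H "Content-Type: application/json" -d '{"source": {"index": "sourceIndex"}, "dest": {"index": "targetIndex"}}'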

Have you tried tweaking the number of slices as per the documentation guidance, or perhaps the automatic slicing option?
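Automatic slicing would look something like this (slices=auto lets Elasticsearch pick the number of slices, generally one per shard):

curl -u user:pass -k -X POST "https://localhost:9200/_reindex?slices=auto" -H "Content-Type: application/json" -d '{"source": {"index": "sourceIndex"}, "dest": {"index": "targetIndex"}}'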

Let us know!


Thanks for responding, @carly.richmond. We have set the number of slices equal to the number of shards in the index, i.e. 100.

I will try setting the timeout and see if it helps.

We are getting this error after increasing the timeout to 2h:

{"type":"es_rejected_execution_exception","reason":"rejected execution of coordinating operation [coordinating_and_primary_bytes=3319047037, replica_bytes=0, all_bytes=3319047037, coordinating_operation_bytes=76231083, max_coordinating_and_primary_bytes=3328599654]"}

We have allocated a 30 GB heap to the JVM running Elasticsearch.

The size of the index we are trying to reindex is ~7 TB.
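Looking at the numbers in the rejection, max_coordinating_and_primary_bytes (~3.3 GB) is roughly 10% of the heap, which we understand is the default indexing pressure limit (indexing_pressure.memory.limit), so the node appears to be rejecting the combined payload of the 100 concurrent slices rather than timing out. We plan to retry with a lower slice count to shrink the concurrent bulk payload; a sketch of what we intend to run (the slices=10 value is just a guess):

curl -u user:pass -k -X POST "https://localhost:9200/_reindex?slices=10" -H "Content-Type: application/json" -d '{"source": {"index": "sourceIndex"}, "dest": {"index": "targetIndex"}}'

(Raising indexing_pressure.memory.limit in elasticsearch.yml would be the other lever, but as far as we know it is a static setting and needs a restart on each node.)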

Thanks for confirming. That's a pretty chunky index to reindex. Do your documents have any particular attribute that you could use to split them? That way you could specify a query as part of the reindex request and run it in smaller chunks.
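For example, if your documents had a timestamp field (the @timestamp name and dates below are purely illustrative; use whatever attribute fits your data), you could reindex one window at a time and repeat with the next range until the whole index is covered:

curl -u user:pass -k -X POST "https://localhost:9200/_reindex?slices=auto" -H "Content-Type: application/json" -d '{"source": {"index": "sourceIndex", "query": {"range": {"@timestamp": {"gte": "2023-01-01", "lt": "2023-02-01"}}}}, "dest": {"index": "targetIndex"}}'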

There is an option to run the reindex asynchronously as well, which could help; in your case I'm inclined to suggest applying it on top of the smaller requests.
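The asynchronous variant is just a matter of adding wait_for_completion=false, which returns a task ID you can then poll via the task management API (the <task_id> below is a placeholder for the ID in the response):

curl -u user:pass -k -X POST "https://localhost:9200/_reindex?wait_for_completion=false" -H "Content-Type: application/json" -d '{"source": {"index": "sourceIndex"}, "dest": {"index": "targetIndex"}}'

curl -u user:pass -k -X GET "https://localhost:9200/_tasks/<task_id>"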

Hope that helps!