Hi, I'm trying to automate a reindex operation. I'm looking for a robust way to determine that my reindex operation was indeed successful before I start using the new indexes. I always start the reindex as an asynchronous operation and query for the task status. Currently I need to support both Elasticsearch v6.8 and v7.3.
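A minimal sketch of the polling side of that workflow, assuming the reindex was started with `POST /_reindex?wait_for_completion=false` and the returned task id is then queried through `GET /_tasks/<task_id>`. The helper names (`fetch_task`, `wait_for_task`), the polling interval and the poll limit are hypothetical, chosen only for illustration.

```python
import json
import time
import urllib.request


def fetch_task(base_url, task_id):
    """GET /_tasks/<task_id> and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/_tasks/{task_id}") as resp:
        return json.load(resp)


def wait_for_task(task_id, get_task, poll_seconds=5.0, max_polls=1000):
    """Call get_task(task_id) until the task document reports completed == true.

    get_task is injected so the loop can be exercised without a cluster.
    """
    for _ in range(max_polls):
        doc = get_task(task_id)
        if doc.get("completed"):
            return doc
        time.sleep(poll_seconds)
    raise TimeoutError(f"task {task_id} did not complete after {max_polls} polls")
```

Against a real cluster this could be called as `wait_for_task(task_id, lambda t: fetch_task("http://localhost:9200", t))`, where the URL is an assumption about a local node.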
In the documentation of the v6.8 Task API I've seen [LINK]:
This object contains the actual status. It is identical to the response JSON except for the important addition of the `total` field. `total` is the total number of operations that the `_reindex` expects to perform. You can estimate the progress by adding the `updated`, `created`, and `deleted` fields. The request will finish when their sum is equal to the `total` field.
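The progress estimate described in that passage can be sketched as below. The function name `reindex_progress` is hypothetical, and the numbers in the usage note are made up rather than taken from a real cluster.

```python
def reindex_progress(status):
    """Estimate progress as (updated + created + deleted) / total.

    `status` is the status object of a running reindex task.
    Returns a float between 0.0 and 1.0.
    """
    total = status.get("total", 0)
    if total <= 0:
        # Nothing known about the expected work yet.
        return 0.0
    done = (
        status.get("updated", 0)
        + status.get("created", 0)
        + status.get("deleted", 0)
    )
    return min(done / total, 1.0)
```

For example, a status of `{"total": 200, "created": 40, "updated": 10, "deleted": 0}` gives an estimate of 0.25.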
However, both versions describe `total` in the reindex response as [LINK]:
(integer) The number of documents that were successfully processed.
When playing around with a cluster in v7.3, I see that it behaves more like what the docs for v6.8 describe. When I force my reindex operation to fail on some documents (even synchronously), `total` still counts them. I see that some were created/updated, but `total` seems to be the sum of all operations.
The v7.3 task status for a reindex operation also returns `failures`:
Array of failures if there were any unrecoverable errors during the process. If this is non-empty then the request aborted because of those failures. Reindex is implemented using batches and any failure causes the entire process to abort but all failures in the current batch are collected into the array. You can use the conflicts option to prevent reindex from aborting on version conflicts.
So in my synthetic situation I've got a task in which 6 documents were created and there are 7 failures for other documents that were not valid according to the mapping definition.
Is there a risk in v7.3, when lots of batches fail for many documents (e.g. when re-indexing a very large index), of this `failures` array growing to an enormous size?
I could not find in the docs whether there will always be `failures` entries when some documents could not be indexed.
Is it safe to assume that the reindex was not fully successful when:
`total` - (`created` + `updated` + `noop?` + `deleted`) > 0 or is there a better way to assert that in
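
The check proposed above can be sketched as follows, combined with a look at `failures`. This only illustrates the idea; it does not settle whether the docs guarantee these semantics in either version. It assumes the completed task document from `GET /_tasks/<task_id>` carries the reindex result under `response`, and that the no-op counter is named `noops` in that response. The function name `reindex_problems` is hypothetical.

```python
def reindex_problems(task_doc):
    """Return a list of reasons why a reindex task should not be trusted.

    An empty list means no problem was detected by these checks.
    """
    if not task_doc.get("completed"):
        return ["task has not completed"]

    problems = []
    if "error" in task_doc:
        problems.append("task ended with an error")

    response = task_doc.get("response", {})
    failures = response.get("failures") or []
    if failures:
        problems.append(f"{len(failures)} failure(s) reported")
    if response.get("version_conflicts", 0) > 0:
        problems.append("version conflicts reported")

    # The check from the question: total minus everything accounted for.
    processed = sum(
        response.get(key, 0) for key in ("created", "updated", "deleted", "noops")
    )
    unaccounted = response.get("total", 0) - processed
    if unaccounted > 0:
        problems.append(f"{unaccounted} operation(s) unaccounted for")
    return problems
```

Treating any single signal as sufficient to reject the new index is the conservative choice here, since it does not rely on `failures` always being populated.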