I am currently trying to reindex an existing index into a new index with an updated schema. The issue I'm having is that the reindexing process gives up on my ~2.5m-document index after about 0.5m documents (give or take; it's inconsistent). To be clear, I don't mean that the request times out: even if I provide the ?wait_for_completion=false flag, the reindexing task itself ends prematurely. I've already asked this question on SO, but I'm asking it here as well since this might be the better place for these sorts of questions.
My Elasticsearch cluster is hosted on AWS, but as far as I can tell no errors are reported. I've tried the different options of the reindex endpoint, but none have made a difference. I've tried reindexing into the new schema as well as into the very same schema as the existing index.
Setting version_type to external will cause Elasticsearch to preserve the version from the source, create any documents that are missing, and update any documents that have an older version in the destination index than they do in the source index.
And don't forget to set "conflicts": "proceed", so that version conflicts don't abort the reindex.
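To make the two settings above concrete, here is a minimal sketch of the request body I would send to the _reindex endpoint. The helper name and the index names ("old_index", "new_index") are hypothetical placeholders, not anything from your cluster.

```python
import json


def build_reindex_body(source_index, dest_index):
    """Build a _reindex body that keeps source versions and does not abort on conflicts."""
    return {
        # Count version conflicts instead of aborting the whole reindex.
        "conflicts": "proceed",
        "source": {"index": source_index},
        # "external" preserves the version from the source document.
        "dest": {"index": dest_index, "version_type": "external"},
    }


if __name__ == "__main__":
    # Hypothetical index names, for illustration only.
    print(json.dumps(build_reindex_body("old_index", "new_index"), indent=2))
```

You would POST the printed JSON to _reindex, adding ?wait_for_completion=false if you want a task ID back instead of waiting for the result.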
I would also recommend that you track the reindex task by using:
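As a rough sketch of what tracking could look like: the task ID returned by a ?wait_for_completion=false reindex can be passed to GET _tasks/<task_id>. The code below assumes the response has the usual shape (a "completed" flag, counters under "task.status", and a "failures" list under "response" once the task is done). The function names and the base URL argument are my own placeholders, and a cluster hosted on AWS will likely need authentication that this sketch leaves out.

```python
import json
import urllib.request


def fetch_task(base_url, task_id):
    """GET <base_url>/_tasks/<task_id> and return the decoded JSON (no auth handling)."""
    with urllib.request.urlopen(f"{base_url}/_tasks/{task_id}") as resp:
        return json.load(resp)


def summarize_task(task_doc):
    """Reduce a task document to the fields that matter when a reindex stops early."""
    status = task_doc.get("task", {}).get("status", {})
    # Documents actually written so far: created + updated + deleted.
    processed = (
        status.get("created", 0)
        + status.get("updated", 0)
        + status.get("deleted", 0)
    )
    return {
        "completed": task_doc.get("completed", False),
        "total": status.get("total", 0),
        "processed": processed,
        "version_conflicts": status.get("version_conflicts", 0),
        # Only present once the task has finished and stored its result.
        "failures": task_doc.get("response", {}).get("failures", []),
    }
```

If "completed" is true while "processed" is well below "total", the "failures" and "version_conflicts" fields are the first place I would look for the reason the task ended prematurely.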