I am relatively new to Elasticsearch, so I have been playing with it a bit using the Python API. I am using the Elasticsearch service through AWS, so I have been following its suggestions for restoring snapshots.
Since AWS does not allow me to close indices, I restore into a temporary index and then use reindex to copy the documents from the temporary index to the default index. The two indices (default and temp) have different numbers of documents. I expected that after calling reindex they would have the same number of documents, but that doesn't happen.
So, for example, I have two indices:
temp_index with 1000 documents
index with 900 documents
I expected that after I reindexed temp_index into index, index would have 1000 documents, but it still shows 900. Is there something I am missing about how reindex works?
Thanks for the quick reply. I think I figured out what was happening or at least what I think is happening.
I was calling restore and reindex within the same function and I was getting some HTTP 503 errors. Sometimes the index was partially reindexed and other times not at all. The Python API seems to have a predetermined number of retries when the service is not available.
I found that if I wait a few seconds between restore and reindex, everything works. So I am guessing that my Elasticsearch instance wasn't fully done with the restore when I called reindex. Does this make sense? Is my understanding correct?
Yes, that sounds like a plausible explanation. By default, restore runs asynchronously: the request returns before the restore has finished. You can use wait_for_completion=true to make the request wait until the restore operation has completed.
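As a rough sketch of what that could look like from Python: the helper below is hypothetical (the name restore_then_reindex and all of its arguments are made up for illustration), and it assumes an elasticsearch-py client that accepts 7.x-style keyword arguments. The rename settings in the restore body are also an assumption about how the temporary index gets its name, so adjust them to match your snapshot.

```python
def restore_then_reindex(es, repository, snapshot, temp_index, dest_index):
    """Restore dest_index from a snapshot under the name temp_index,
    then copy its documents into the live dest_index.

    `es` is assumed to be an elasticsearch-py client (7.x-style kwargs).
    """
    # wait_for_completion=True makes this call block until the restore
    # has finished, instead of returning as soon as it has started.
    es.snapshot.restore(
        repository=repository,
        snapshot=snapshot,
        body={
            "indices": dest_index,
            # Assumption: restore the index under the temporary name.
            "rename_pattern": "(.+)",
            "rename_replacement": temp_index,
            "include_global_state": False,
        },
        wait_for_completion=True,
    )

    # The restored index is now available, so reindex can read from it.
    # refresh=True makes the copied documents visible to the counts below.
    result = es.reindex(
        body={"source": {"index": temp_index}, "dest": {"index": dest_index}},
        wait_for_completion=True,
        refresh=True,
    )

    return {
        "reindex_failures": result.get("failures", []),
        "temp_count": es.count(index=temp_index)["count"],
        "dest_count": es.count(index=dest_index)["count"],
    }
```

With this ordering there should be no need for a fixed sleep between the two calls. Keep in mind that reindex only adds or overwrites documents in the destination, it does not delete anything, so dest_count can end up larger than temp_count if the destination holds documents whose IDs are not in the temporary index.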