Python API reindex issues

smhall316 · August 22, 2019, 3:05pm

I am relatively new to Elasticsearch, so I have been experimenting with it a bit using the Python API. I am using the Elasticsearch service on AWS, so I have been following its suggestions for restoring snapshots.

Since AWS does not allow me to close indices, I create a temporary index and then use reindex to copy the documents from the temporary index to the default index. The two indices (default and temp) have different numbers of documents. I expected that after calling reindex the temp and default indices would have the same number of documents, but that doesn't happen.

So, for example, I have two indices:

temp_index with 1000 documents
index with 900 documents

I expected that after reindexing temp_index into index, index would have 1000 documents, but it still shows 900. Is there something I am missing about how reindex works?

HenningAndersen · August 22, 2019, 6:39pm

Hi @smhall316,

I wonder if you can supply information on the reindex request and response? Also, the version of Elasticsearch is always good information.

Are the numbers given the real numbers for doc counts in the two indexes or are they example numbers?
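
For anyone gathering this information, here is a minimal sketch of how the reindex response and the before/after document counts could be captured with the Python client. The helper name reindex_report and the 120-second request_timeout are assumptions for illustration, not something from this thread; es is expected to be an elasticsearch.Elasticsearch client.

```python
def reindex_report(es, source_index, dest_index, request_timeout=120):
    """Run a reindex and return doc counts plus the key response fields.

    `es` is assumed to be an elasticsearch.Elasticsearch client; this
    helper and its parameter names are hypothetical.
    """
    names = (source_index, dest_index)
    before = {name: es.count(index=name)["count"] for name in names}

    response = es.reindex(
        body={"source": {"index": source_index}, "dest": {"index": dest_index}},
        wait_for_completion=True,  # block until the copy is done
        refresh=True,              # make the copied docs visible to count()
        request_timeout=request_timeout,  # client-side timeout in seconds
    )

    after = {name: es.count(index=name)["count"] for name in names}
    # Keep only the fields most useful when reporting a problem.
    summary = {key: response.get(key)
               for key in ("total", "created", "updated", "failures")}
    return {"before": before, "after": after, "response": summary}

# Usage (assumes a reachable cluster):
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch(["https://my-domain.example.com"])  # placeholder host
#   print(reindex_report(es, "temp_index", "index"))
```

Note that reindex overwrites documents that share an _id and leaves other documents in the destination untouched, so the two counts only have to match if the destination holds nothing that is absent from the source.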

smhall316 · August 23, 2019, 1:31pm

Hi @HenningAndersen,

Thanks for the quick reply. I think I figured out what was happening, or at least I have a theory.

I was calling restore and reindex within the same function and I was getting some HTTP 503 errors. Sometimes the index was partially reindexed and other times it was not reindexed at all. The Python API seems to retry a predetermined number of times when the service is not available.

I found that if I wait a few seconds between restore and reindex, everything works. So I am guessing that my Elasticsearch instance wasn't fully done with the restore when I called reindex. Does this make sense? Is my understanding correct?

BTW, I am using Elasticsearch 5.5
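
As an alternative to sleeping for a fixed number of seconds, one option is to ask the cluster to report when the restored index has all of its primary shards active. This is only a sketch: the helper name wait_until_searchable and the timeout values are assumptions, and es is expected to be an elasticsearch.Elasticsearch client.

```python
def wait_until_searchable(es, index, timeout="30s", request_timeout=40):
    """Return True once `index` reaches at least yellow health.

    Hypothetical helper. `timeout` is the server-side wait; the client-side
    `request_timeout` (seconds) should be longer so the client does not
    give up before the server answers.
    """
    health = es.cluster.health(
        index=index,
        wait_for_status="yellow",  # yellow means all primary shards are active
        timeout=timeout,
        request_timeout=request_timeout,
    )
    # The server sets timed_out to True if the status was not reached in time.
    return not health["timed_out"]

# Usage (assumes a reachable cluster and a restore already in progress):
#   if wait_until_searchable(es, "temp_index"):
#       es.reindex(body={"source": {"index": "temp_index"},
#                        "dest": {"index": "index"}})
```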

HenningAndersen · August 23, 2019, 5:17pm

Hi @smhall316,

Yes, that sounds like a plausible explanation. By default, restore runs asynchronously. You can use wait_for_completion=true to make the request wait until the restore operation has completed.
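
With the Python client, that could look roughly like the sketch below. The function name restore_then_reindex, the restore_body argument, and the 300-second timeout are assumptions for illustration; es is expected to be an elasticsearch.Elasticsearch client, and the restore body depends on how the snapshot is restored into the temporary index.

```python
def restore_then_reindex(es, repository, snapshot, restore_body,
                         temp_index, dest_index, request_timeout=300):
    """Restore a snapshot, wait for it to finish, then reindex.

    Hypothetical helper; `restore_body` is whatever restore settings you
    already use (for example the indices to restore and any rename rules).
    """
    # wait_for_completion=True makes this call return only after the
    # restore has finished, so the reindex below does not start early.
    es.snapshot.restore(
        repository=repository,
        snapshot=snapshot,
        body=restore_body,
        wait_for_completion=True,
        request_timeout=request_timeout,  # client-side timeout in seconds
    )

    result = es.reindex(
        body={"source": {"index": temp_index}, "dest": {"index": dest_index}},
        wait_for_completion=True,
        refresh=True,
        request_timeout=request_timeout,
    )
    if result.get("failures"):
        raise RuntimeError("reindex reported failures: %r" % result["failures"])
    return result
```

The client's default request timeout is short, so a long restore may need a larger request_timeout than the one assumed here.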
