[SOLVED] Elasticsearch python API and reindex module

axelfelix · October 15, 2016, 1:19pm

Hi all,

I have written a Python script that reindexes a list of indices.

My problem is with the timeout and wait_for_completion options.

When my script launches a reindex, I have to wait until it has finished, otherwise the reindex is not performed completely. To do this I need to set the global request_timeout option to a value based on the index size, and wait for each index's reindex to finish.

If I set wait_for_completion, an exception is raised and the reindex fails (even for an index with only a few kilobytes in it). If I use the timeout option of the reindex call (with 5 minutes, for example), it fails too (I also get a new index that is missing some of my documents).

So the only way that works for me is the global request_timeout parameter, with a value generated from the index size. But if the index is big, it can take a while.

In my environment, 10 indices took 30 minutes. Next time I need to reindex almost 200 indices, so that is too long.

Does anybody have an idea for running this kind of script in the background, or something like that?

Thanks in advance,
Alex

nik9000 · October 15, 2016, 1:48pm

This is a thing we fixed in 5.0, which isn't released yet. You use
wait_for_completion=false and it gives you back a task. You can then HTTP
GET the task with wait_for_completion=true. If that times out you can just
try again. The fix in 5.0 is that even if the task finishes when you aren't
waiting it'll still return from that GET API.
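
For readers on the Python client, here is a rough sketch of that flow. It assumes an elasticsearch-py client in the 5.x to 7.x style, where `es.reindex(body=..., wait_for_completion=False)` returns a dict with a `task` key and `es.tasks.get(task_id=...)` returns a dict with a `completed` flag. The helper names `start_reindex` and `wait_for_task` and the default intervals are made up for illustration. Instead of blocking on the GET with wait_for_completion=true, it polls the task in short requests, so no single HTTP call needs a long request_timeout.

```python
import time


def start_reindex(es, source_index, dest_index):
    """Start a reindex without blocking and return the task id."""
    body = {"source": {"index": source_index}, "dest": {"index": dest_index}}
    # With wait_for_completion=False the call returns right away with
    # a body like {"task": "<node_id>:<number>"}.
    response = es.reindex(body=body, wait_for_completion=False)
    return response["task"]


def wait_for_task(es, task_id, poll_seconds=5.0, max_wait_seconds=3600.0):
    """Poll the task until it reports completed, then return its document."""
    deadline = time.monotonic() + max_wait_seconds
    while True:
        # Each poll is a short request, so the default request timeout is enough.
        status = es.tasks.get(task_id=task_id)
        if status.get("completed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("task %s still running after %.0f s"
                               % (task_id, max_wait_seconds))
        time.sleep(poll_seconds)
```

Usage would look like `task_id = start_reindex(es, "old_index", "new_index")` followed by `wait_for_task(es, task_id)`, where `es` is an `Elasticsearch` client instance and the index names are placeholders. As noted above, on versions before 5.0 a task that finishes while nobody is waiting may no longer be retrievable, so the polling call can fail there.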

axelfelix · October 24, 2016, 12:07pm

Thanks for your reply and sorry for my late answer.

It seems that if I set "wait_for_completion=false" on a reindex, I can't start another one. If I start a new reindex, the first one stops. Is that right? Or did I do something wrong?

Thanks in advance.

nik9000 · October 25, 2016, 5:20pm

Elasticsearch doesn't mind if you have multiple reindex tasks running at one time. If they both try to write to the same place then you are going to have trouble, but that is what the _cancel API is for. Canceling looks like:

curl -XPOST _tasks/{task_id}/_cancel
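
A sketch of the same idea from Python, again assuming an elasticsearch-py 5.x to 7.x style client where `es.reindex(body=..., wait_for_completion=False)` returns a `task` id and `es.tasks.cancel(task_id=...)` maps to the `_cancel` call above. The helper names and the duplicate-destination check are illustrative only. It starts several reindex tasks at once, refuses to start two that write to the same destination, and can cancel them by task id.

```python
def start_reindex_batch(es, index_pairs):
    """Start one non-blocking reindex per (source, dest) pair.

    Returns a dict mapping each destination index to its task id.
    """
    dests = [dest for _, dest in index_pairs]
    # Two tasks writing to the same destination is the case to avoid.
    if len(set(dests)) != len(dests):
        raise ValueError("each reindex must write to its own destination index")
    tasks = {}
    for source, dest in index_pairs:
        body = {"source": {"index": source}, "dest": {"index": dest}}
        tasks[dest] = es.reindex(body=body, wait_for_completion=False)["task"]
    return tasks


def cancel_tasks(es, task_ids):
    """Ask Elasticsearch to cancel each task (POST _tasks/{task_id}/_cancel)."""
    for task_id in task_ids:
        es.tasks.cancel(task_id=task_id)
```

For example, `tasks = start_reindex_batch(es, [("logs-1", "logs-1-v2"), ("logs-2", "logs-2-v2")])` starts both tasks, and `cancel_tasks(es, tasks.values())` cancels them; the index names are placeholders.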

axelfelix · October 26, 2016, 8:33am

Thanks, but I'm not sure we are talking about the same thing.

When I reindex one index (300 MB) with the Python API and the next action in my script is to reindex another index, the first reindex stops (its new index ends up at 3 MB, for example, instead of 300 MB). The second reindex task finishes correctly (if I have just 2 reindexes).

If I have 50 reindex tasks in my script, the first 49 don't complete correctly, but the 50th does.

I'm not sure I have explained my issue clearly, so sorry about that.

Thanks,
Alex

JoarSvensson · October 26, 2016, 9:44am

Have you considered using the Curator? https://github.com/elastic/curator

axelfelix · October 26, 2016, 10:00am

Hi, thanks.

I'm not sure that Curator supports reindexing.
I will check the code anyway.

Alex

theuntergeek · October 26, 2016, 2:45pm

Curator is going to have reindex in 4.x (there's already a feature request for it, and I'm actively developing it), but more especially in 5.x, using the Reindex API. It will not do generic reindexing for versions of Elasticsearch without the Reindex API, which was added in Elasticsearch 2.3.

axelfelix · October 26, 2016, 3:11pm

Thanks !

nik9000 · October 31, 2016, 3:01pm

That is pretty clear. This sounds like an issue with the Python API.

axelfelix · October 31, 2016, 4:13pm

Ok, thanks.
