Request volume management

lzambarda · March 17, 2020, 12:26pm

Hello, I am running an ES 7.6.0 cluster and once in a while I need to perform a full index rebuild.

I currently need to index 15 million documents. My indexer can be fine-tuned on request size, maximum number of concurrent calls, delay, etc. All indexing happens via bulk requests.
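The knobs described above mostly act on how documents are grouped into _bulk bodies. The following is a minimal Python sketch, not the poster's indexer, that groups documents into bodies capped by both document count and byte size. It assumes plain index operations into a single index. The function name bulk_bodies, the index name my-index and the default limits are illustrative, not recommended values.

```python
import json
from collections.abc import Iterable, Iterator


def bulk_bodies(
    docs: Iterable[tuple[str, dict]],
    index: str,
    max_docs: int = 1000,
    max_bytes: int = 5 * 1024 * 1024,
) -> Iterator[str]:
    """Group (doc_id, source) pairs into NDJSON bodies for POST /_bulk.

    Each body holds at most max_docs documents and at most max_bytes bytes,
    except that a single document larger than max_bytes gets a body of its own.
    """
    entries: list[str] = []
    size = 0
    for doc_id, source in docs:
        # One index operation = an action line followed by a source line.
        action = json.dumps({"index": {"_index": index, "_id": doc_id}})
        entry = action + "\n" + json.dumps(source) + "\n"
        entry_size = len(entry.encode("utf-8"))
        # Flush the current body before this entry would break either limit.
        if entries and (len(entries) >= max_docs or size + entry_size > max_bytes):
            yield "".join(entries)
            entries, size = [], 0
        entries.append(entry)
        size += entry_size
    if entries:
        yield "".join(entries)  # the final, possibly smaller, body
```

Each yielded string already ends with the trailing newline that _bulk requires, so it can be sent as the body of POST /_bulk with Content-Type: application/x-ndjson.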

What I am experiencing is that no bulk response reports any error, yet in the end some documents are missing.

The only way I found to index everything safely was to dramatically reduce both the payload size and the number of concurrent requests.

I don't think it is feasible to use a sort of binary search to find the best settings for this task, because whatever I do, the optimal configuration is going to change as the number of documents grows.

What is the best way to manage the request volume? Is there any way for my indexer to be aware of the cluster's condition while it is running?
Thanks in advance!
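Because the best settings drift as the index grows, one common alternative to searching for fixed values is to adapt them from feedback while indexing. The sketch below borrows additive-increase/multiplicative-decrease (AIMD) from TCP congestion control; it is not an Elasticsearch feature. It assumes the indexer can report, for each bulk response, how many items were rejected with 429, and uses that to raise or lower a limit on concurrent bulk requests. The class name AimdLimiter and all default values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AimdLimiter:
    """Adapt the number of in-flight bulk requests from 429 feedback (AIMD)."""

    limit: float = 4.0       # current allowance, kept as a float for smooth growth
    min_limit: float = 1.0
    max_limit: float = 32.0
    increase: float = 1.0    # roughly +1 slot per full round of clean responses
    decrease: float = 0.5    # multiply by this after any response with rejections

    def on_response(self, rejected_items: int) -> None:
        """Record one bulk response and how many of its items were rejected with 429."""
        if rejected_items > 0:
            # Multiplicative decrease: back off quickly when the cluster pushes back.
            self.limit = max(self.min_limit, self.limit * self.decrease)
        else:
            # Additive increase: probe for more capacity slowly.
            self.limit = min(self.max_limit, self.limit + self.increase / self.limit)

    @property
    def current(self) -> int:
        """Whole number of concurrent bulk requests currently allowed."""
        return max(1, int(self.limit))
```

An indexer would consult current before dispatching another request and call on_response after parsing each response. Growing by increase/limit per response gives roughly one extra slot per round of requests, while halving on any rejection backs off quickly.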

DavidTurner · March 17, 2020, 12:42pm

Are you using an official client or are you doing this "by hand" at the HTTP level?

When you say you see no errors do you mean that the top-level bulk response is 200 OK or are you checking the status of each document within the response?
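A per-item check of the kind asked about here could look like the following minimal Python sketch. It takes the raw JSON body of a _bulk response: the top-level "errors" flag is true if any item failed, and each entry in "items" carries its own "status". The function name failed_items is hypothetical.

```python
import json


def failed_items(response_body: str) -> list[dict]:
    """List the items of a _bulk response whose per-item status is not 2xx.

    The HTTP status of the whole request can be 200 even when some items
    failed, so each item has to be inspected individually.
    """
    response = json.loads(response_body)
    if not response.get("errors"):
        return []  # "errors" is true if any item in the request failed
    failures = []
    for item in response["items"]:
        # Each item has exactly one key naming the operation
        # ("index", "create", "update" or "delete").
        (op, result), = item.items()
        if not 200 <= result["status"] < 300:
            failures.append({
                "op": op,
                "id": result.get("_id"),
                "status": result["status"],
                "error": result.get("error", {}).get("type"),
            })
    return failures
```

Successful index operations on new documents report 201 rather than 200, which is why the check accepts any 2xx status.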

lzambarda · March 17, 2020, 3:02pm

Thanks for the prompt reply.
We are using a custom client which calls the _bulk endpoint.

> When you say you see no errors do you mean that the top-level bulk response is 200 OK or are you checking the status of each document within the response?

Both. For every document I also make sure that the "failed" count is never greater than 0.

The Elasticsearch documentation is not very clear about how the cluster should behave when the request volume is too high, so I am not sure what I should expect on the client side when this happens or has happened.
Thanks

DavidTurner · March 17, 2020, 3:10pm

It depends a bit on exactly how the cluster is overloaded, but usually docs that couldn't be indexed due to overload would result in a 429 on those specific docs, even if the top-level response is 200.

The only way to get a 200 status for a doc is if it is written successfully to all in-sync copies (primary plus replicas) so those won't be lost. I think there may be something wrong in how you're detecting document-level failures.
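One way a document-level check can miss failures, assuming it relies on the per-item _shards.failed count: an item rejected with 429 typically carries only status and error fields and no _shards object at all, so a check that treats a missing count as 0 reports it as a success. Both helper names below are hypothetical.

```python
def naive_ok(item_result: dict) -> bool:
    # Treats an item as fine when its shard failure count is zero. A rejected
    # item usually has no "_shards" object, so the default of 0 hides it.
    return item_result.get("_shards", {}).get("failed", 0) == 0


def status_ok(item_result: dict) -> bool:
    # Treats an item as fine only if its own status code is 2xx
    # (new documents report 201, updates of existing ones 200).
    return 200 <= item_result.get("status", 0) < 300
```

Checking the item's own status field is the more direct test of whether that document was written.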

lzambarda · March 17, 2020, 3:15pm

Thanks. I will double-check whether the indexer is missing some 429s. If so, I will retry those documents and also throttle subsequent requests.
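A minimal sketch of that plan: resend only the items that came back with 429, pausing with exponential backoff plus random jitter between attempts. The send callable is a hypothetical stand-in for the custom client; it is assumed to issue one _bulk request and return the per-item status codes in input order. Other non-2xx items are collected rather than retried.

```python
import random
import time
from collections.abc import Callable, Sequence


def index_with_retries(
    docs: Sequence[dict],
    send: Callable[[Sequence[dict]], list[int]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[list[dict], list[tuple[dict, int]]]:
    """Send docs via the hypothetical send(), resending only items rejected with 429.

    Returns (still_rejected, other_failures): documents still getting 429 after
    max_attempts, and (doc, status) pairs for other non-2xx statuses, which are
    not retried here.
    """
    pending = list(docs)
    other_failures: list[tuple[dict, int]] = []
    for attempt in range(max_attempts):
        statuses = send(pending)  # assumed: one status per doc, in input order
        retry = []
        for doc, status in zip(pending, statuses):
            if status == 429:
                retry.append(doc)
            elif not 200 <= status < 300:
                other_failures.append((doc, status))
        pending = retry
        if not pending:
            break
        if attempt + 1 < max_attempts:
            # Exponential backoff with full jitter before the next attempt.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
    return pending, other_failures
```

Injecting sleep keeps the function testable without real waiting; in normal use it defaults to time.sleep.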

DavidTurner · March 17, 2020, 3:19pm

Note that it's not just 429 that you should handle specially. Anything except 200 indicates a problem during indexing. I think other 4xx codes should not be retried since they indicate something is fundamentally wrong with the request, but maybe 5xx codes can be retried since they may be transient server-side issues.
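The rule of thumb above can be written down as a small decision function. This is a sketch of that advice, not behaviour defined by Elasticsearch; the Action enum and the classify name are hypothetical.

```python
from enum import Enum


class Action(Enum):
    OK = "ok"
    RETRY = "retry"
    GIVE_UP = "give_up"


def classify(status: int) -> Action:
    """Decide what to do with one bulk item based on its status code.

    2xx: indexed successfully (newly created documents report 201).
    429: rejected under load, so retry later with backoff.
    other 4xx: the request itself is wrong, so retrying will not help.
    5xx: possibly transient on the server side, so retrying may help.
    """
    if 200 <= status < 300:
        return Action.OK
    if status == 429:
        return Action.RETRY
    if 400 <= status < 500:
        return Action.GIVE_UP
    if 500 <= status < 600:
        return Action.RETRY
    return Action.GIVE_UP  # anything unexpected: surface it rather than loop
```

Retries for the RETRY cases would still need a cap on attempts, so a persistent server-side problem eventually surfaces as a failure instead of looping forever.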

lzambarda · March 17, 2020, 3:20pm

Makes perfect sense. Thank you again!

system · April 14, 2020, 3:20pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
