Nest BulkAll - Error: 'BulkAll halted after receiving failures that can not be retried from _bulk'

dumstattd · May 17, 2022, 1:50pm

I am currently using a foreach loop to index documents one at a time through an ingest pipeline with the NEST client. This is slow and highly inefficient, but it works.
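The post does not include the foreach version. Here is a minimal sketch of per-document indexing through a pipeline with the NEST 7.x API. `ForeachIngest` and `IngestOneByOne` are hypothetical names, and throwing on `!IsValid` is an assumption about how failures would be surfaced.

```csharp
using System;
using System.Collections.Generic;
using Nest;

public static class ForeachIngest
{
    // Hypothetical helper: one index request per document, each routed through
    // the ingest pipeline. Easy to debug, but costs one HTTP round trip per document.
    public static void IngestOneByOne<T>(IElasticClient client, IEnumerable<T> data,
        string indexName, string pipelineName) where T : class
    {
        foreach (var doc in data)
        {
            var response = client.Index(doc, i => i
                .Index(indexName)
                .Pipeline(pipelineName));

            // Stop at the first failure and surface the client's diagnostic text.
            if (!response.IsValid)
                throw new InvalidOperationException(response.DebugInformation);
        }
    }
}
```

The per-request overhead is what makes this slow for large datasets, which is why the bulk API is usually preferred.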

For that same dataset, I tried using BulkAll to push the data through the same pipeline during indexing. However, this results in errors.

Error: "BulkAll halted after receiving failures that can not be retried from _bulk"

Looking at the failure reason, it says it's a 201 status code, which to me means Created. I also see data showing up in the index.

Here is the code I am using to push the data into the index through the pipeline.

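    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Runtime.ExceptionServices;
    using System.Threading;
    using Microsoft.Extensions.Logging;
    using Nest;

    // Member of a class that holds an IElasticClient _client and an ILogger _logger.
    public void IngestViaPipeline<T>(IEnumerable<T> data, string indexName, string pipelineName) where T : class {
        var pageSize = 1;
        // Count the pages once up front instead of re-enumerating data on every page.
        var totalPages = (data.Count() + pageSize - 1) / pageSize;

        var bulkAllObservable = _client.BulkAll(data, b => b
            .Index(indexName)
            .Pipeline(pipelineName)
            .BackOffRetries(10)
            .BackOffTime("30s")
            .RefreshOnCompleted()
            .MaxDegreeOfParallelism(Environment.ProcessorCount)
            .Size(pageSize)
            // Retry every item the client reports as failed.
            .RetryDocumentPredicate((item, doc) => true)
            // Log items that end up dropped (not retried).
            .DroppedDocumentCallback((item, doc) => {
                _logger.LogError($"Could not index doc.{Environment.NewLine}{item}{Environment.NewLine}{System.Text.Json.JsonSerializer.Serialize(doc)}");
            })
        );

        // Block the calling thread until BulkAll completes or fails.
        using var waitHandle = new ManualResetEvent(false);
        ExceptionDispatchInfo exceptionDispatchInfo = null;

        var observer = new BulkAllObserver(
            onNext: response => {
                _logger.LogDebug($"Bulk Indexing Page: {response.Page} out of {totalPages} for Index: {indexName} using Pipeline: {pipelineName}");
            },
            onError: exception => {
                // Capture the exception so it can be rethrown on this thread with its original stack trace.
                exceptionDispatchInfo = ExceptionDispatchInfo.Capture(exception);
                waitHandle.Set();
            },
            onCompleted: () => waitHandle.Set()
        );

        bulkAllObservable.Subscribe(observer);
        waitHandle.WaitOne();
        exceptionDispatchInfo?.Throw();
    }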

I have looked at this code long enough and tried just about everything I can find in the docs for this command, so I suspect fatigue is making me miss something critical. What can I do differently to pinpoint the cause? Am I doing anything incorrectly here?

stevejgordon · May 17, 2022, 2:03pm

Hi @dumstattd. I think you're missing .ContinueAfterDroppedDocuments() in your BulkAll configuration, which lets BulkAll continue after any failed documents. Those failures should then invoke your DroppedDocumentCallback.
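Applied to the descriptor from the question, the suggested change might look like the following sketch (NEST 7.x). The wrapper class, method name and parameters are hypothetical. Everything except `ContinueAfterDroppedDocuments()` mirrors the original configuration.

```csharp
using System;
using System.Collections.Generic;
using Nest;

public static class BulkAllWithContinue
{
    // Hypothetical helper: the question's BulkAll descriptor plus
    // ContinueAfterDroppedDocuments(), so that dropped documents are reported
    // through the callback instead of stopping the whole BulkAll run.
    public static BulkAllObservable<T> Create<T>(IElasticClient client, IEnumerable<T> data,
        string indexName, string pipelineName, int pageSize,
        Action<BulkResponseItemBase, T> onDropped) where T : class
    {
        return client.BulkAll(data, b => b
            .Index(indexName)
            .Pipeline(pipelineName)
            .BackOffRetries(10)
            .BackOffTime("30s")
            .RefreshOnCompleted()
            .MaxDegreeOfParallelism(Environment.ProcessorCount)
            .Size(pageSize)
            .RetryDocumentPredicate((item, doc) => true)
            .ContinueAfterDroppedDocuments()   // the suggested addition
            .DroppedDocumentCallback(onDropped));
    }
}
```

The returned observable is subscribed to exactly as in the question's code.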

dumstattd · May 17, 2022, 7:33pm

Hello @stevejgordon, that did not seem to do the trick. After adding it, I did not get any more errors reported than before.

Error shown:

MyService[0]
      Elasticsearch.Net.ElasticsearchClientException: Bulk indexing failed and after retrying 10 times
         at Nest.BulkAllObservable`1.BulkAsync(IList`1 buffer, Int64 page, Int32 backOffRetries)
         at Nest.BulkAllObservable`1.RetryDocuments(Int64 page, Int32 backOffRetries, IList`1 retryDocuments)

I added more logging within RetryDocumentPredicate to show the item and doc values. The item looks like this:

Retrying to index doc.
      index returned 200 _index: myindex _type:  _id: 1 _version: 14 error:
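Below is a sketch of a more diagnostic RetryDocumentPredicate, assuming NEST 7.x and Microsoft.Extensions.Logging. It logs the status, `IsValid` flag and error of each item that reaches the predicate, which may help show why an item reported with status 200 is treated as a failure at all. The helper name is hypothetical. Retrying only HTTP 429 (NEST's documented default) is a deliberate change from the question's `return true`: any other failure becomes a dropped document and is reported through `DroppedDocumentCallback`.

```csharp
using System;
using Microsoft.Extensions.Logging;
using Nest;

public static class RetryDiagnostics
{
    // Hypothetical helper: returns a predicate that logs what BulkAll saw for
    // each item it did not accept, then retries only 429 (Too Many Requests).
    public static Func<BulkResponseItemBase, T, bool> LoggingPredicate<T>(ILogger logger) =>
        (item, doc) =>
        {
            logger.LogWarning(
                "Bulk item not accepted: op={Operation} status={Status} id={Id} isValid={IsValid} error={ErrorType}: {ErrorReason}",
                item.Operation, item.Status, item.Id, item.IsValid, item.Error?.Type, item.Error?.Reason);

            // Everything except 429 is dropped rather than retried.
            return item.Status == 429;
        };
}
```

Usage: `.RetryDocumentPredicate(RetryDiagnostics.LoggingPredicate<MyDoc>(_logger))`, where `MyDoc` stands for the document type.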

I can send the same data using IndexManyAsync without any problem, but when I send it through BulkAll, the document does not seem to be accepted.

dumstattd · May 17, 2022, 8:33pm

Does the JSON serializer differ between BulkAll and IndexManyAsync? I can take that exact document and post it to the index through the pipeline with the VS Code Elastic client, and it only works when the field names are camel case. Does BulkAll not send camel-cased field names?
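Both IndexManyAsync and BulkAll send `_bulk` requests, so in NEST 7.x both should serialize documents with the client's source serializer. One difference that is visible in the question's code is the logging. The DroppedDocumentCallback serializes documents with System.Text.Json's default options, which keep C# property names as-is (typically PascalCase), while NEST's default serializer camel-cases them. A document copied from that log line therefore does not match what the client sent. A minimal sketch of logging with the client's own serializer instead (`DocumentLogging` and `AsSentByClient` are hypothetical names):

```csharp
using Elasticsearch.Net;
using Nest;

public static class DocumentLogging
{
    // Hypothetical helper: serializes a document with the client's SourceSerializer,
    // the serializer NEST uses for document bodies, so the logged JSON uses the
    // same field names the client sends to Elasticsearch.
    public static string AsSentByClient<T>(IElasticClient client, T document) =>
        client.SourceSerializer.SerializeToString(document);
}
```

In the callback, `System.Text.Json.JsonSerializer.Serialize(doc)` would become `DocumentLogging.AsSentByClient(_client, doc)`.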

dumstattd · May 18, 2022, 3:45pm

I just switched to calling Index inside a foreach loop instead of BulkAll, and it works fine with exactly the same data. Then I switched to Bulk instead of BulkAll and wrote my own paging method that sends 1,000-record chunks. Again, it works flawlessly. To me this points either to a bug in BulkAll or to documentation that is incorrect or incomplete.
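The paging method is not shown in the thread. Here is a minimal sketch of the Bulk-plus-own-paging workaround, assuming NEST 7.x. `ManualBulkPaging`, `Batch` and `IngestInBatches` are hypothetical names, and 1,000 is the chunk size mentioned above.

```csharp
using System;
using System.Collections.Generic;
using Nest;

public static class ManualBulkPaging
{
    // Hypothetical helper: splits a sequence into lists of at most batchSize items
    // without materializing the whole sequence first.
    public static IEnumerable<List<T>> Batch<T>(IEnumerable<T> source, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        var batch = new List<T>(batchSize);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }
        if (batch.Count > 0) yield return batch; // trailing partial batch
    }

    // Hypothetical helper: sends each batch as one _bulk request through the
    // ingest pipeline and stops at the first batch that reports a failure.
    public static void IngestInBatches<T>(IElasticClient client, IEnumerable<T> data,
        string indexName, string pipelineName, int batchSize = 1000) where T : class
    {
        foreach (var batch in Batch(data, batchSize))
        {
            var response = client.Bulk(b => b
                .Index(indexName)
                .Pipeline(pipelineName)
                .IndexMany(batch));

            // Errors is the top-level "errors" flag Elasticsearch returns for _bulk.
            if (!response.ApiCall.Success || response.Errors)
                throw new InvalidOperationException(response.DebugInformation);
        }
    }
}
```

On .NET 6 or later, `Enumerable.Chunk(1000)` could replace the hand-written `Batch` helper.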

system · June 15, 2022, 3:45pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Martijn_Laarman · August 23, 2022, 9:47am

@dumstattd I think this is indeed a bug and I think I've found the cause.

I created the following issue to explain and track the failure.

github.com/elastic/elasticsearch-net

BulkAll should classify retried and failed documents as dropped.

Opened 09:46AM - 23 Aug 22 UTC by Mpdreamz · Labels: bug, 8.x

BulkAll has an option to continue after dropped documents: https://github.com/elastic/elasticsearch-net/blob/b7dd9cf58c67cccc6d7f679b5883a155e112f5e2/src/Elastic.Clients.Elasticsearch/Helpers/IBulkAllRequest.cs#L17

HandleDroppedDocuments takes this setting into account and only throws an exception if we are not continuing after seeing dropped documents: https://github.com/elastic/elasticsearch-net/blob/584edb8b4e8a9ec47bbb53d4f3c1a3a4c80db8f1/src/Elastic.Clients.Elasticsearch/Helpers/BulkAllObservable.cs#L173

However, after retrying, we throw if we still hold retryableDocuments. This check should also take the configuration setting into account and treat those documents as dropped. Currently we don't classify bulk items that are still failing after all retries as dropped, and we don't make this check conditional: https://github.com/elastic/elasticsearch-net/blob/584edb8b4e8a9ec47bbb53d4f3c1a3a4c80db8f1/src/Elastic.Clients.Elasticsearch/Helpers/BulkAllObservable.cs#L178
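To make the control flow in the issue easier to follow, here is a simplified, hypothetical C# model of the end-of-retries decision. It is not the client's actual source, and all names are illustrative. The model implements the behaviour the issue proposes: with `continueAfterDroppedDocuments` off it throws, and with it on it reports the remaining documents as dropped. Per the issue, the client currently throws in both cases.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the decision described in the issue, not library code.
public static class ExhaustedRetryModel
{
    // Called after a page has used up its back-off retries.
    public static void HandleExhaustedRetries<T>(
        IReadOnlyCollection<T> stillRetryable,
        bool continueAfterDroppedDocuments,
        Action<T> droppedDocumentCallback)
    {
        if (stillRetryable.Count == 0) return;

        // Setting off: halt, as HandleDroppedDocuments does for dropped documents.
        if (!continueAfterDroppedDocuments)
            throw new InvalidOperationException(
                $"{stillRetryable.Count} document(s) still failing after exhausting retries");

        // Setting on: report each document like any other dropped document and continue.
        foreach (var doc in stillRetryable)
            droppedDocumentCallback(doc);
    }
}
```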
