In Beats, I see that this PR was merged in March 2023: "Split large batches on error instead of dropping them" (PR 34911). I think PR 34911 changed the Publish() logic.
If I look here:
func (client *Client) Publish(ctx context.Context, batch publisher.Batch) error {
    events := batch.Events()
    rest, err := client.publishEvents(ctx, events)
    switch {
    case errors.Is(err, errPayloadTooLarge):
        if batch.SplitRetry() {
            // Report that we split a batch
            client.observer.Split()
        } else {
            // If the batch could not be split, there is no option left but
            // to drop it and log the error state.
            batch.Drop()
            client.observer.Dropped(len(events))
            err := apm.CaptureError(ctx, fmt.Errorf("failed to perform bulk index operation: %w", err))
            err.Send()
            client.log.Error(err)
        }
        // Returning an error from Publish forces a client close / reconnect,
        // so don't pass this error through since it doesn't indicate anything
        // wrong with the connection.
        return nil
    case len(rest) == 0:
        batch.ACK()
    default:
        batch.RetryEvents(rest)
    }
    return err
}
So if my understanding of the logic is correct, this function should try to split up an oversized bulk request, and only drop the batch if it cannot be split and sent successfully.
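If I have that right, the splitting amounts to halving the batch and retrying each half. Here is my own sketch of the idea, just to check my understanding (not the actual Beats code; sendBulk and the maxEvents limit are stand-ins for the real bulk request and size check):

// splitretry.go: toy model of split-on-"413" retry. sendBulk is a
// hypothetical stand-in that rejects any batch above maxEvents,
// mimicking Elasticsearch answering 413 for an oversized payload.
package main

import (
    "errors"
    "fmt"
)

var errPayloadTooLarge = errors.New("payload too large")

func sendBulk(events []string, maxEvents int) error {
    if len(events) > maxEvents {
        return errPayloadTooLarge
    }
    return nil
}

// publish halves the batch and retries each half whenever the payload
// is rejected as too large; a single event that still does not fit is dropped.
func publish(events []string, maxEvents int) {
    if err := sendBulk(events, maxEvents); !errors.Is(err, errPayloadTooLarge) {
        fmt.Printf("sent %d events\n", len(events))
        return
    }
    if len(events) <= 1 {
        fmt.Println("dropping event: cannot split further")
        return
    }
    mid := len(events) / 2
    publish(events[:mid], maxEvents)
    publish(events[mid:], maxEvents)
}

func main() {
    publish([]string{"e1", "e2", "e3", "e4", "e5"}, 2)
}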
I am running elastic-agent 8.9.0, and I do not seem to see this behavior. I am getting hundreds of errors like the one I posted here:
It doesn't appear to be having any effect here, and from what you've posted I suspect the reason might be that Elastic Agent isn't actually seeing the HTTP 413 response and is instead seeing write tcp [redacted]:33430->[redacted]:443: write: broken pipe as the error. The new code won't do anything unless it explicitly receives a 413 from Elasticsearch.
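To spell that out: the errPayloadTooLarge path only triggers when the client actually receives a response carrying an HTTP 413 status. A rough sketch of the distinction (not the exact Beats code; doBulk and the test server are hypothetical):

// check413.go: a transport error such as "write: broken pipe" surfaces
// before any status code exists, so it can never map to errPayloadTooLarge.
package main

import (
    "errors"
    "fmt"
    "net/http"
    "net/http/httptest"
    "strings"
)

var errPayloadTooLarge = errors.New("payload too large")

func doBulk(client *http.Client, req *http.Request) error {
    resp, err := client.Do(req)
    if err != nil {
        // A connection reset mid-write lands here, e.g.
        // "write tcp ...:443: write: broken pipe" -- no status to inspect.
        return err
    }
    defer resp.Body.Close()
    if resp.StatusCode == http.StatusRequestEntityTooLarge { // HTTP 413
        return errPayloadTooLarge
    }
    return nil
}

func main() {
    // Tiny test server that always answers 413, to exercise the mapping.
    srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(http.StatusRequestEntityTooLarge)
    }))
    defer srv.Close()

    req, _ := http.NewRequest(http.MethodPost, srv.URL+"/_bulk", strings.NewReader("{}"))
    fmt.Println(errors.Is(doBulk(srv.Client(), req), errPayloadTooLarge)) // true
}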
How can I tell how large the payload is that libbeat is trying to send to Elasticsearch via a POST _bulk API call?
Ideally by getting a 413 response back.
Does libbeat ignore the setting of http.max_content_length on the server?
Yes, because it doesn't know about it: libbeat does not query this parameter, and Elasticsearch does not give it back.
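As for seeing how large the payloads actually are: one option is to put a small logging proxy between the agent and Elasticsearch and record the size of each _bulk request. A sketch (the listen address and upstream URL are placeholders; adjust for your setup, including TLS):

// bulkproxy.go: minimal logging reverse proxy for observing _bulk sizes.
package main

import (
    "log"
    "net/http"
    "net/http/httputil"
    "net/url"
    "strings"
)

func main() {
    // Placeholder upstream; point this at your Elasticsearch endpoint.
    target, err := url.Parse("http://localhost:9200")
    if err != nil {
        log.Fatal(err)
    }
    proxy := httputil.NewSingleHostReverseProxy(target)

    handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if strings.HasSuffix(r.URL.Path, "/_bulk") {
            // ContentLength is -1 for chunked encoding; otherwise it is
            // the payload size in bytes.
            log.Printf("bulk request: %d bytes", r.ContentLength)
        }
        proxy.ServeHTTP(w, r)
    })

    // Point the agent's output at this address instead of Elasticsearch.
    log.Fatal(http.ListenAndServe("localhost:9201", handler))
}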
If this parameter is queryable from Elasticsearch, would it be possible to
change elastic-agent / beats to query this parameter, and adjust the size of the
payload being sent by POST _bulk calls?
Yes, sorry, I didn't mean it was impossible to query; I meant it isn't part of the _bulk response. I suppose we could query it, but the batch-splitting implementation was supposed to make that unnecessary. If we don't always get a 413 response back when a batch is too large, it would make sense for us to do that.
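For reference, the default can be read back with the cluster settings API. A sketch (the URL and missing auth are placeholders; since http.max_content_length is a static per-node setting, a value overridden in elasticsearch.yml would show up via the _nodes API rather than under defaults):

// maxcontentlength.go: read http.max_content_length from Elasticsearch.
package main

import (
    "encoding/json"
    "fmt"
    "log"
    "net/http"
)

func main() {
    url := "http://localhost:9200/_cluster/settings" +
        "?include_defaults=true&filter_path=defaults.http.max_content_length"

    resp, err := http.Get(url)
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    // Expected shape: {"defaults":{"http":{"max_content_length":"100mb"}}}
    var body struct {
        Defaults struct {
            HTTP struct {
                MaxContentLength string `json:"max_content_length"`
            } `json:"http"`
        } `json:"defaults"`
    }
    if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
        log.Fatal(err)
    }
    fmt.Println("http.max_content_length =", body.Defaults.HTTP.MaxContentLength)
}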
No worries!
Should I submit an enhancement request for elastic-agent / beats to query http.max_content_length from Elasticsearch so that it can be used as part of the batch-splitting implementation?
Either somewhere on GitHub, or via https://support.elastic.co?
I would do both: create a public GitHub issue in Beats and also raise it through support. Going through support in addition to creating an issue will help with prioritization.