In Beats, I see that this PR was merged in March 2023: "Split large batches on error instead of dropping them" (PR 34911). I think PR 34911 changed the Publish() logic.
If I look here:
func (client *Client) Publish(ctx context.Context, batch publisher.Batch) error {
    events := batch.Events()
    rest, err := client.publishEvents(ctx, events)
    switch {
    case errors.Is(err, errPayloadTooLarge):
        if batch.SplitRetry() {
            // Report that we split a batch
            client.observer.Split()
        } else {
            // If the batch could not be split, there is no option left but
            // to drop it and log the error state.
            batch.Drop()
            client.observer.Dropped(len(events))
            err := apm.CaptureError(ctx, fmt.Errorf("failed to perform bulk index operation: %w", err))
            err.Send()
            client.log.Error(err)
        }
        // Returning an error from Publish forces a client close / reconnect,
        // so don't pass this error through since it doesn't indicate anything
        // wrong with the connection.
        return nil
    case len(rest) == 0:
        batch.ACK()
    default:
        batch.RetryEvents(rest)
    }
    return err
}
So if my understanding of the logic is correct, this function should try to split up an oversized bulk request, and only drop the batch if it cannot be split and sent successfully.
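If I have that right, the splitting amounts to halving the batch and retrying each half. Here is my own sketch of the idea, just to check my understanding (not the actual Beats code; sendBulk and the maxEvents limit are stand-ins for the real bulk request and size check):

// splitretry.go: toy model of split-on-"413" retry. sendBulk is a
// hypothetical stand-in that rejects any batch above maxEvents,
// mimicking Elasticsearch answering 413 for an oversized payload.
package main

import (
    "errors"
    "fmt"
)

var errPayloadTooLarge = errors.New("payload too large")

func sendBulk(events []string, maxEvents int) error {
    if len(events) > maxEvents {
        return errPayloadTooLarge
    }
    return nil
}

// publish halves the batch and retries each half whenever the payload
// is rejected as too large; a single event that still does not fit is dropped.
func publish(events []string, maxEvents int) {
    if err := sendBulk(events, maxEvents); !errors.Is(err, errPayloadTooLarge) {
        fmt.Printf("sent %d events\n", len(events))
        return
    }
    if len(events) <= 1 {
        fmt.Println("dropping event: cannot split further")
        return
    }
    mid := len(events) / 2
    publish(events[:mid], maxEvents)
    publish(events[mid:], maxEvents)
}

func main() {
    publish([]string{"e1", "e2", "e3", "e4", "e5"}, 2)
}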
I am running elastic-agent 8.9.0, and I do not seem to see this behavior. I am getting hundreds of errors like the one I posted here:
It doesn't appear to be having any effect here, and from what you've posted I suspect the reason might be that Elastic Agent isn't actually seeing the HTTP 413 response and is instead seeing write tcp [redacted]:33430->[redacted]:443: write: broken pipe as the error. The new code won't do anything unless it explicitly receives a 413 from Elasticsearch.
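To spell that out: the errPayloadTooLarge path only triggers when the client actually receives a response carrying an HTTP 413 status. A rough sketch of the distinction (not the exact Beats code; doBulk and the test server are hypothetical):

// check413.go: a transport error such as "write: broken pipe" surfaces
// before any status code exists, so it can never map to errPayloadTooLarge.
package main

import (
    "errors"
    "fmt"
    "net/http"
    "net/http/httptest"
    "strings"
)

var errPayloadTooLarge = errors.New("payload too large")

func doBulk(client *http.Client, req *http.Request) error {
    resp, err := client.Do(req)
    if err != nil {
        // A connection reset mid-write lands here, e.g.
        // "write tcp ...:443: write: broken pipe" -- no status to inspect.
        return err
    }
    defer resp.Body.Close()
    if resp.StatusCode == http.StatusRequestEntityTooLarge { // HTTP 413
        return errPayloadTooLarge
    }
    return nil
}

func main() {
    // Tiny test server that always answers 413, to exercise the mapping.
    srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(http.StatusRequestEntityTooLarge)
    }))
    defer srv.Close()

    req, _ := http.NewRequest(http.MethodPost, srv.URL+"/_bulk", strings.NewReader("{}"))
    fmt.Println(errors.Is(doBulk(srv.Client(), req), errPayloadTooLarge)) // true
}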
How can I tell how large the payload is that libbeat is trying to send to Elasticsearch via a POST _bulk API call?
Ideally by getting a 413 response back.
Does libbeat ignore the setting of http.max_content_length on the server?
Yes, because it doesn't know about it: libbeat does not query this parameter, and Elasticsearch does not give it back.
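As for seeing how large the payloads actually are: one option is to put a small logging proxy between the agent and Elasticsearch and record the size of each _bulk request. A sketch (the listen address and upstream URL are placeholders; adjust for your setup, including TLS):

// bulkproxy.go: minimal logging reverse proxy for observing _bulk sizes.
package main

import (
    "log"
    "net/http"
    "net/http/httputil"
    "net/url"
    "strings"
)

func main() {
    // Placeholder upstream; point this at your Elasticsearch endpoint.
    target, err := url.Parse("http://localhost:9200")
    if err != nil {
        log.Fatal(err)
    }
    proxy := httputil.NewSingleHostReverseProxy(target)

    handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if strings.HasSuffix(r.URL.Path, "/_bulk") {
            // ContentLength is -1 for chunked encoding; otherwise it is
            // the payload size in bytes.
            log.Printf("bulk request: %d bytes", r.ContentLength)
        }
        proxy.ServeHTTP(w, r)
    })

    // Point the agent's output at this address instead of Elasticsearch.
    log.Fatal(http.ListenAndServe("localhost:9201", handler))
}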
If this parameter is queryable from Elasticsearch, would it be possible to
change elastic-agent / beats to query this parameter, and adjust the size of the
payload being sent by POST _bulk calls?
Yes, sorry, I didn't mean it was impossible to query; I meant it isn't part of the _bulk response. I suppose we could query it, but the batch-splitting implementation was supposed to make that unnecessary. If we don't always get a 413 response back when a batch is too large, it would make sense for us to do that.
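For reference, the default can be read back with the cluster settings API. A sketch (the URL and missing auth are placeholders; since http.max_content_length is a static per-node setting, a value overridden in elasticsearch.yml would show up via the _nodes API rather than under defaults):

// maxcontentlength.go: read http.max_content_length from Elasticsearch.
package main

import (
    "encoding/json"
    "fmt"
    "log"
    "net/http"
)

func main() {
    url := "http://localhost:9200/_cluster/settings" +
        "?include_defaults=true&filter_path=defaults.http.max_content_length"

    resp, err := http.Get(url)
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    // Expected shape: {"defaults":{"http":{"max_content_length":"100mb"}}}
    var body struct {
        Defaults struct {
            HTTP struct {
                MaxContentLength string `json:"max_content_length"`
            } `json:"http"`
        } `json:"defaults"`
    }
    if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
        log.Fatal(err)
    }
    fmt.Println("http.max_content_length =", body.Defaults.HTTP.MaxContentLength)
}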
No worries!
Should I submit an enhancement request for elastic-agent / beats to query http.max_content_length from Elasticsearch so that it can be used as part of the batch-splitting implementation?
Either somewhere on GitHub, or via https://support.elastic.co?
I would do both: create a public GitHub issue in Beats and also raise it through support. Going through support in addition to creating an issue will help with prioritization.