What does Output Errors mean in Beats monitoring in Kibana?

Hello,

In the Stack Monitoring page in Kibana, in the Beats section, there is a column called Output Errors. What does it measure? I can't see any errors in the logs of the remote Beats that are being monitored.

If it is about events/logs not being written due to an error when connecting to Elasticsearch, does this mean that the logs get lost? Or will sending be retried until it succeeds?

Here are the definitions of the beats monitoring fields.

However, I am with you: even here the definitions are a bit vague.
Perhaps someone from the Beats team will add some perspective.

@jsoriano

What is the difference between these?


beat.stats.libbeat.output.write.errors

beat.stats.libbeat.output.events.dropped

beat.stats.libbeat.output.events.failed

Thanks @stephenb. To add to the available info: when you click a specific Beat in the Kibana interface, there are some info balloons that say the following.

For the "Fail rate" metric:

Interval: 10 seconds.
Failed in Pipeline: Failures that happened before event was added to the publishing pipeline (output was disabled or publisher client closed).
Dropped in Pipeline: Events that have been dropped after N retries (N = max_retries setting).
Dropped in Output: (Fatal drop) Events dropped by the output as being "invalid." The output still acknowledges the event for the Beat to remove it from the queue.
Retry in Pipeline: Events in the pipeline that are trying again to be sent to the output.

And in the "Output errors" metric:

Interval: 10 seconds.
Sending: Errors in writing the response from the output.
Receiving: Errors in reading the response from the output.

From the above, I feel that I am safe if the Fail Rate is zero. I don't understand what the output errors are, though, or how they could relate to these fail rates. Looking forward to a reply from @jsoriano.
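
The "Fail Rate is zero" check can be sketched roughly as below. This is only an illustration, not how Kibana computes the chart: it assumes the underlying counters are cumulative totals sampled once per interval, and the key names (`failed_in_pipeline`, `dropped_in_pipeline`, `dropped_in_output`, `retry_in_pipeline`) are hypothetical labels that mirror the balloon text rather than real field names.

```python
# Hypothetical labels mirroring the four "Fail rate" series in the UI;
# they are NOT real monitoring field names.
FAIL_KEYS = (
    "failed_in_pipeline",
    "dropped_in_pipeline",
    "dropped_in_output",
    "retry_in_pipeline",
)


def interval_deltas(previous: dict, current: dict) -> dict:
    """Return how much each counter grew between two consecutive samples.

    Assumes the counters are cumulative. If a counter went down (for
    example because the Beat restarted), the current value is used as-is.
    """
    deltas = {}
    for key in FAIL_KEYS:
        before = previous.get(key, 0)
        after = current.get(key, 0)
        deltas[key] = after - before if after >= before else after
    return deltas


def fail_rate_is_zero(previous: dict, current: dict) -> bool:
    """True when none of the fail-rate counters grew during the interval."""
    return all(delta == 0 for delta in interval_deltas(previous, current).values())
```

For example, two samples taken 10 seconds apart with identical counter values would give `fail_rate_is_zero(...) == True`, while any growth in one of the four counters would make it `False`.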


Thanks @tterranigma

I did not realize those info balloons were there :slight_smile:

Hey,

I agree that this is not very clear in the docs. Even after looking at the code I am not 100% sure of the meaning, but let me try to explain :slight_smile:

beat.stats.libbeat.output.write.errors counts low-level errors in the underlying HTTP request or TCP connection. These errors are probably harmless if there are no failed or dropped events, but a high number may indicate a problem in the network or in the output cluster, or congestion somewhere along the way.

beat.stats.libbeat.output.events.failed indicates a higher-level failure at the output level: the output hasn't been able to confirm that an event was written, and the event will probably be retried. So in general these are transient failures that shouldn't lead to data loss.

beat.stats.libbeat.output.events.dropped counts dropped events, which are lost for sure. This usually indicates that the Beat is sending events that cannot be indexed, which tends to be caused by a bug in Beats, a misconfiguration, or an unusual setup. The logs may help to identify the culprit when this happens.
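
To make the distinction concrete, here is a minimal sketch of how the three counters could be triaged, following the explanation above. It is only an illustration under stated assumptions: `OutputStats`, `triage`, the returned labels, and the `write_error_threshold` default are all hypothetical and not part of Beats or Kibana, and what counts as "a high number" of write errors depends on your environment.

```python
from dataclasses import dataclass


@dataclass
class OutputStats:
    """Hypothetical container for three counters read from Beats monitoring data."""

    write_errors: int    # beat.stats.libbeat.output.write.errors
    events_failed: int   # beat.stats.libbeat.output.events.failed
    events_dropped: int  # beat.stats.libbeat.output.events.dropped


def triage(stats: OutputStats, write_error_threshold: int = 100) -> str:
    """Map the counters to a rough severity label, most severe first.

    The threshold is an arbitrary placeholder, not a recommended value.
    """
    if stats.events_dropped > 0:
        # Dropped events are lost; check the Beat's logs for the cause.
        return "data_loss"
    if stats.events_failed > 0:
        # Delivery could not be confirmed; the events will probably be retried.
        return "retrying"
    if stats.write_errors > write_error_threshold:
        # Many low-level errors may point to network or cluster trouble.
        return "check_network"
    if stats.write_errors > 0:
        # A few low-level errors with no failed or dropped events.
        return "probably_harmless"
    return "ok"
```

For instance, `triage(OutputStats(write_errors=3, events_failed=0, events_dropped=0))` returns `"probably_harmless"`, which matches the case of a non-zero Output Errors column with nothing failed or dropped.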


I have created an issue to clarify this in the docs: https://github.com/elastic/beats/issues/27763

