Description of metrics exposed by /stats endpoint

trondhindenes · March 30, 2018, 8:18am

Now that Filebeat has a metrics endpoint (https://github.com/elastic/beats/pull/3717), we're trying to set up some rudimentary monitoring of the Filebeat agents running on all our nodes.

Here's an example result that I pulled from one of our dev env servers:

https://gist.github.com/trondhindenes/fef8f5646f68db465172284bfa8fc38d

filebeat_stats_output.json

    {
        "beat": {
            "cpu": {
                "system": {
                    "ticks": 62,
                    "time": 62
                },
                "total": {
                    "ticks": 155,
                    "time": 155,

(This file has been truncated; the full output is at the gist link above.)

What we're after is detecting situations where Filebeat is unable to send data to Logstash, or any other situation where data isn't being sent for some reason. I guess the 'pipeline.events' object contains the data I'm after, but I don't know what each of the metrics actually means.

Is anyone able to help me figure out what we should look for? The plan is to write a simple Datadog agent plugin that exposes the necessary metrics, and set up alerting from there.
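For the plan described above, a minimal Python sketch of the polling side might look like this. It assumes the stats endpoint is enabled and reachable at `http://localhost:5066/stats` (5066 is the usual default port for the Beats HTTP endpoint, but check your own settings). `fetch_stats` and `flatten` are hypothetical helper names, not part of Filebeat or the Datadog agent. Flattening the nested JSON into dotted names gives metric names that are easy to forward to another monitoring tool.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumption: the Filebeat stats endpoint is enabled and listening here.
STATS_URL = "http://localhost:5066/stats"


def fetch_stats(url: str = STATS_URL, timeout: float = 5.0) -> Dict[str, Any]:
    """Fetch and decode the JSON document served by the stats endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def flatten(obj: Dict[str, Any], prefix: str = "") -> Dict[str, float]:
    """Flatten nested JSON into dotted names, keeping only numeric leaves."""
    out: Dict[str, float] = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects such as "beat" -> "cpu" -> "system".
            out.update(flatten(value, name))
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            # Skip strings and booleans; only numbers are useful as metrics.
            out[name] = value
    return out
```

Against the output in the gist, `flatten(fetch_stats())` would yield entries such as `beat.cpu.system.ticks`, which a custom agent check could then submit under its own prefix.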

pierhugues · March 30, 2018, 12:48pm

@trondhindenes Did you take a look at monitoring Beats with x-pack? Beats monitoring comes with x-pack basic, which sends all these stats to Elasticsearch. The stats are stored in indices, so you can query them.

We plan to add alerting in future versions: we will raise an alert if we don't receive stats for a period of time, and apply heuristics to the event rates. That could also be done externally, though.
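The "no stats received for a period of time" check mentioned here can be sketched externally in a few lines. This is a hypothetical helper, not an x-pack feature: it assumes your own poller records the Unix timestamp of the last successful stats fetch, and `max_age_s` is a threshold you choose.

```python
import time
from typing import Optional


def is_stale(last_seen: Optional[float], max_age_s: float, now: Optional[float] = None) -> bool:
    """Return True if no stats have arrived within the last `max_age_s` seconds.

    `last_seen` is the Unix timestamp of the last successful fetch, or None
    if the agent has never responded.
    """
    if last_seen is None:
        return True
    current = time.time() if now is None else now
    return current - last_seen > max_age_s
```

Passing `now` explicitly makes the check easy to test; in a real poller you would leave it as None.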

trondhindenes · March 30, 2018, 8:00pm

Hi Pier,
We don't use the paid version of x-pack so alerting isn't available to us. We use Elasticsearch solely for log ingestion and other tools for infrastructure monitoring.

TBH, the Filebeat REST endpoint seems to provide more than enough for what we need; I just need to figure out what the exposed metrics actually mean. I realize the stats endpoint is in a pre-release state, so I guess "formal" documentation isn't in place yet, but the metrics in the similar Logstash monitoring API never got properly documented either, so that's why I'm asking.

trondhindenes · March 30, 2018, 8:02pm

Edit: I see now that x-pack basic includes something called "full-stack monitoring", and I assume Filebeat monitoring is part of that. Though it looks like one needs the paid version to get alerting (I honestly don't know if I'd call monitoring without alerting "monitoring"). Anyway, we'd still like to tap into the stats API directly from our "regular" monitoring tooling.

ruflin · April 3, 2018, 9:01am

I'm glad the data is useful for you. One of the reasons we haven't documented the fields yet is that the structure of the events will still change slightly.

With the naming, we try to make the metrics as self-explanatory as possible, which is not always easy. If there are 3-4 metrics you are interested in, I'm happy to explain them here.

As for the metrics you asked about above: we've had quite a few discussions about this in the past. It's tricky to track because several metrics are influenced by it, and it's an area we need to improve. I thought there was also a GitHub issue on it, but I couldn't find it.

trondhindenes · April 3, 2018, 12:15pm

Thanks @ruflin.

Ultimately we just want to answer "can we send stuff to Logstash?" I guess the way to answer that is to look at the number of messages Filebeat has queued and the frequency of errors when it tries to send. Are these exposed, directly or indirectly? It's not clear to me from the JSON response which counter(s) I should be monitoring.
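Because the stats are cumulative counters, an error frequency can be derived indirectly: sample the endpoint twice and divide the increase by the elapsed time. A minimal sketch; `counter_rate` is a hypothetical helper, and it assumes a counter only goes down when Filebeat restarts.

```python
def counter_rate(prev_value: float, prev_time: float,
                 curr_value: float, curr_time: float) -> float:
    """Per-second rate of a cumulative counter between two polls.

    A drop in the counter is treated as a restart: the current value is
    taken as the whole increase since the restart.
    """
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be in chronological order")
    increase = curr_value - prev_value
    if increase < 0:  # counter reset, e.g. Filebeat was restarted
        increase = curr_value
    return increase / elapsed
```

For example, a failure counter going from 10 to 40 over 30 seconds gives a rate of 1.0 failures per second.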

ruflin · April 4, 2018, 12:51pm

There are different error scenarios. The metrics you are probably most interested in are the output.events.failed counter (https://gist.github.com/trondhindenes/fef8f5646f68db465172284bfa8fc38d#file-filebeat_stats_output-json-L69) and the pipeline.events.failed counter. If these numbers keep growing, something is probably wrong with your connection. There is also a retry counter you can keep an eye on.
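One way to turn "if these numbers keep growing" into a check is to compare two consecutive samples of the counters named above. A minimal sketch, assuming the counters sit under a top-level `libbeat` object (worth verifying against your own JSON; adjust `WATCHED` if the paths differ). `get_path` and `growing_counters` are hypothetical helpers.

```python
from typing import Any, Dict

# Counters mentioned in this thread. Assumption: they are nested under "libbeat".
WATCHED = [
    "libbeat.output.events.failed",
    "libbeat.pipeline.events.failed",
    "libbeat.pipeline.events.retry",
]


def get_path(stats: Dict[str, Any], path: str) -> int:
    """Walk a dotted path through nested dicts; missing keys count as 0."""
    node: Any = stats
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return 0
        node = node[part]
    return node if isinstance(node, int) else 0


def growing_counters(prev: Dict[str, Any], curr: Dict[str, Any]) -> Dict[str, int]:
    """Return each watched counter that increased between two samples, with its delta."""
    deltas: Dict[str, int] = {}
    for path in WATCHED:
        delta = get_path(curr, path) - get_path(prev, path)
        if delta > 0:  # negative deltas (restarts) are ignored
            deltas[path] = delta
    return deltas
```

A non-empty result on several consecutive polls would be a reasonable alerting condition; a single blip may just be a transient retry.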

I think it's important to mention that Filebeat can overload Logstash and will automatically back off. This will increase the queue size, but I'm not sure we have all the metrics for that yet. At least, not enough queue stats showed up in your JSON.
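If your Filebeat version does report a queue-like gauge, the backoff described here should show up as that gauge rising steadily. A minimal sketch of a sustained-growth check; `GrowthDetector` is a hypothetical helper, and which gauge to feed it (for example an "active events" count) is an assumption to check against your own stats output.

```python
from collections import deque
from typing import Deque


class GrowthDetector:
    """Flag a gauge that has risen on every step of the last `window` samples."""

    def __init__(self, window: int = 5) -> None:
        if window < 2:
            raise ValueError("window must be at least 2")
        # Only the most recent `window` samples are kept.
        self.samples: Deque[float] = deque(maxlen=window)

    def add(self, value: float) -> bool:
        """Record a sample; return True if the full window is strictly increasing."""
        self.samples.append(value)
        if len(self.samples) < self.samples.maxlen:
            return False
        values = list(self.samples)
        return all(a < b for a, b in zip(values, values[1:]))
```

Requiring a strictly increasing window, rather than a single high reading, avoids alerting on short bursts that Filebeat drains on its own.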

system · May 2, 2018, 2:51pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category	Tags	Replies	Views	Activity
Filebeat Stats and Metrics	Beats	elastic-stack-monitoring, elastic-stack-alerting, filebeat	2	2704	May 7, 2020
Using Metricbeat to send Filebeat logs to ES	Beats	metricbeat	7	543	June 26, 2023
Question about filebeat monitor metrics	Beats	filebeat	1	301	October 8, 2018
Monitor/alert beats dropped events	Beats	filebeat, metricbeat	9	1399	November 4, 2022
Metricbeat data differs greatly from Elasticsearch Endpoint	Beats	metricbeat	3	306	March 26, 2020