Export monitoring metrics of a *beat via HTTP(S)


(Dennis Günnewig) #1

Hey there,

I'm currently working on a beat. To make it work well in an environment with quite a lot of logs, I would like to gather as many metrics as possible and expose them to "Prometheus". I saw people working on sending metrics to an Elasticsearch cluster, but in my environment I use "Prometheus" to monitor services. I would like to use those metrics to "improve" my setup: I see Elasticsearch consuming a lot of CPU and memory.

Logs are sent line by line via TCP (EOL: \r\n) to a TCP server, which parses the lines it receives using "grok" and publishes them to Elasticsearch via libbeat.

[Appliance] -> TCP -> [TCP-Server] | [libbeat] -> Elasticsearch

Questions:

  • Is there a public API I can use to get access to captured metrics (including those generated by libbeat)?
  • Are there any guidelines to access metrics?
  • Is there any documentation about how to use those metrics?

Metrics:

  • Count of lines received via TCP
  • Total bytes received via TCP
  • Duration till published
  • Count of lines parsed
  • Count of events in queue before published
  • Count of events published by libbeat
  • ...

(Steffen Siering) #2

Monitoring/metrics support is still in the making. For defining metrics, we use the libbeat/monitoring package. Monitoring includes some "pluggable" reporters available via libbeat/monitoring/reporter.

When profiling is enabled via -httpprof, or when the standalone HTTP endpoint (which does not expose profiling) is enabled, the metrics can be queried via HTTP. The handler at http://host:port/debug/vars reports both expvar and libbeat/monitoring metrics. The standalone HTTP endpoint reports only the libbeat/monitoring metrics, via http://host:port/stats.


(Dennis Günnewig) #3

@steffens Thanks for your help!

  1. What does "in the making" mean? Is the API of the libbeat/monitoring package stable?
  2. libbeat/monitoring/reporter is part of the 6.0.0 API, correct? I don't see it in the 5.5 branch.
  3. Is there any documentation available besides this? I read through the sources before submitting my first post to discuss.elastic.co and felt lost.
  4. Are there any examples available in the wild besides the libbeat codebase?
  5. Would you consider libbeat/monitoring/reporter suitable for implementing a Prometheus reporter, given that Prometheus uses a pull rather than a push model?
  6. Is this kind of monitoring available to "customers" using only the open source version? Or do you need a valid X-Pack license (at least Basic)?

(Steffen Siering) #4
  1. "In the making" means: not documented, not widely used, and subject to change. The monitoring support for beats->ES is not finished, and you're getting into still-in-development territory here. There is some code in the master/6.x branch, but it is not used for reporting metrics yet. If we find that the reporting requirements change, we might have to make some adaptations. But the way metrics/registries are generated in libbeat/monitoring should be mostly stable.

  2. None of the new metrics code is available in the 5.x branches.

  3. No documentation yet. Maybe some more cleanup will follow. A metric is anything that satisfies the Var interface. That is, we don't use interface{} or strings for metrics; instead, a metric type must be able to report itself to the Visitor interface. This allows us to restrict the types/capabilities a reporter/snapshot has to deal with. A beats developer only requires the types Registry, Float, Int, Uint, Var and FuncVar (for custom metrics; for an example, see the libbeat mem metrics). You can think of a registry as a JSON object and of the actual variables as the object's fields. Sub-registries/values can be added or removed at any time.
    The Default variable contains the top-level default registry. For reporting, we have some helpers to create a snapshot of the complete registry tree (CollectFlatSnapshot, CollectStructSnapshot). These helpers use internal types to implement the Visitor interface and thus capture the complete content of the registries.
    More advanced metrics like histograms are not provided yet.
    Metrics can also have a mode (something like a log level), which is used to filter metrics when building a snapshot. This is currently not used, and we just report all metrics. But we might use it to filter for 'important' metrics at snapshot time, to reduce the amount of data to be published/indexed in ES. The ReportX functions are helpers for custom metrics.
    PRs improving the docs are very welcome :)

Looking at the godoc, there are a lot of types/functions. Maybe at some point some helpers and the snapshot support will be moved into separate packages, to clean things up a little.
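To make the Var/Visitor/Registry description concrete, here is a deliberately simplified model of the same design: metrics report themselves to a visitor, registries nest like JSON objects, a flat snapshot produces dotted names, and a lookup resolves a dotted name. Every type and function here (`Visitor`, `Var`, `Int`, `Registry`, `CollectFlat`, `Get`) is a hypothetical stand-in written for illustration; the real libbeat/monitoring signatures are not reproduced and may differ, and metric modes are left out.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
	"sync/atomic"
)

// Visitor receives values during a snapshot walk (hypothetical, simplified).
type Visitor interface {
	OnRegistryStart(name string)
	OnRegistryFinished()
	OnInt(name string, value int64)
}

// Var is anything that can report itself to a Visitor.
type Var interface{ Visit(name string, v Visitor) }

// Int is an integer metric that is safe for concurrent updates.
type Int struct{ i int64 }
func (m *Int) Add(delta int64)              { atomic.AddInt64(&m.i, delta) }
func (m *Int) Get() int64                   { return atomic.LoadInt64(&m.i) }
func (m *Int) Visit(name string, v Visitor) { v.OnInt(name, m.Get()) }

// Registry groups named Vars; a nested Registry acts like a JSON sub-object.
type Registry struct {
	mu      sync.RWMutex
	entries map[string]Var
}

func NewRegistry() *Registry { return &Registry{entries: map[string]Var{}} }

// Add registers v under name; entries can be added at any time.
func (r *Registry) Add(name string, v Var) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.entries[name] = v
}

// Visit reports the registry and its entries, in sorted order, to v.
func (r *Registry) Visit(name string, v Visitor) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	names := make([]string, 0, len(r.entries))
	for n := range r.entries {
		names = append(names, n)
	}
	sort.Strings(names)
	v.OnRegistryStart(name)
	for _, n := range names {
		r.entries[n].Visit(n, v)
	}
	v.OnRegistryFinished()
}

// Get resolves a dotted name such as "tcp.lines_received"; nil if absent.
func (r *Registry) Get(dotted string) Var {
	var cur Var = r
	for _, part := range strings.Split(dotted, ".") {
		reg, ok := cur.(*Registry)
		if !ok {
			return nil // missing entry, or a non-registry in the middle of the path
		}
		reg.mu.RLock()
		cur = reg.entries[part]
		reg.mu.RUnlock()
	}
	return cur
}

// flatCollector is a Visitor that records "a.b.c" -> value pairs.
type flatCollector struct {
	path []string
	out  map[string]int64
}
func (c *flatCollector) OnRegistryStart(name string) { c.path = append(c.path, name) }
func (c *flatCollector) OnRegistryFinished()         { c.path = c.path[:len(c.path)-1] }
func (c *flatCollector) OnInt(name string, value int64) {
	key := strings.Join(c.path, ".") + "." + name
	c.out[strings.TrimPrefix(key, ".")] = value // the root registry has an empty name
}

// CollectFlat snapshots the whole tree below r into a flat map.
func CollectFlat(r *Registry) map[string]int64 {
	c := &flatCollector{out: map[string]int64{}}
	r.Visit("", c)
	return c.out
}

func main() {
	root, tcp, lines := NewRegistry(), NewRegistry(), &Int{}
	root.Add("tcp", tcp)
	tcp.Add("lines_received", lines)
	lines.Add(2)
	fmt.Println(CollectFlat(root), root.Get("tcp.lines_received").(*Int).Get()) // map[tcp.lines_received:2] 2
}
```

The point of the Visitor indirection is that a reporter only ever has to handle the few value kinds the visitor exposes, rather than arbitrary `interface{}` values.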

  4. In the wild? You can grep for libbeat/monitoring in the other beats. Almost all metrics have been switched to the new package.

  5. A reporter is basically a 'service' in libbeat. All you need to provide is a constructor and a Stop method; see the Reporter interface. A reporter creates a snapshot on its own and reports it however it sees fit. For example, a pull-based reporter might cache the last snapshot with a configurable timeout. This way, a 'fast' pull service, or several pull services, will not generate too many snapshots. But I don't know how hard it will be to integrate the reporter with the Prometheus client, or whether you are better off implementing the HTTP service yourself. For example, using monitoring.Default.Get you can get a variable by its full name (registry names are separated by dots). But if the variable is not a Registry and reports a complete object, you will have to use the Visitor interface to extract the exact metric you are interested in.

  6. The metrics and HTTP endpoints are open source. But the actual reporter will work with X-Pack monitoring in ES and Kibana to provide the common UI, integrated with the other services that already support X-Pack monitoring. As monitoring support is still in progress, I don't think this will be available with the 6.0 release.


(Dennis Günnewig) #5

Thanks a lot! I used your information and was able to build a prometheus instrumentation with this.

  1. I re-used code from the existing http Handler
  2. I rebuilt the http Handler to output the metrics in a format Prometheus can understand, without any caching at all, as Prometheus should always get the current values.
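A handler along those lines might look like the sketch below, which renders a flat snapshot (dotted names mapped to integers) in the Prometheus text exposition format and collects a fresh snapshot on every scrape. The `beat_` prefix, the mapping of invalid name characters to `_`, and exporting everything as a gauge are assumptions of this sketch, not necessarily what the handler described above does; the snapshot carries no type metadata, so counters cannot be told apart from gauges.

```go
package main

import (
	"fmt"
	"net/http"
	"sort"
	"strings"
)

// promName maps a dotted name such as "libbeat.output.events.acked" to a
// valid Prometheus name by replacing every character outside [a-zA-Z0-9_:]
// with '_'. The "beat_" prefix is arbitrary and keeps names from starting
// with a digit.
func promName(dotted string) string {
	var b strings.Builder
	b.WriteString("beat_")
	for _, r := range dotted {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9', r == '_', r == ':':
			b.WriteRune(r)
		default:
			b.WriteByte('_')
		}
	}
	return b.String()
}

// renderProm writes one "# TYPE" line and one sample line per metric,
// sorted by name so the output is stable.
func renderProm(snap map[string]int64) string {
	keys := make([]string, 0, len(snap))
	for k := range snap {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	for _, k := range keys {
		name := promName(k)
		fmt.Fprintf(&b, "# TYPE %s gauge\n%s %d\n", name, name, snap[k])
	}
	return b.String()
}

// promHandler collects a fresh snapshot on every scrape (no caching).
// collect is a placeholder for whatever produces the flat snapshot.
func promHandler(collect func() map[string]int64) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain; version=0.0.4")
		fmt.Fprint(w, renderProm(collect()))
	})
}

func main() {
	snap := map[string]int64{"libbeat.output.events.acked": 42, "tcp.lines_received": 7}
	fmt.Print(renderProm(snap))
}
```

Registering it with something like `http.Handle("/metrics", promHandler(collect))` would expose it to a Prometheus scrape job. Two different dotted names can map to the same Prometheus name (for example `a.b` and `a_b`), which this sketch does not detect.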

What I would love to see:

  • Add more metadata to metrics which can be used in arbitrary reporters - description of metric, type of metric
  • Build the monitoring API in a way that makes it possible for beat authors to add monitoring of their choice, e.g. add support for Prometheus, though I understand that this product is kind of a competitor to your own products

Does it make sense to put that into a GitHub issue?

Thanks again for your help!


(Steffen Siering) #6

We normally document exported fields in fields.yml and check in system tests that these fields are actually present. Having/adding documentation inline might bloat the API even more and is currently not of much use.

"without any caching at all as prometheus should always get the current values"

True. It's the same for the current HTTP handler. Still, caching can help when dealing with rogue collectors (well... adding authentication also helps).


(system) #7

This topic was automatically closed after 21 days. New replies are no longer allowed.