Metricbeat - Capable of the following?

I would appreciate it if someone could answer the following regarding Metricbeat:
1.) Does MetricBeat support secure data transmission?
2.) Regarding stability, is there any built-in agent fail-over?
3.) Is there a known supported data volume or breaking point (e.g. 10,000 events)?
4.) What is in place for data buffering? Is data queued for processing?

Thanks!

  1. Yes
  2. Yes, if multiple hosts are defined and one becomes unresponsive it will move on to the next.
  3. No known rate. Usually the limiting factor is the ingest rate on the receiving side. So capacity planning and stress testing can reduce risk.
  4. For Metricbeat some number of events can be buffered in memory. Once the queue fills up (due to the output not being available or being too slow) new data collection will stop until space frees up in the queue. There has been some talk of spooling events to disk so that collection can continue. In addition all Beats support sending to an external queue like Redis or Kafka.
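To illustrate the buffering behaviour in point 4, here is a minimal Python sketch of a bounded in-memory queue that stops accepting new events once it is full. It is not Metricbeat's actual code; the `collect` helper and the capacity of 3 events are made up for the example.

```python
import queue

def collect(events, buffer):
    """Try to enqueue each event; stop collecting as soon as the buffer is full."""
    accepted = 0
    for event in events:
        try:
            buffer.put_nowait(event)
        except queue.Full:
            break  # collection pauses until the output drains the queue
        accepted += 1
    return accepted

buffer = queue.Queue(maxsize=3)       # hypothetical capacity, in events
first = collect(range(10), buffer)    # output is down: only 3 events fit
buffer.get_nowait()                   # output recovers and drains one event
second = collect(range(10), buffer)   # exactly one slot is free again
print(first, second)
```

Sending to an external queue such as Redis or Kafka moves this limit off the host, which is why it is often suggested when the receiving side may be slow.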

Great info, thank you very much for the quick response.

I had a few follow-up questions, please, as we get closer to making a decision on our metric collection agent.
1.) Is it possible to add additional (custom) tags to Beats metrics? (e.g. ApplicationID field)
2.) Does Beats have any dependencies required for install? Our "build team" is inquiring about this item.
3.) Does the Beats agent have a built-in heartbeat mechanism? We are trying to find the most effective method for determining the up/down status of the agent.

Thanks!

  1. Yes, tags can be applied globally or per module. fields and fields_under_root can also be used to add arbitrary fields to your events.
  2. There are no dependencies.
  3. There is no heartbeat built in, but the absence of any events from a host could be an indicator. We are planning to add a centralized monitoring capability to all Beats where metrics and health/status info will be available in X-Pack Monitoring (under the free Basic license). Logstash is being added to monitoring now and then Beats will be next. Some people are using the undocumented -httpprof endpoint to keep an eye on Beats now, but that interface will probably change with centralized monitoring (see #463).
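As a rough sketch of the "absence of events" idea in point 3, the check below flags hosts whose latest event is older than a threshold. The host names, timestamps, and the 60-second threshold are invented for illustration; in practice the last-seen times would come from wherever the events are stored.

```python
from datetime import datetime, timedelta

def silent_hosts(last_seen, now, max_silence):
    """Return hosts whose most recent event is older than max_silence."""
    return sorted(host for host, ts in last_seen.items() if now - ts > max_silence)

now = datetime(2017, 1, 1, 12, 0, 0)
last_seen = {                                  # hypothetical hosts and timestamps
    "web-1": now - timedelta(seconds=10),
    "web-2": now - timedelta(minutes=5),
}
down = silent_hosts(last_seen, now, timedelta(seconds=60))
print(down)
```

The threshold should be a few multiples of the collection period, otherwise a single delayed batch would be reported as an outage.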

Thank you for your quick response Andrew! I will try to add the fields later today or tomorrow. I did have a couple of related questions if I could please.
1.) Regarding the adding of centralized monitoring to Beats, do you have a rough estimate on the timeline for this?
2.) Regarding the collection of diskio metrics, should a single "time" metric be expected for each disk? For example, "metricset":{"module":"system","name":"diskio"},"system":{"diskio":{"io":{"time":183},"name":"xvdal1","read":{"bytes":1155072,"count":402,"time":183},"write":{"bytes":0,"count":0,"time":0}}},"type":"metricsets"}

Thanks!

  1. It's probably a 6.0 feature, but maybe 5.3.
  2. There is a single event/document generated for each disk. The event contains three times -- read, write, and total (all in ms). And IIRC all of the values are counters, so the data is cumulative.
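If the values are cumulative counters, as recalled in point 2, the activity during an interval is the difference between two consecutive samples. A small sketch: the first sample uses the read values from the event quoted above, the second sample is invented, and treating a decreasing counter as a restart from zero is an assumption of this example.

```python
def counter_delta(previous, current):
    """Difference between two samples of a cumulative counter.

    Assumption: if the counter went backwards (e.g. after a reboot),
    it restarted from zero, so the current value is the delta.
    """
    if current < previous:
        return current
    return current - previous

earlier = {"read": {"count": 402, "time": 183}}   # values from the quoted event
later = {"read": {"count": 450, "time": 201}}     # hypothetical next sample

reads = counter_delta(earlier["read"]["count"], later["read"]["count"])
read_ms = counter_delta(earlier["read"]["time"], later["read"]["time"])
print(reads, read_ms)
```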

Ok, as I am brand new to the Beats suite, I am unfamiliar with the release schedule. If 5.0 is the current version, when can 5.3, 6.0, etc. be expected? For example, is there one release per month or per quarter?

5.3 is probably around March. 6.0 - probably fall.

Ok, I just want to make sure we are talking about the same thing. I'm assuming the 'centralized monitoring' functionality is for agent management. What functionality is currently planned? Will this be part of a larger agent configuration or agent management tool (e.g. where agents can be restarted, etc.)? Maybe something similar to the Rocana Ops dashboard, where hostname, config profile, etc. can be viewed? Or is what I'm referring to part of a paid-for "managed" solution?

Also, what do the majority of your large customers use as the mechanism for managing/upgrading agent versions since you average a release per month?

Thanks Andrew!!

The centralized monitoring that I spoke of earlier will be a "read-only" feature. It won't do management of the agents. Centralized configuration has been discussed, but the feature is less clear in scope, schedule, and licensing to me. I can check with the product management team.

Most orgs are using one of the configuration management tools like Ansible, Puppet, or Chef to deploy the software and configure it.

Is there any documentation regarding stability use cases? As we prepare to roll this out to the enterprise, there are a number of stability scenarios where we want to monitor the agent under certain stress conditions. I'm not sure I can test all of these in our environment, so I wanted to see if you had any data/info on how the Metricbeat agent behaves in the following situations:

No Configuration
Bad Configuration (incorrectly formed config for example)

Log Path Unavailable (*using FileOutput option and AWS Kinesis Agent to send stream)
Log File not found (variation on the above but different)
No access to log path

Stream Endpoint Unavailable
No access to stream endpoint
Stream not found

Event line exceeds buffers (can we create a massive string in a text file for this test – so it exceeds the jvm configured max for example)
Event line cannot be read (Corrupt, badly formed etc..)

Local buffer gets too full (this when the stream is down)
Local (to disk) buffer unavailable (when in memory buffer has to spool to disk, but disk location isn’t there)

  • No Configuration - Fails to start. Exit code 1.

  • Bad Configuration - Fails to start. Exit code 1.

  • Log Path Unavailable - Tries to create the path if it does not exist. Fails if it can't create the dir or file. Exit code 1.

  • Log File not found - Creates the file if it does not exist.

  • No access to log path - Fails on startup.

  • Stream Endpoint Unavailable - Metricbeat buffers events in memory. When the buffer is full (based on a configurable number of events) it stops collecting new events. It keeps retrying the endpoints continuously.

  • No access to stream endpoint - Same as previous.

  • Stream not found - Same as previous.

  • Event line exceeds buffers (can we create a massive string in a text file for this test – so it exceeds the jvm configured max for example) - This sounds like a Filebeat question. It doesn't seem to make sense in the context of Metricbeat. Could you elaborate?

  • Event line cannot be read (Corrupt, badly formed etc..)

  • Local buffer gets too full (this when the stream is down) - Answered above.

  • Local (to disk) buffer unavailable (when in memory buffer has to spool to disk, but disk location isn’t there) - Spooling to disk isn't a feature yet. https://github.com/elastic/beats/issues/575

Thanks for the quick response Andrew. Yes, my bad on not pointing out we are also simultaneously performing a POC on FileBeat. So if you could answer as it applies to FileBeat, that would be greatly appreciated!

Event line exceeds buffers (can we create a massive string in a text file for this test – so it exceeds the jvm configured max for example)

I guess I should ask before we get any further. We are using the FileOutput mechanism in combination with the AWS Kinesis Stream stand-alone Java agent to send data to AWS Kinesis streams. In the docs, I saw the following verbiage "Currently, this output is used for testing". Is this not meant to serve as a reliable production configuration for delivery? Also, is there a way to configure MetricBeat to send data to an AWS Kinesis stream directly? I see it is possible to send to Kafka, Logstash, and others.

Ok, so for Filebeat

  • Event line exceeds buffers (can we create a massive string in a text file for this test – so it exceeds the jvm configured max for example) - Filebeat has a max_bytes setting and if a line exceeds this it gets dropped.

  • Event line cannot be read (Corrupt, badly formed etc..) - I think this depends on the encoding type you have configured. But let's see. @ruflin @steffens Could you answer this one?

I thought you meant you were using the logging file output in Metricbeat and then ingesting the logs to Kinesis.

So you are using the file output to write the JSON to disk then sending it to Kinesis via the Kinesis agent? The file output should work but Kinesis cannot acknowledge to the Beat that it has read the data so back-pressure is never applied to the Beat.

No.

Yes, for Metricbeat, we have configured output.file with a local path to dump metric data. Then the AWS Kinesis stream agent requires two parameters =>

We are leveraging AWS native components in lieu of building separate Kafka, ELK stack, etc. to simplify security, compliance, etc. If there is a better or more efficient method to configure Metricbeat given the setup I outlined, I would welcome any input or suggestions. Thanks!!!

Filebeat reads a line until it detects a newline. Unfortunately, the line limit in bytes is applied only after a complete line has been read (encodings complicate the matter a little). That is, Filebeat will buffer the complete line, but only report the first N bytes (default 10MB, I think). Multiline events have a total content limit as well (here the limit is applied while reading).
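The read-then-limit order described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Filebeat's implementation; `read_events` and the limit of 10 bytes are made up for the example.

```python
import io

def read_events(stream, max_bytes):
    """Yield one event per line, keeping at most max_bytes of each line."""
    for raw in stream:            # the whole line is buffered before the limit applies
        line = raw.rstrip(b"\r\n")
        yield line[:max_bytes]    # bytes beyond the limit are discarded

stream = io.BytesIO(b"short\n" + b"x" * 50 + b"\nlast\n")
events = list(read_events(stream, max_bytes=10))
print(events)
```

The point for a stress test is that a very long line still has to fit in memory before it is cut down, so the limit bounds what is sent rather than what is buffered.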

Unfortunately, text codecs are quite dumb in general (there is no encoding information in the file, so you just have to know which one to use). That is, if you've configured the wrong codec, Beats will attempt the transformation (normally not erroring, but returning garbage) and send the event exactly as the codec produced it. This is a general problem with text encoding; e.g. try opening the file in an editor with the wrong encoding.
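The "no error, just garbage" effect is easy to reproduce outside of Beats. A small Python example, using UTF-8 bytes decoded as Latin-1 (the sample string is arbitrary):

```python
original = "café"
raw = original.encode("utf-8")     # bytes as they would sit in the log file

wrong = raw.decode("latin-1")      # wrong codec: no exception, but garbled text
right = raw.decode("utf-8")        # correct codec: round-trips cleanly

print(repr(wrong), repr(right))
```

Latin-1 maps every byte to some character, so decoding can never fail, which is exactly why a misconfigured codec goes unnoticed until someone looks at the data.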

Hello Andrew, are Metricbeat and Filebeat capable of client-side encryption? As a follow-up: if so, is there an out-of-the-box method/tool to achieve this, or is it up to the user to modify the open-source code to provide the functionality? Thank you, Rob Herring