Dec 3rd, 2023: [EN] Serverless Observability: How Beats alerts help you save Christmas

This post is also available in Spanish.
This post is also available in Romanian.

Winter holidays are near, and whether we are cozying up next to the fire or looking for the calm breeze of the sea, we have one thing in common: we all want peaceful holidays. As a Site Reliability Engineer, I fully subscribe to a calm and smooth on-call. This short piece is my peace offering to you: it will help you dive into Serverless observability, whether you have it fully implemented or not.

In previous articles on the Elastic blog and in the advent calendars, you have seen how Elastic Agent is deployed and integrated with Kubernetes. Perhaps you have gone a bit further and become curious about Kubernetes alerting in Elastic Observability. Kudos to you! My SRE/elf job here is to enlighten you just a bit more and show you how alerting on the Beats queue will help you ace that list of Serverless alerts.

You probably want to dive deeper into the realm of Beats and migrate to Elastic Agent. My advice to you: pay attention when those Beats queues start piling up.

Why are these queues important? Filebeat collects a snapshot of metrics about itself every 30 seconds by default. If the snapshot contains any non-zero metrics, it is serialised as JSON and emitted in Filebeat’s logs at the INFO log level.

For example:

{"log.level":"info","@timestamp":"2023-07-14T12:50:36.811Z","log.logger":"monitoring","log.origin":{"file.name":"log/log.go","file.line":187},"message":"Non-zero metrics in the last 30s","service.name":"filebeat","monitoring":{"metrics":{"beat":{"cgroup":{"memory":{"mem":{"usage":{"bytes":0}}}},"cpu":{"system":{"ticks":692690,"time":{"ms":60}},"total":{"ticks":3167250,"time":{"ms":150},"value":3167250},"user":{"ticks":2474560,"time":{"ms":90}}},"handles":{"limit":{"hard":1048576,"soft":1048576},"open":32},"info":{"ephemeral_id":"2bab8688-34c0-4522-80af-db86948d547d","uptime":{"ms":617670096},"version":"8.6.2"},"memstats":{"gc_next":57189272,"memory_alloc":43589824,"memory_total":275281335792,"rss":183574528},"runtime":{"goroutines":212}},"filebeat":{"events":{"active":5,"added":52,"done":49},"harvester":{"open_files":6,"running":6,"started":1}},"libbeat":{"config":{"module":{"running":15}},"output":{"events":{"acked":48,"active":0,"batches":6,"total":48},"read":{"bytes":210},"write":{"bytes":26923}},"pipeline":{"clients":15,"events":{"active":5,"filtered":1,"published":51,"total":52},"queue":{"acked":48}}},"registrar":{"states":{"current":14,"update":49},"writes":{"success":6,"total":6}},"system":{"load":{"1":0.91,"15":0.37,"5":0.4,"norm":{"1":0.1138,"15":0.0463,"5":0.05}}}},"ecs.version":"1.6.0"}}

The .monitoring.metrics field provides specific information in objects such as .beat, .libbeat and .filebeat. For the sake of this demonstration, we will use the .libbeat object, since it is common to all Beats, in our Serverless alerting definition.
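To make the structure concrete, here is a minimal Python sketch that pulls the .libbeat object out of one such log line. The helper name `libbeat_metrics` and the trimmed sample line are mine, for illustration only; they are not part of Filebeat or of any Elastic client.

```python
import json


def libbeat_metrics(log_line: str) -> dict:
    """Return the monitoring.metrics.libbeat object from one JSON log line.

    Returns an empty dict when the line carries no monitoring snapshot.
    (Hypothetical helper, not part of Filebeat.)
    """
    record = json.loads(log_line)
    return record.get("monitoring", {}).get("metrics", {}).get("libbeat", {})


# Trimmed copy of the example log line, keeping only the libbeat pipeline part.
sample = (
    '{"log.level":"info","log.logger":"monitoring",'
    '"message":"Non-zero metrics in the last 30s",'
    '"monitoring":{"metrics":{"libbeat":{"pipeline":{"clients":15,'
    '"events":{"active":5,"filtered":1,"published":51,"total":52},'
    '"queue":{"acked":48}}}}}}'
)

libbeat = libbeat_metrics(sample)
print(libbeat["pipeline"]["events"]["active"])  # prints 5
```

The same lookup works on the full log line, since the extra keys are simply ignored.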

The .libbeat object comes in several flavours that let you customise this alert to your needs. I have picked a winner: .pipeline.events.active. If the number of active events keeps growing, I should be wary of it reaching the maximum queue size. When that happens, Filebeat temporarily stops ingesting more events. We don’t want that. Implementing the following alert will make SREs, and probably your manager, very happy.
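The reasoning behind the alert can be sketched in a few lines of Python: compare the active event count with the maximum queue size and warn before the queue is full. The function name, the 80% warning ratio and the `MAX_QUEUE_EVENTS` value below are assumptions for illustration; use the queue size from your own Filebeat configuration (the `queue.mem.events` setting for the memory queue).

```python
# Assumed value for illustration only; read the real one from your config.
MAX_QUEUE_EVENTS = 4096


def queue_pressure(active_events: int, max_queue_events: int,
                   warn_ratio: float = 0.8) -> str:
    """Classify how close the pipeline is to a full queue.

    Hypothetical helper: "full" means active events have reached the
    maximum queue size, "warning" means they crossed warn_ratio of it.
    """
    if max_queue_events <= 0:
        raise ValueError("max_queue_events must be positive")
    ratio = active_events / max_queue_events
    if ratio >= 1.0:
        return "full"
    if ratio >= warn_ratio:
        return "warning"
    return "ok"


# Made-up readings of pipeline.events.active from consecutive snapshots.
for active in (5, 1200, 3500, 4096):
    print(active, queue_pressure(active, MAX_QUEUE_EVENTS))
```

Warning at a ratio below 1.0 is a design choice: it leaves time to react before Filebeat starts to hold back new events.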

The Kibana UI offers the fastest and simplest way of creating this alert: I used the monitoring.metrics.libbeat.pipeline.events.active field and paired it with elastic-agent within a Log Threshold aggregation. You can adjust the threshold as you see fit, according to the data stream and input you are receiving.
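If you want to sanity-check a threshold before saving the rule, a rough equivalent of the condition can be written as an Elasticsearch query body. This is a sketch only: it is not what Kibana generates for a Log Threshold rule, and the helper name, the threshold of 3000 and the 5 minute window are assumptions you should replace with your own values.

```python
import json

FIELD = "monitoring.metrics.libbeat.pipeline.events.active"


def build_threshold_query(threshold: int, window: str = "5m") -> dict:
    """Build a search body counting documents above the threshold.

    Hypothetical helper: matches documents where FIELD is greater than
    `threshold` within the last `window` (Elasticsearch date math).
    """
    return {
        "size": 0,                  # only the hit count is of interest
        "track_total_hits": True,
        "query": {
            "bool": {
                "filter": [
                    {"range": {FIELD: {"gt": threshold}}},
                    {"range": {"@timestamp": {"gte": f"now-{window}"}}},
                ]
            }
        },
    }


# Print the body so it can be pasted into a _search request.
print(json.dumps(build_threshold_query(3000), indent=2))
```

A non-zero hit count for the window means the rule would have fired with that threshold.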

Voilà! Easy to set up and certainly helpful in times of need.

Before you go, I invite you to check out the new 8.11 Elasticsearch release page. Some naughty elves told me that it includes a cool new Observability rule type called Custom threshold, under technical preview. Have a peek at all the goodies in the release list and enjoy a relaxing Christmas Day!

