1. Redis connections from Beats agents 2. How Beats behave when Redis is full

All

I'm trying to understand how many connections Beats would make to Redis (to right-size Redis from a connection perspective).

Setup: Azure Redis
We send data using Filebeat and Metricbeat from Linux systems. Filebeat has the system and auditd modules enabled. Metricbeat sends the metricsets below with a period of 1m:
- module: system
  period: 1m
  metricsets:
    - cpu
    - filesystem
    - memory
    - network
    - process

When I monitor Azure Redis, it shows 5 clients, but when I look at Redis "info clients" it shows connected_clients:2.

I'm not sure which is correct. I have not changed any settings in the Beats YAML files; the workers setting and everything else are at their defaults.

On a different note, we also face problems when Redis memory gets full. The Beats keep dropping messages once Redis is OOM, but Filebeat keeps occupying file handles until the system runs out of them, causing other processes to fail. Though not directly related to the original sizing question, this end-to-end scenario comes into play when the agent consuming from Redis fails and Redis fills up.

Thanks for the inputs.

The number of Beats->Redis connections depends on your configuration (output.redis.workers sets the number of concurrent output workers). By default, each Beat maintains only one connection to Redis.
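
A minimal sketch of raising the worker count in the Redis output (the host name is a placeholder; check the reference config of your Beats version for the exact option spelling):

output.redis:
  hosts: ["my-redis.example.com:6379"]  # placeholder endpoint
  workers: 2                            # two concurrent connections per host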

If Redis runs full, it cannot accept any new messages; that is, it generates 'infinite' back-pressure until the situation is resolved. Each Beat deals with this situation differently.

In Metricbeat, an event is eventually dropped after 3 attempts to send the data. In the meantime Metricbeat may not collect new data (e.g. if timeouts are longer than collection intervals).
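
If dropping after 3 attempts is too aggressive, the retry count can be raised in the output. A sketch, assuming your version's Redis output supports the common max_retries option (check the reference config):

output.redis:
  hosts: ["my-redis.example.com:6379"]  # placeholder endpoint
  max_retries: 10                       # retry longer before events are dropped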

Filebeat has send-at-least-once semantics, which means it does not drop events. The back-pressure will eventually reach the harvesters in Filebeat (once the internal queue is full), basically stopping them. In the meantime Filebeat still looks for new files. This can lead to Filebeat keeping many files open if the situation persists for too long or if Filebeat cannot catch up again.
Using the close_* settings (e.g. close_timeout), one can try to force Filebeat to close log files every now and then. By setting harvester_limit to a lower value, you also restrict the number of files a prospector/input configuration can process concurrently; see the sketch below. If files are rotated away before Redis is up again, you might lose complete log files, though.
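
A sketch of a log input combining these settings (the path and values are placeholders to adapt to your rotation scheme; older 6.x versions spell the section filebeat.prospectors):

filebeat.inputs:
  - type: log
    paths:
      - /var/log/app/*.log  # placeholder path
    close_timeout: 30m      # force each harvester to close its file after 30 minutes
    harvester_limit: 10     # at most 10 files processed concurrently by this input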

One can also configure multiple Redis endpoints with loadbalance: true in the redis output, so as to scale horizontally. In this case Filebeat will attempt to load-balance events among the known Redis hosts. It is not round-robin load-balancing, but dynamically adjusts to the available throughput: the Redis host that is 'faster' will get more events. If one host runs OOM, it has 0 available bandwidth, and Filebeat will publish events only to the other Redis hosts.
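
A sketch with two endpoints (host names are placeholders):

output.redis:
  hosts: ["redis-1.example.com:6379", "redis-2.example.com:6379"]  # placeholder endpoints
  loadbalance: true  # distribute events according to each host's available throughput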

Another option to mitigate long Redis downtimes is spooling to disk (still in beta). This provides a size-limited spool file: events are put into the spool before being forwarded to the output. When Redis is OOM or blocked for too long, the spool grows. Once the spool is full, though, you are back to the original back-pressure behavior.
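
A sketch of the beta spool queue, with option names as in the 6.x reference config (sizes are placeholders; double-check against your version, as beta settings may change):

queue.spool:
  file:
    path: "${path.data}/spool.dat"  # spool file location
    size: 512MiB                    # upper bound on the on-disk spool
  write:
    flush.timeout: 1s               # flush buffered events at least once per second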

As Redis is meant to be used as a queue in the context of Beats, this is ultimately a sizing problem. The queue is supposed to buffer events, e.g. during peak times. If you manage to fill up your queues, then either the queues or the consumers are not sized properly.

Thanks, Stephen.

In our case it is Filebeat, but we don't have many files open. We generate new log files on a daily basis, and old ones never get updated. We also have ignore_older: 24h set. No other settings are changed from their defaults.

Whenever we see Redis is full (for whatever reason), we observe many open file handles blocking other processes. We will try to see what those handles represent. Moreover, once we clear Redis, Filebeat doesn't recover, and we end up restarting it.

Meanwhile, any guidance would be helpful.

Whenever we see Redis is full (for whatever reason), we observe many open file handles blocking other processes. We will try to see what those handles represent. Moreover, once we clear Redis, Filebeat doesn't recover, and we end up restarting it.

Yeah, it would be interesting to know whether these are actual log files. Filebeat logs internal metrics every 30s. The 'harvester' metrics tell you how many harvesters collecting files are running; each harvester equals one open file.
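
Metrics logging is on by default; the interval can be tuned via the standard libbeat logging settings (a sketch showing the defaults):

logging.metrics:
  enabled: true  # log internal metrics, including harvester counts
  period: 30s    # reporting interval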

Moreover, once we clear Redis, Filebeat doesn't recover, and we end up restarting it.

Any logs? Is Filebeat still reporting errors? Is it waiting? Does the event rate increase in the metrics log?
