Throughput and backpressure metrics

(Jeffrey M. Hardy) #1


I am on ELK stack 6.2.4 (previously 5.6.2), filebeating logs off a handful of syslog aggregators, trying to achieve a relatively modest throughput of ~3000 lines/s, 500 KiB/s (measured with pv). In addition to watching the file offset, I track a "receive_lag" field on each event by adding the following in Logstash:

ruby {
  # wall-clock time at filter time minus the event's @timestamp
  code => "event.set('receive_lag', Time.now.to_f - event.get('@timestamp').to_f)"
}
In short, I can achieve a logging rate roughly 250% of what is needed on a single Logstash node, using the json_lines codec to a file output. That increases to 400-500% when load-balancing across two Logstash nodes. To see this, I delete the filebeat registry so filebeat catches up as fast as it presumably can, then measure with pv, tracking the reported offset, watching receive_lag, and additionally using the Logstash metrics filter.
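For reference, the metrics filter setup I mention is roughly this (a sketch; the tag name and the stdout format string are my own choices, not anything special):

```
filter {
  metrics {
    meter   => "events"
    add_tag => "metric"
  }
}

output {
  if "metric" in [tags] {
    stdout {
      codec => line { format => "1m rate: %{[events][rate_1m]}" }
    }
  }
}
```

The metrics filter emits a synthetic event on each flush interval carrying 1/5/15-minute rates, which is how I read off the sustained events/s.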

When I push these same sources to a three-node Elasticsearch cluster, I am not able to keep up with the logging rate. Interestingly, it is only the busiest log source that falls behind (sometimes also the second busiest), the first accounting for roughly half of the total logging rate on its own. Meanwhile, events from other logs that come in at a relative trickle keep up just fine. I am assuming this is probably backpressure disproportionately throttling the busiest sources?

How can I see information about backpressure? Secondly, might it help to break up the log sources so that no individual source is so "heavy"?
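One thing I have been experimenting with is polling the Logstash monitoring API (GET `http://localhost:9600/_node/stats/pipelines` on 6.x) and comparing `queue_push_duration_in_millis` against `duration_in_millis`. A sketch of that, with an illustrative (made-up) JSON sample standing in for the real API response:

```ruby
# Sketch: estimate backpressure from Logstash node stats.
# The sample JSON below is illustrative only; in practice it would come
# from GET http://localhost:9600/_node/stats/pipelines on a 6.x node.
require 'json'

sample = <<~JSON
  {
    "pipelines": {
      "main": {
        "events": {
          "in": 1000000,
          "out": 998000,
          "duration_in_millis": 500000,
          "queue_push_duration_in_millis": 120000
        }
      }
    }
  }
JSON

events  = JSON.parse(sample).dig('pipelines', 'main', 'events')
push_ms = events['queue_push_duration_in_millis'].to_f
work_ms = events['duration_in_millis'].to_f

# Time inputs spent blocked pushing events into the pipeline queue,
# relative to time spent actually processing events. A ratio creeping
# upward suggests the inputs are being held back (backpressure).
ratio = push_ms / work_ms
puts format('queue_push/duration = %.2f', ratio)
```

My understanding is that a growing `queue_push_duration_in_millis` means the inputs are stalling while handing events to the pipeline, which is the closest thing to a direct backpressure signal I have found so far.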

Thanks in advance,

(system) #2

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.