Hello,
I am on ELK stack 6.2.4 (previously 5.6.2), shipping logs with Filebeat off a handful of syslog aggregators, trying to achieve a relatively modest throughput of ~3000 lines/s, 500 KiB/s (measured with pv). In addition to looking at the file offset, I track a "receive_lag" field on each event by adding the following in Logstash:
ruby {
  # seconds between the event's @timestamp and the moment it reaches this filter
  code => "event.set('receive_lag', Time.now.to_f - event.get('@timestamp').to_f)"
}
In short, I can achieve a rate roughly 250% of what is needed on a single Logstash node when using the json_lines codec to a file output. That increases to 400-500% when load-balancing across two Logstash nodes. To see this, I delete the Filebeat registry so Filebeat "catches up" as fast as it presumably can, then measure with pv, track the reported offset, watch receive_lag, and additionally use the Logstash metrics filter.
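For context, the output side of my measurement pipeline looks roughly like the sketch below (the file path, meter name, and tag are placeholders rather than my exact config):

filter {
  # count events and compute rolling 1m/5m/15m rates; tag the metric events
  metrics {
    meter   => "events"
    add_tag => "metric"
  }
}

output {
  if "metric" in [tags] {
    # print the rolling event rate to the console
    stdout {
      codec => line { format => "1m rate: %{[events][rate_1m]}" }
    }
  } else {
    # dump everything else to disk as newline-delimited JSON
    file {
      path  => "/tmp/throughput_test.jsonl"
      codec => json_lines
    }
  }
}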
When I push these same sources to a three-node Elasticsearch cluster, I am not able to keep up with the logging rate. Interestingly, only the busiest log source falls behind (sometimes also the second busiest); the busiest one accounts for roughly half of the logging rate on its own. Meanwhile, events from the other logs, which come in at a relative trickle, keep up just fine. I am assuming this is probably the result of backpressure being applied to the busiest sources?
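The Elasticsearch side is essentially the stock output; the hostnames and index pattern below are placeholders, not my exact config:

output {
  elasticsearch {
    # the three nodes of the cluster (placeholder hostnames)
    hosts => ["es01:9200", "es02:9200", "es03:9200"]
    index => "syslog-%{+YYYY.MM.dd}"
  }
}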
How can I see information about backpressure? Secondly, might it help to break up the log sources so that no individual source is so "heavy"?
Thanks in advance,
-Jeff