We are trying to use Filebeat in front of our ELK stack, feeding it logs from network sensors.
We have some particularly 'talky' logs coming out of one system. One of the logs can generate 15k lines per second (eps) and has gotten up to 40k eps. Others on that system peak at around 5k eps.
One of the tricky parts is that these get rolled over every hour. The old files get pulled and moved into another directory (and gzipped), and new ones are generated with the same name. So files with the same name appear every hour, although they will have different inodes.
I have all of this forwarding out via a single filebeat instance, where I have different prospectors defined in the filebeat.yml.
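To sketch roughly what that looks like (paths here are placeholders, not my real config, and this assumes the 5.x-style prospector syntax):

```yaml
filebeat.prospectors:
  - input_type: log
    paths:
      - /data/sensor/talky.log      # the 15k-40k eps log
  - input_type: log
    paths:
      - /data/sensor/other/*.log    # the ~5k eps logs
    # the hourly rotation moves the old files away, so filebeat sees a
    # brand new file (new inode) under the same path each hour

output.logstash:
  hosts: ["logstash01:5044"]        # placeholder
```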
It seems to keep up for a few hours, and then throughput tends to decline. There is a surge at the beginning of each hour, and then things tail off. I have @timestamp correlated to a timestamp in each log line (it gets overwritten in Logstash), so Kibana is reporting the timestamps in the data, not the insert time (which I capture in another field).
Any clue on the upper bounds of filebeat, or some ways around this? Anything I should be looking for? We are probably looking at 75k-80k eps coming out of a single box.
Do you send from filebeat to Logstash or directly to Elasticsearch? How many workers have you configured for the output? Is load balancing enabled? Can you share your filebeat configs?
There are many factors affecting filebeat performance. Just sending files to /dev/null on a physical machine, I was able to process around 95k eps. Filebeat throughput depends on disk IO (unless the files are still buffered in OS caches) and on downstream performance. E.g. if sending directly to Elasticsearch, it depends on indexing performance in Elasticsearch. If sending to Logstash, throughput depends on processing time within Logstash plus performance even further downstream. This is because the outputs generate back-pressure when they cannot keep up, slowing down event generation in filebeat (as we don't want to drop any events).
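If you want to reproduce that kind of no-back-pressure test yourself, one option (a sketch, not my exact setup; the path is a placeholder) is the console output with stdout discarded:

```yaml
# filebeat-bench.yml - measure read/publish rate with no downstream back-pressure
filebeat.prospectors:
  - input_type: log
    paths:
      - /var/log/bench/*.log   # point this at a copy of your real data

output.console:
  pretty: false
```

Run it with something like `filebeat -c filebeat-bench.yml | pv -lr > /dev/null` (if you have pv installed) and compare that rate against what you see with the logstash output enabled.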
Yes. Filebeat is sending to Logstash. I have 8 or so Logstash nodes, each with 20GB dedicated to it on beefy boxes, and I upped the worker count. Filebeat is load balanced between those.
Logstash then sends downstream to 8 ES nodes (3 master & 8 data nodes in the cluster).
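On the filebeat side, the output section looks roughly like this (host names are placeholders, not our real addresses):

```yaml
output.logstash:
  hosts: ["ls01:5044", "ls02:5044", "ls03:5044"]   # 8 Logstash hosts in reality
  loadbalance: true
  worker: 2
```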
How can I find out if it's Logstash that's causing the problem? I don't see anything in the logs that says there's a problem regarding Logstash or Elastic.
So to bring this thread back up, I can say that we are again seeing the same limitations. Around 8k eps seems to be the best we can get out of filebeat.
Now, I know others have said they've benchmarked it higher, but I feel like a lot of those numbers are disingenuous, as they are often framed in a non-real-world context. Piping data out to /dev/null, without talking about what kind of data you're dealing with, isn't really helpful.
I'd like to see example data, and know how it looks when getting wrapped in TCP and dealing with network latency.
Considering this is the first thread that comes up in a Google search about the subject, it would be good to get some good data.
The limitation isn't actually with filebeat but with the logstash output plugin inside filebeat.
When doing Filebeat -> Logstash (to a single instance or to a round-robin set of Logstash instances) the throughput tops out at 8k eps. This seems to be bound to the logstash output plugin in filebeat, as the receiving Logstash can process well over 8k eps from multiple sources, but not from multiple files on the same origin host.
Meanwhile, if we use the redis output plugin in filebeat, we can send filebeat -> redis <- logstash -> ES at around 40k eps from a single source.
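For reference, the redis path is roughly this (host and key are placeholders), with Logstash then consuming the same list via its redis input plugin:

```yaml
output.redis:
  hosts: ["redis01:6379"]
  key: "filebeat"        # Logstash reads this list with its redis input
  datatype: "list"
```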
For information, I was able to run Filebeat => Logstash at 18k eps in the following context.
Figures
2,000,000 lines of log transferred end to end in 111 sec (generated by the source process in 25 sec; 230 MB of original data). This represents about 2 MB/sec from origin to target.
Each log line contains 110 characters
Only 1 log file at the origin
Filebeat running in a pod (minikube on a Windows host, VirtualBox mode), redirecting to an "external Logstash service"
Logstash running on the Windows host and writing to SSD
Note: ramping up bulk_max_size and enabling pipelining should not really make a difference on their own, as filebeat.spool_size sets the maximum batch size pushed to the output. Instead, consider splitting each spooler batch into N sub-batches, with N = spool_size / bulk_max_size. Then pipelining can reduce some encoding/waiting latencies (with one worker it only affects slow-start windowing). Also increase the number of workers. Currently filebeat only proceeds once all events in a spooler batch are ACKed, so with multiple workers and sub-batches you get some lock-step load-balancing. The bigger N, the more batches can be load-balanced/pipelined (at the cost of increased memory usage). With pipelining + workers I'd aim for N >= pipelining * worker. I don't think there is much of a difference between pipelining: 5 and pipelining: 10.
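Put together as a config sketch (values are just an illustration of the ratios above, not a recommendation; hosts are placeholders):

```yaml
filebeat.spool_size: 8192        # batch handed to the output
output.logstash:
  hosts: ["ls01:5044", "ls02:5044"]
  loadbalance: true
  worker: 2
  bulk_max_size: 2048            # N = spool_size / bulk_max_size = 4 sub-batches
  pipelining: 2                  # satisfies N >= pipelining * worker (4 >= 2 * 2)
```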
(Sorry, the content of my comment has changed. Indeed, explicitly setting the compression level does not alter the performance, as it matches the default anyway.)
Trying to play with compression level and event size
Without compression:
- Very small event (15 chars): 0.7 MB/s, 14k evt/s
- Huge event (10000 chars): 13 MB/s, 1.4k evt/s
With compression 3:
- Very small event (15 chars): 0.6 MB/s, 12k evt/s
- Huge event (10000 chars): 27 MB/s, 2.9k evt/s
The evt/s varies a lot depending on the event length in bytes (as expected).
The network throughput limit seems to be reached on huge events (we see degradation when not compressing).
Checking the code, the default is indeed 3. In my experience, disabling compression can improve performance (given enough network bandwidth) due to reduced latencies.
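For reference, the setting being discussed (a sketch; the host is a placeholder):

```yaml
output.logstash:
  hosts: ["ls01:5044"]
  compression_level: 0   # default is 3; set to 0 to disable compression
```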