Speed limitations of filebeat?


(Jason) #1

We are trying to use Filebeat in front of our ELK stack, feeding it logs from network sensors.

We have some particularly 'talky' logs coming out of a system. One of the logs can generate 15k lines per second (eps), and has gotten up to 40k lines per second (eps). Others on that system peak out at 5k lines per second.

One of the tricks is, these get rolled over every hour. The old files get pulled and put into another directory (and gzipped), and new ones are generated (with the same name). So files with the same name appear every hour, although they will have different file descriptors.

I have all of this forwarding out via a single filebeat instance, where I have different prospectors defined in the filebeat.yml.

It seems to keep up for a few hours, and then tends to decline. There is a surge at the beginning of the hour, and then things taper off. I have the @timestamp correlated to a timestamp in each line in the log (being overwritten in logstash), so Kibana is reporting timestamps of the data, and not the "insert time" (which I catch in another field).

Any clue on the upper bounds of filebeat, or some ways around this? Anything I should be looking for? We are probably looking at 75k-80k eps coming out of a single box.


(Steffen Siering) #2

Do you send from filebeat to logstash or directly to elasticsearch? How many workers have you configured for output? Is loadbalancing enabled? Can you share your filebeat configs?

There are many factors in filebeat performance. Just sending files to /dev/null on a physical machine, I was able to process about 95k eps. Filebeat throughput depends on disk IO (unless the files are still buffered in OS caches) and on downstream performance. If you send directly to elasticsearch, that means indexing performance in elasticsearch. If you send to logstash, throughput depends on processing time within logstash plus the performance of everything further downstream. This is because outputs that cannot keep up generate back-pressure, which slows down event generation in filebeat (as we don't want to drop any events).
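To make the back-pressure point concrete, here is a rough toy model in Python. It is only a sketch: the bounded queue, the per-tick rates and the function name are all assumptions made for illustration, and it does not reflect how filebeat is actually implemented. It only shows that when nothing is dropped, delivered throughput settles at the slower of the reader and the output.

```python
def delivered_per_tick(read_rate, drain_rate, queue_capacity, ticks=100):
    """Average events delivered per tick for a reader feeding a bounded queue.

    Toy model (hypothetical, not filebeat code): the reader may only enqueue
    what fits, so a slow output throttles the reader instead of losing events.
    """
    queued = 0
    delivered = 0
    for _ in range(ticks):
        # Reader side: anything that does not fit simply waits in the file.
        accepted = min(read_rate, queue_capacity - queued)
        queued += accepted
        # Output side: the downstream can only take drain_rate events per tick.
        sent = min(drain_rate, queued)
        queued -= sent
        delivered += sent
    return delivered / ticks


# Fast reader (95k/tick, as in the /dev/null test) with a hypothetical slow output.
print(delivered_per_tick(95_000, 8_000, 16_384))
# Slow reader with a faster output: the reader is now the limit.
print(delivered_per_tick(5_000, 8_000, 16_384))
```

In the first case the result is pinned to the output rate, in the second to the read rate, which is why measuring each stage of the chain separately is the useful first step.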

somewhat related:

There's even a community beat that collects the same stats as the python script and stores them in ES, for example: https://github.com/urso/govarbeat


(Jason) #3

Yes. Filebeat is sending to Logstash. I have 8 or so Logstash nodes, each with 20GB dedicated to it on beefy boxes, and I upped the worker count. Filebeat is load balanced between those.

Logstash then sends downstream to 8 ES nodes (3 master & 8 data in the cluster).

How can I find out if it's Logstash that's causing the problem? I don't see anything in the logs that says there's a problem regarding Logstash or Elastic.


(Steffen Siering) #4

Can you share your filebeat config?

See related posts giving some tips to debug throughput.


(Jason) #5

filebeat:
  prospectors:
    -
    - paths:
        - /path/to/conn.log
      fields: {log_type: conn}
      document_type: conn
      scan_frequency: 500ms
    - paths:
        - /path/to/capture_loss.log
      fields: {log_type: capture_loss}
      document_type: capture_loss
      scan_frequency: 5s
    - paths:
        - /path/to/communication.log
      fields: {log_type: communication}
      document_type: communication
      scan_frequency: 5s
    - paths:
        - /path/to/dhcp.log
      fields: {log_type: dhcp}
      document_type: dhcp
      scan_frequency: 3s
    - paths:
        - /path/to/dns.log
      fields: {log_type: dns}
      document_type: dns
      scan_frequency: 1s
 ########
 # There are around 25 more log files listed, just like the ones above.  Most are at 3s frequency.
 ########
    input_type: log
    max_bytes: 1557640
    max_lines: 60000
    spool_size: 16384
    idle_timeout: 2s
   
output:
  logstash:
    hosts: ["10.0.0.1:5044", "10.0.0.2:5044", "10.0.0.3:5044", "10.0.0.4:5044", "10.0.0.5:5044", "10.0.0.6:5044", "10.0.0.7:5044", "10.0.0.8:5044", "10.0.0.9:5044", "10.0.0.10:5044", "10.0.0.11:5044"]
    loadbalance: true
    index: logstash-%{+YYYY.MM.dd-HH}

(Steffen Siering) #6

Have you tried to measure throughput in your processing chain, as described in the thread "Filebeat sending data to Logstash seems too slow"?


(Ravi Shanker Reddy) #7

Yes I tried those.

With filebeat -> logstash:
168kiB 0:02:01 [1.63kiB/s] [ 1.4kiB/s]
With logstash reading the file directly:
458kiB 0:02:03 [3.72kiB/s] [3.73kiB/s]

My filebeat config

filebeat:
  prospectors:
    - paths:
        - /home/sms/SMSC-RS-2.0.7.0/logs/*.log
      input_type: log
      ignore_older: 30m
      scan_frequency: 1s
output:
  logstash:
    worker: 4
    hosts: ["172.16.22.12:5044"]
    bulk_max_size: 3000

Everything else is left at the defaults. How can I speed up my filebeat now?

Filebeat version: filebeat-1.2.3-x86_64
Logstash version: logstash-2.4.0


(ruflin) #8

As a first step, I would recommend updating to the most recent filebeat release: 5.0.0.


(Jason) #9

So to bring this thread back up, I can say that we are again seeing the same limitations. Around 8k eps seems to be the best we can get out of filebeat.

Now, I know others have said they've benchmarked it at higher, but I feel like a lot of these are disingenuous, as they are often framed in a non-real-world context. Piping data out to /dev/null, and not talking about what kind of data you're dealing with, isn't really helpful.

I'd like to see example data, and know how it looks when getting wrapped in TCP and dealing with network latency.

Considering this is the first thread that comes up in a Google search about the subject, it would be good to get some good data.


#10

The limitation isn't actually with filebeat but with the logstash output plugin inside filebeat.

When doing Filebeat -> Logstash (to a single instance or to a round-robin set of Logstash instances), the throughput tops out at 8k eps. This seems to be bound to the logstash output plugin in filebeat, as the receiving Logstash can process well over 8k eps from multiple sources, but not from multiple files on the same origin host.

Meanwhile, if we use the redis output plugin in filebeat, we can send filebeat -> redis <- logstash -> ES at around 40k eps from a single source.


#11

Hello

For information, I was able to run Filebeat => Logstash at 18k eps in the following context.

Figures

  • 2000000 lines of log transferred (end to end) in 111 sec; they were generated by the process in 25 sec (230 MB of original data). This represents 2 MB/sec (origin to target)
  • Each log line contains 110 chars
  • Only 1 log file at the origin
  • Filebeat running in a pod (minikube on a Windows host, VirtualBox mode), redirecting to an "external logstash service"
  • Logstash running on the Windows host and writing to SSD

filebeat conf (using 5.3):

filebeat.modules:
filebeat.prospectors:
- input_type: log
  paths:
    - /log/*.log
output.logstash:
  hosts: ["logstash:5044"]

(I've tried playing a bit with bulk_max_size: 8192 and pipelining: 10, but without an obvious performance change: a gain of 3 to 5%.)

logstash conf (mirror):

input {
  beats {
    port => 5044
  }
}
filter {
  mutate {
    gsub => [ "source",".:",""]
  }
}
output {
 file {
   path => "/LOG/%{host}/%{source}"
   codec => line { format => "%{message}"}
 }
}

(Steffen Siering) #12

Note: increasing bulk_max_size and enabling pipelining should not really make a difference, because filebeat.spool_size sets the maximum batch size pushed to the output. Instead, consider splitting a batch into N sub-batches, with N = spool_size / bulk_max_size. Pipelining can then reduce some encoding and waiting latencies (with one worker it only affects slow-start windowing). Also increase the number of workers. Currently filebeat only proceeds once all events in a spooler batch are ACKed, so with multiple workers and sub-batches you get a kind of lock-step load balancing. The bigger N, the more batches can be load-balanced or pipelined (at the cost of increased memory usage). With pipelining plus workers I'd choose N >= pipelining * worker. I don't think there is much of a difference between pipelining: 5 and pipelining: 10.
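The sizing rule above can be checked with a few lines of Python. This is a sketch of the arithmetic only: the function names are made up for illustration, rounding N up with ceil is an assumption (the rule itself is stated as a plain division), and the sample values are illustrative combinations of numbers mentioned in this thread rather than recommended settings.

```python
import math


def sub_batches(spool_size, bulk_max_size):
    """N: how many sub-batches one spooler batch is split into.

    Rounding up is an assumption: a spool smaller than bulk_max_size
    still goes out as one (partial) batch.
    """
    return math.ceil(spool_size / bulk_max_size)


def enough_sub_batches(spool_size, bulk_max_size, pipelining, workers):
    """Rule of thumb from the post: N >= pipelining * worker."""
    return sub_batches(spool_size, bulk_max_size) >= pipelining * workers


# bulk_max_size larger than the spool: a single batch, nothing to pipeline.
print(sub_batches(2048, 8192))                                    # 1
# Spool of 16384 split into batches of 2048.
print(sub_batches(16384, 2048))                                   # 8
# 8 sub-batches are fewer than 5 * 4 = 20 in-flight slots.
print(enough_sub_batches(16384, 2048, pipelining=5, workers=4))   # False
# Smaller batches give 32 sub-batches, which satisfies the rule.
print(enough_sub_batches(16384, 512, pipelining=5, workers=4))    # True
```

The first case shows why raising bulk_max_size alone changes little: once it exceeds the spool size, N stays at 1 no matter how large it gets.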


#13

(Sorry, the content of my comment has changed. Indeed, setting the compression level explicitly does not change the performance as it stands.)

Trying to play with compression level and event size:
Without compression
- Very small event (15 chars): 0.7 MB/s, 14k evt/s
- Huge event (10000 chars): 13 MB/s, 1.4k evt/s
With compression 3
- Very small event (15 chars): 0.6 MB/s, 12k evt/s
- Huge event (10000 chars): 27 MB/s, 2.9k evt/s

The evt/s figure varies a lot with the event length in bytes (as expected).
The network throughput limit seems to be reached with huge events (throughput degrades when not compressing).
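Dividing the MB/s figures by the evt/s figures gives the average number of bytes moved per event, which makes the effect of event size easier to see. This is only arithmetic on the four measurements above, assuming 1 MB means 10^6 bytes; what exactly the MB/s figure counts (payload only, or payload plus framing) is not stated, so treat the result as indicative.

```python
def bytes_per_event(mb_per_s, k_evt_per_s):
    """Average bytes per event implied by a throughput pair."""
    return (mb_per_s * 1_000_000) / (k_evt_per_s * 1_000)


# (label, MB/s, thousand evt/s) taken from the measurements above.
measurements = [
    ("small, no compression", 0.7, 14),
    ("huge, no compression", 13, 1.4),
    ("small, compression 3", 0.6, 12),
    ("huge, compression 3", 27, 2.9),
]

for label, mb_per_s, k_evt_per_s in measurements:
    print(f"{label}: {bytes_per_event(mb_per_s, k_evt_per_s):.0f} bytes/event")
```

The 15-char events come out at roughly 50 bytes each, so per-event overhead dominates there, while the 10000-char events come out at a little over 9000 bytes each, close to their raw size.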


(Steffen Siering) #14

Checking the code, the default is indeed 3. From my experience disabling compression can improve performance (given enough network bandwidth) due to reduced latencies.

