FileBeat slow - Improve performance

I don't see how the harvester limit should affect overall ingestion rates; the harvester limit only controls how many files are read in parallel.

When trying to tune ingestion, try to identify the bottleneck first. There are a few components that might create back-pressure. Filebeat is rarely the bottleneck.

I'd start by testing how fast filebeat can actually process files on localhost. Use a separate filebeat test config with a separate registry file, and delete the registry between runs. Using pv, we can check filebeat's throughput with the console output:

$ ./filebeat -c test-config.yml | pv -Warl >/dev/null

In test-config.yml we configure the console output:

output.console:
  pretty: false

The pv tool will print the throughput to stderr as lines per second, which here equals events per second.
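
For reference, a minimal test-config.yml could look like the following sketch (6.x-style option names; the input path and data directory are placeholders):

filebeat.inputs:
  - type: log
    paths:
      - /var/log/test/*.log      # placeholder: point this at your test files

# keep test state separate from production; the registry lives under path.data,
# so deleting this directory resets filebeat between runs
path.data: /tmp/filebeat-test

output.console:
  pretty: false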

Next, test beats + logstash together.

A simple logstash config for testing:

input {
  beats { ... }
}

output {
  stdout { codec => dots }
}

This config prints one dot per event. Using pv we can measure throughput:

./bin/logstash -f test.cfg | pv -War > /dev/null

You can measure the impact of filters in LS by adding them to your test.cfg, for example as sketched below.
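
For example, a hypothetical grok filter (a sketch only; substitute whatever filters your real pipeline uses):

filter {
  # example only: replace with the filters from your production pipeline
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}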

If the event output rate in LS is high enough, you have to continue tuning LS->ES (or just ES).
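
To measure the full LS->ES path, one sketch is to swap the stdout output in test.cfg for the elasticsearch output (the host is a placeholder) and watch the indexing rate on the Elasticsearch side:

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]    # placeholder ES endpoint
  }
}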

In filebeat the default queue size is 4096 events and the bulk_max_size is 2048. Due to async publishing, you will have at most 4096 events in flight. If sending from one beat only, I don't think pipeline.batch.size: 8192 will be effective.

In filebeat there is a memory queue (see the queue docs), which collects the events from all harvesters. The queue is used to combine events into batches; the output draws a batch of bulk_max_size events from the queue. The queue size setting is queue.mem.events. That is, in filebeat with a filled-up queue (which is quite normal due to back-pressure), you will have B = queue.mem.events / output.logstash.bulk_max_size batches, 2 by default.

The logstash output operates asynchronously by default, with pipelining: 2. That is, one worker can have up to 2 live batches in flight, and the queue will block and accept new events only after Logstash has ACKed a batch. The total number of output workers in filebeat (assuming loadbalance: true) is given by W = output.logstash.worker * len(hosts), and the total number of batches the set of workers can process concurrently is A = W * output.logstash.pipelining. In order to keep the network/outputs busy, we want queue.mem.events > A * output.logstash.bulk_max_size, with queue.mem.events being a multiple of bulk_max_size.

Setting worker > 1 is similar to configuring the same IP N times in the hosts setting. Assuming you have 2 Logstash endpoints, pipelining: 2 (the default), and worker: 4, we get W = 8 and A = 16 -> queue.mem.events > 32768. E.g. we could double that number, so that a fresh batch is already prepared the moment we receive an ACK from Logstash -> queue.mem.events: 65536.
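
Putting that together, a sketch of the corresponding filebeat settings (the host names are placeholders, not a recommendation):

queue.mem.events: 65536    # > A * bulk_max_size, and a multiple of bulk_max_size

output.logstash:
  hosts: ["logstash1:5044", "logstash2:5044"]   # placeholder endpoints
  loadbalance: true
  worker: 4          # W = 4 workers * 2 hosts = 8
  pipelining: 2      # A = 8 * 2 = 16 batches in flight
  bulk_max_size: 2048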

If we tune the logstash output in filebeat in any way without seeing real improvements, then we tried to tune the wrong sub-system. Always measure first to identify bottlenecks.

Regarding Logstash tuning, also check out: https://www.elastic.co/guide/en/logstash/current/tuning-logstash.html#tuning-logstash

The pipeline.batch.size setting configures the batch size forwarded to one worker. Having 8 workers and a batch size of 8192 while filebeat publishes at most 4096 events won't give you much of an improvement: a batch of 4096 events will likely be forwarded to only one worker (after a small delay controlled by pipeline.batch.delay). Bigger batches and more workers become more helpful once you have multiple filebeat instances publishing at the same time.
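
For reference, these settings live in logstash.yml; a sketch with hypothetical values (tune them against measured throughput rather than copying them):

# logstash.yml
pipeline.workers: 8        # defaults to the number of CPU cores
pipeline.batch.size: 2048  # events handed to one worker per batch
pipeline.batch.delay: 50   # ms to wait for a batch to fill before flushing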
