Filebeat performing better when reading multiple files rather than a single file

We have a cluster configured with 5 Logstash servers and 30 Elasticsearch servers. A single host is exporting logs from a single file across the 5 Logstash servers in a load-balanced configuration using the Filebeat Logstash output (loadbalance: true, worker: 2).
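For reference, the relevant output section looks roughly like this (hostnames and port are placeholders, not our real values):

```yaml
# filebeat.yml (sketch; hostnames/port are placeholders)
output.logstash:
  hosts: ["logstash1:5044", "logstash2:5044", "logstash3:5044",
          "logstash4:5044", "logstash5:5044"]
  loadbalance: true   # spread batches across all listed Logstash hosts
  worker: 2           # workers per configured host
```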

When this was originally configured using ELK 6.2 we were seeing an ingest rate of about 96k events per second max. In recent weeks the volume of logs started to go up and our stack started to fall behind. We upgraded Filebeat, Logstash, Elasticsearch, and Kibana to the latest 6.x version and noticed a drop in ingest rates to around 60k/second.

Now the odd thing is, because our stack can't keep up we've started seeing log rotation come into play. The filebeat service will fall far enough behind that it will often be reading from the current log file as well as 1 or 2 rotated log files before they are compressed and archived.

While reading from the one file we get the aforementioned 60k/sec ingest rate, but when the logs are rotated and Filebeat is reading from 2 or more files at a time, the ingest rate jumps up to 80k+/sec. All of the files are located on the same physical partition (AWS NVMe), so I don't think it's an IOPS limit we're hitting.

I've tried playing with various settings including:

  • queue.mem.flush.min_events
  • queue.mem.flush.timeout
  • bulk_max_size
  • pipelining
  • compression_level
  • worker

But none of these seem to improve at all upon the 60k/sec ingest rate. Now we know the stack itself can handle a higher rate, as it floats along just fine when multiple files are being read.
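For context, these are the places in filebeat.yml where I was adjusting those settings (the values shown here are illustrative examples, not our production values):

```yaml
# filebeat.yml (example values only, not a recommendation)
queue.mem:
  events: 8192            # total in-memory queue size
  flush.min_events: 2048  # batch size handed to the output
  flush.timeout: 1s       # max wait before flushing a partial batch

output.logstash:
  hosts: ["logstash1:5044"]
  bulk_max_size: 4096     # max events per Logstash request
  pipelining: 2           # in-flight batches per connection
  compression_level: 3    # 0 disables compression
  worker: 2
```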

Due to this problem we're looking at moving away from filebeat to a direct syslog->logstash flow. But in the meantime I'd really like to figure out a way to match the performance we see when reading from multiple files vs. one file. Any thoughts on why we'd see a 30% gain in performance when reading from more than one file?

Reading from a single file is by nature single-threaded, so I am not surprised to see you getting better throughput when reading multiple files, especially as disk performance likely is not a limiting factor. How come you have that many logs going into a single file? If you are consolidating logs, have you considered deploying Filebeat locally on each source host instead, to spread the load?

You're probably right that no change in parameters will improve the ingest rate, given that reading a single file is single-threaded. I am still confused about why this changed so much between versions, but I guess I'll have to live without knowing that. We did not see any performance improvement in the past when the process was reading more than one file at a time; it's only after upgrading that we noticed it.

As to why we were stuck with using filebeat on a single host... the servers generating the logs do not store any logs. They log directly to syslog, which forwards them on to a central log server where they are written to disk and then read by filebeat. These source servers are pretty heavily loaded already so instead of migrating to filebeat on each server we're going to forward directly from syslog to logstash, bypassing the central log server. That way we don't add significant load to CPU or disk at the source.
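The replacement flow we're planning would just be a syslog listener on the Logstash side, something along these lines (the port is a placeholder, and whether you use the syslog input or a plain tcp/udp input with a grok filter is a design choice):

```text
# Logstash pipeline sketch (port is a placeholder)
input {
  syslog {
    port => 5514    # rsyslog/syslog-ng on each source host forwards here
  }
}
output {
  elasticsearch {
    hosts => ["es1:9200"]   # placeholder host
  }
}
```

The main trade-off is losing Filebeat's on-disk registry (resume-after-restart tracking); syslog forwarding over UDP can drop events, so TCP forwarding is worth considering.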

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.