We have a fairly active ELK stack set up with the following configuration:
- Local pushers running on the application servers (sketched roughly below) with:
  - a file input using a multiline codec over multiple logfiles
  - a minimal filter adding host-specific fields to each message
  - a trim filter truncating any message larger than X bytes
  - an s3 output
- Indexers in AWS doing the heavy grokking, with an s3 input and an es output
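The pusher side looks roughly like this; the paths, multiline pattern, field values, and bucket name are placeholders rather than our real config:

```
input {
  file {
    path => "/var/log/myapp/*.log"          # placeholder path
    codec => multiline {
      # placeholder pattern: lines that don't start with a timestamp
      # get folded into the previous event
      pattern => "^%{TIMESTAMP_ISO8601}"
      negate => true
      what => "previous"
    }
  }
}

filter {
  mutate {
    # the host-specific fields added on the pusher
    add_field => { "src_host" => "app-01" "env" => "prod" }
  }
  # trim filter goes here (see below)
}

output {
  s3 {
    bucket => "our-log-bucket"              # placeholder bucket
    region => "us-east-1"
  }
}
```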
Several servers push around 50,000 lines per minute, but our "batch processing servers" can generate up to 250,000 lines per minute.
The trim filter is needed to prevent overly large log lines from causing S3 upload failures.
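As an aside, 5.x appears to have a dedicated truncate filter plugin that could replace our hand-rolled trim; if I'm reading the docs right it caps fields by byte length (the 16 KB below is illustrative, not our actual X, and it may need a `bin/logstash-plugin install logstash-filter-truncate` first if it isn't bundled):

```
filter {
  truncate {
    fields => "message"      # field(s) to cap
    length_bytes => 16384    # illustrative stand-in for our real limit X
  }
}
```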
When we tried to upgrade these pushers to 5.0 (using the same config), throughput on the application servers dropped to around 4k lines/minute and throughput on the batch servers dropped to around 13k lines/minute: roughly a 12x drop on the application servers and a 19x drop on the batch servers compared to what we were getting before.
We tried a few general modifications, but were unable to get the performance we need and had to roll back to 2.2.4.
Some things we tried:
- Giving every logfile on a server its own input section, since https://www.elastic.co/guide/en/logstash/current/execution-model.html suggests each input section gets its own thread.
- Switching the trim filter from a ruby code block to a regexp-based grok/mutate combination (the old ruby code ran into some deprecation issues on 5.0). Rough sketches of both follow this list.
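For concreteness, here is roughly what both attempts looked like; the paths and the 16 KB cap are placeholders. The second block is the trim rewritten against the 5.0 event get/set API (what keeping ruby would have required), not the grok/mutate variant we actually ran:

```
input {
  # one input block per logfile, hoping for one thread each
  file { path => "/var/log/myapp/app.log" }
  file { path => "/var/log/myapp/worker.log" }
  file { path => "/var/log/myapp/batch.log" }
}

filter {
  ruby {
    # 5.0 removed the old event['message'] access; use get/set instead
    code => "
      msg = event.get('message')
      event.set('message', msg.byteslice(0, 16384)) if msg && msg.bytesize > 16384
    "
  }
}
```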
Has anyone else had any similar experiences with 5.0 pusher performance, or, even better, managed to get their pushers pushing faster?
Does anyone have any suggestions for other tuning we could try or infrastructure changes we could make?
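For reference, the main 5.0 pipeline knobs we know of live in logstash.yml; the values below are illustrative (the 5.0 defaults are workers = number of cores, batch size 125, batch delay 5 ms):

```
# logstash.yml
pipeline.workers: 4        # filter+output worker threads
pipeline.batch.size: 250   # events each worker collects before filtering
pipeline.batch.delay: 5    # ms to wait for a batch to fill before flushing
```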
I'd like to move forward with the new logstash if possible...