So I have gone through the majority of this forum and what I can find on the web, and have come up short on how exactly to push logstash to this kind of scale.
A bit of back story: I am currently running an elasticsearch cluster with 5 client instances, 5 master instances, and 10 data instances, totalling ~110TB of disk space, with 320GB of RAM dedicated to heap and an equivalent amount left to the OS and filesystem cache, and 20 GB/s networking on each host. This cluster is absolutely crushing anything we throw at it.
On the other hand we have 3 instances of logstash, each with a redis instance sitting directly in front of it (co-located on the same server). These are hosted on 3 separate servers, each with 48 cores, 256GB of RAM, beefy NVMe drives backing redis, and the same 20 GB/s networking. However, each logstash instance seems to max out at ~25k eps. Here is a sample configuration: https://gist.github.com/Supernomad/b6eb2fe7eb8e05365630055d7b8400c6
The shipper in use is filebeat, and that piece is working as expected with no issues. Currently the topology is as follows:
filebeat -> redis (all 3 instances are listed and loadbalance == true) -> logstash (each logstash points to the redis collocated with it) -> elasticsearch (all 5 client instances are listed)
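For reference, the filebeat side of that topology is just the redis output with all three instances listed. It looks roughly like this (a sketch assuming a 5.x-style beats config; the hostnames and key name are placeholders, not my exact file):

```yaml
# filebeat.yml (sketch) -- all three redis instances listed,
# loadbalance enabled; hostnames and key are placeholders
output.redis:
  hosts: ["redis-01:6379", "redis-02:6379", "redis-03:6379"]
  key: "filebeat"
  datatype: "list"
  loadbalance: true
```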
I have tried all manner of combinations of -w and -b, from the defaults all the way up to -w 96 and -b 10000. Raising them didn't seem to help past a certain point, though it did help up to roughly -w < core count and -b < 2048. I have also tried messing with the redis input's thread and batch counts, to no avail. Considering the size and sheer power available to the logstash instances, and the relatively simple configuration in use, with zero filtering of the actual incoming data since it's all JSON objects already in the correct form, I am expecting far better throughput from logstash.
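Concretely, the knobs I have been turning are the pipeline flags and the redis input settings, along these lines (the values are examples of what I tried, not a recommendation, and the key name is a placeholder):

```conf
# started as: logstash -f pipeline.conf -w 48 -b 2048
input {
  redis {
    host        => "127.0.0.1"  # the co-located redis instance
    data_type   => "list"
    key         => "filebeat"   # placeholder key name
    threads     => 4            # redis input threads, one of the values tried
    batch_count => 125          # events pulled per redis round trip
  }
}
```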
To confirm that the issue does indeed lie with logstash, I have pointed these messages directly at elasticsearch, bypassing both redis and logstash, and elasticsearch ingests the data just fine, with minimal load I might add. The reason I can't stay with that setup, pointing directly at elasticsearch, is that I will be adding new data that I must filter using various filter plugins.
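The bypass test was just repointing filebeat's output straight at the client nodes, roughly like this (a sketch with placeholder hostnames; all 5 client instances are listed in the real config):

```yaml
output.elasticsearch:
  hosts: ["es-client-01:9200", "es-client-02:9200", "es-client-03:9200",
          "es-client-04:9200", "es-client-05:9200"]
```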
I have also confirmed that the data is getting to redis just fine; it simply spools up inside the redis instances while logstash falls behind.
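The spooling is easy to see by watching the list length grow on each redis instance while logstash is attached, e.g. (the key name is a placeholder):

```shell
# length of the backlog list on the co-located redis instance
redis-cli -h 127.0.0.1 llen filebeat
```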
The only thing I have read is that it might be best to scale out the number of logstash instances, running a lot of small instances instead of a few large ones, but that is an operationally heavy proposition, especially since, at the eps rate a large instance is ingesting, I would need a massive number of small instances to achieve my goals.
Am I misunderstanding the issue, and/or just missing some piece of configuration that could make logstash faster? Or is the only way forward horizontal scale in a massive way?