I am trying to ingest a massive backlog of logs (tens of TB) via Filebeat > Logstash > Ingest Node > Elasticsearch.
I have scaled out the pipeline extensively and am now at a point where I am struggling to squeeze any more throughput out of it. The current throughput is VERY peaky, despite the logs already being available on disk and there being very little network latency.
The peakiness appears to be between Filebeat and Logstash. I am seeing a rate of between 10,000 and 25,000 events per second, yet neither component is close to saturated on memory, CPU, or IO.

My question: is there a guide or rule of thumb for aligning batch sizes, worker threads, etc. between Filebeat and Logstash to ensure they work efficiently together? In particular, is there a relationship between bulk_max_size in Filebeat and pipeline.batch.size in Logstash? I would have thought it makes sense to set them to similar values, so that one Filebeat batch triggers one Logstash batch execution.
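For concreteness, this is the kind of alignment I am asking about. The values below are illustrative rather than my exact settings, and the host names are placeholders:

```yaml
# filebeat.yml (illustrative values, hosts are placeholders)
queue.mem:
  events: 16384            # in-memory queue sized to hold several output batches
  flush.min_events: 4096   # try to hand full batches to the output
  flush.timeout: 1s

output.logstash:
  hosts: ["logstash1:5044", "logstash2:5044"]
  bulk_max_size: 4096      # events per batch sent to Logstash
  worker: 2                # connections per configured host
  pipelining: 2            # batches in flight per connection
```

```yaml
# logstash.yml (illustrative values)
pipeline.workers: 8        # typically one per CPU core
pipeline.batch.size: 4096  # matched to Filebeat's bulk_max_size? this is the relationship I am asking about
pipeline.batch.delay: 50   # ms to wait for a batch to fill before flushing
```

Is matching bulk_max_size to pipeline.batch.size like this the right intuition, or do the two batch sizes not need to correspond at all?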
I have spent several hours fiddling with the various settings without a great deal of luck. I had the pipeline thoroughly saturated with 2 Logstash nodes, but after adding 2 more nodes I am not seeing the gains I had hoped for.
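In case it is relevant to the scaling question, this is the shape of the Filebeat output config across the four nodes (hosts are placeholders, values illustrative). I am not certain loadbalance and worker are doing what I expect:

```yaml
# filebeat.yml output section (illustrative, hosts are placeholders)
output.logstash:
  hosts: ["ls1:5044", "ls2:5044", "ls3:5044", "ls4:5044"]
  loadbalance: true  # default is false: events go to one randomly chosen host
  worker: 2          # connections per host
```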
Thanks in advance.