Hello,
As of 2.2, the pipeline architecture has changed.
From the docs:
Each input {} statement in the Logstash configuration file runs in its own thread. Inputs write events to a common Java SynchronousQueue. This queue holds no events, instead transferring each pushed event to a free worker, blocking if all workers are busy. Each pipeline worker thread takes a batch of events off this queue, creating a buffer per worker, runs the batch of events through the configured filters, then runs the filtered events through any outputs.
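To make sure I picture the handoff right, here's a small Python sketch of my mental model (sizes shrunk for readability; `queue.Queue(maxsize=1)` stands in for the zero-capacity Java SynchronousQueue, and the names are mine, not Logstash's):

```python
import queue
import threading

BATCH_SIZE = 5    # stand-in for pipeline batch size (-b), default 125
NUM_WORKERS = 2   # stand-in for pipeline workers (-w)
NUM_EVENTS = 20

# maxsize=1 approximates the zero-capacity SynchronousQueue handoff:
# the input thread blocks as soon as no worker is ready to take an event.
handoff = queue.Queue(maxsize=1)
processed = []
processed_lock = threading.Lock()

def input_thread():
    """Like a redis input thread: push events, blocking while workers are busy."""
    for event in range(NUM_EVENTS):
        handoff.put(event)
    for _ in range(NUM_WORKERS):
        handoff.put(None)  # one shutdown marker per worker

def worker():
    """Take up to BATCH_SIZE events off the queue, then run filters/outputs."""
    while True:
        batch = []
        while len(batch) < BATCH_SIZE:
            event = handoff.get()
            if event is None:           # shutdown marker
                break
            batch.append(event)
        with processed_lock:
            processed.extend(batch)     # stand-in for filters + outputs
        if len(batch) < BATCH_SIZE:     # saw the shutdown marker
            return

workers = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for w in workers:
    w.start()
producer = threading.Thread(target=input_thread)
producer.start()
producer.join()
for w in workers:
    w.join()

print(sorted(processed))
```

Every event is processed exactly once, and the producer blocks whenever both workers are mid-batch.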
IIRC, given that:
- Redis input thread(s) sleep when there's no free worker.
- When a pipeline worker is free, it asks a free Redis input thread for n events, where n is the batch size (the -b switch)
With this in mind, if I have Logstash running on say 16 core machine:
- with a default batch size of 125
- with a default of 16 workers
Is it a good idea to configure the redis input with:
- The redis batch_count set to the same value as the pipeline batch size?
- The redis batch_count set to a divisor of the pipeline batch size (in our case, say 25)?
- With a lot of workers, a certain number of redis input threads (in our case, say 4)?
- If I have only 1 input thread for 16 or 32 pipeline workers, is that bad?
I know that's a lot of testing and it depends on our specific case, but I want to find the right balance.
A kind of rule of thumb like:
- number of workers = cores * 2
- a starting default of 125 for pipeline batch size
- redis input batch_count set to the same value as the pipeline batch size (125)
- depending on filter complexity and processing time, a certain number of redis input threads, e.g. #workers / 4 or 8, depending on the case.
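For a 16-core box, that rule of thumb would mean starting Logstash with something like `bin/logstash -w 32 -b 125 -f redis-pipeline.conf` and a redis input along these lines (host/key values are just placeholders for our setup):

```
input {
  redis {
    host        => "127.0.0.1"
    key         => "logstash"
    data_type   => "list"
    batch_count => 125  # same as the pipeline batch size
    threads     => 4    # tune with filter complexity, e.g. #workers / 8
  }
}
```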
The point is to keep workers from waiting on busy input threads.
Am I missing something?
Thanks
Bruno Lavoie