Input-redis parameters versus pipeline parameters


As of Logstash 2.2, the pipeline architecture has changed.

From the docs:

Each input {} statement in the Logstash configuration file runs in its own thread. Inputs write events to a common Java SynchronousQueue. This queue holds no events, instead transferring each pushed event to a free worker, blocking if all workers are busy. Each pipeline worker thread takes a batch of events off this queue, creating a buffer per worker, runs the batch of events through the configured filters, then runs the filtered events through any outputs.

If I understand correctly, this means:

  • Redis input threads sleep when there is no free worker.
  • When a pipeline worker is free, it requests n events from a free Redis input thread, where n is the pipeline batch size (the -b switch).

With this in mind, suppose I have Logstash running on, say, a 16-core machine:

  • with a default batch size of 125
  • with a default of 16 workers

Is it a good idea to configure the redis input with:

  • a batch_count equal to the pipeline batch size?
  • a batch_count that is a smaller divisor of the pipeline batch size (in our case, say, 25)?
  • several redis input threads when there are many workers (in our case, say, 4)?
  • Also, is it bad to have only 1 input thread for 16 or 32 pipeline workers?

I know this takes a lot of testing and depends on our specific case, but I want to find the right balance.

I am looking for a rule of thumb along these lines:

  • number of workers = cores * 2
  • a starting default of 125 for pipeline batch size
  • redis input batch_count set to the same value as the pipeline batch size (125)
  • depending on filter complexity and processing time, a number of redis input threads such as #workers / 4, or 8, depending on the case.

The point is to keep pipeline workers from sitting idle while they wait for a free input thread.
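
To make the rule of thumb concrete, here is a rough Python sketch that turns a core count into starting values and prints the matching redis input stanza and command line. It is only an illustration of the arithmetic above, not a recommendation: the function names, the host, the key, and the redis.conf file name are placeholders I made up, and the workers-per-input-thread ratio of 4 is just the guess from the list. The batch_count and threads options and the -w and -b switches are the existing Logstash settings discussed in this post.

```python
def starting_settings(cores, batch_size=125, workers_per_input_thread=4):
    """Return starting values that follow the rule of thumb in this post."""
    workers = cores * 2  # number of workers = cores * 2
    # At least one input thread, otherwise roughly one per N workers.
    input_threads = max(1, workers // workers_per_input_thread)
    return {
        "pipeline_workers": workers,
        "pipeline_batch_size": batch_size,
        "redis_batch_count": batch_size,  # same value as the pipeline batch size
        "redis_threads": input_threads,
        # Upper bound on events held by workers at one time.
        "max_in_flight_events": workers * batch_size,
    }


def render_redis_input(settings, host="127.0.0.1", key="logstash"):
    """Render a redis input stanza; host and key are hypothetical placeholders."""
    return (
        "input {\n"
        "  redis {\n"
        f'    host => "{host}"\n'
        '    data_type => "list"\n'
        f'    key => "{key}"\n'
        f'    batch_count => {settings["redis_batch_count"]}\n'
        f'    threads => {settings["redis_threads"]}\n'
        "  }\n"
        "}\n"
    )


if __name__ == "__main__":
    s = starting_settings(16)
    print(render_redis_input(s))
    # redis.conf is a placeholder file name for the stanza printed above.
    print(f"bin/logstash -w {s['pipeline_workers']} -b {s['pipeline_batch_size']} -f redis.conf")
```

For the 16-core machine this gives 32 workers, 8 redis input threads and at most 32 * 125 = 4000 events in flight, which would only be a starting point for testing.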

Am I missing something?

Bruno Lavoie