Hi everyone. I am using Logstash Version: 9.1.0
I was debugging my output plugin and found out that when using the default in-memory queue, Logstash splits the incoming events into several smaller batches.
I have a script that writes 100 records per second to a file, which I use as the input for Logstash.
The output plugin receives the batched events and prints the number of events per batch.
Now here is what I found out while testing the different parameters in logstash.yml:
Default - In-memory Queue:
With 100 logs per second written to the file and the default logstash.yml configuration (no modifications), the batch sizes look like this: 20, 33, 27, 20, for a total of 100 events.
Using Persistent Queue:
100 logs per second written to the file (same as before); the only change in logstash.yml is setting queue.type: persisted. All 100 records arrived in a single batch of 100 events.
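In other words, the logstash.yml change was just this one line:

```
queue.type: persisted
```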
The goal:
Is there a way to make the in-memory queue batch events the way the persistent queue does? What I want is fewer small requests: ideally a single request or batch containing all those events (up to pipeline.batch.size: 100, for example).
What I have tried:
I already tried playing with these pipeline parameters in logstash.yml:
pipeline.batch.size
pipeline.batch.delay
I’ve read that those two are the only parameters that affect batching, and that pipeline.batch.size only sets the maximum number of events per batch, but there might be something I’m missing.
Here is my Logstash .conf file in case it is useful. my_plugin only uploads the events to an endpoint and prints the number of events in each batch to stdout.
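Roughly, the pipeline looks like the sketch below; the path and the plugin options are simplified placeholders rather than my exact file:

```
input {
  file {
    path => "/tmp/test_input.log"   # the file my script writes 100 records/second to (placeholder path)
    start_position => "beginning"
    sincedb_path => "/dev/null"     # re-read from the start on every test run
  }
}

output {
  my_plugin {
    # uploads the events to an endpoint and prints the number of events per batch to stdout
  }
}
```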
Have you tried adding pipeline.workers: 1 in logstash.yml, or setting pipeline.batch.size: 100 and increasing pipeline.batch.delay to 250?
Don't forget to restart LS.
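Something along these lines in logstash.yml (the values are just a starting point to tune):

```
pipeline.workers: 1        # a single worker, so batches are not split across workers
pipeline.batch.size: 100   # maximum number of events per batch
pipeline.batch.delay: 250  # ms to wait for more events before dispatching an undersized batch
```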
Is 100 events per second the real rate you will be working with, or is it just for testing? It may be too low to troubleshoot with; it is even smaller than Logstash's default batch size of 125.
Keep in mind that pipeline.batch.size is applied per worker, so if you are running Logstash on a server with more than one CPU, it is using more than one worker by default.
The in-memory queue and the persistent queue work in different ways and have different goals. In your example, the first effect of switching to the persisted queue is that the pipeline becomes a little slower, because everything has to be written to disk and read back from disk before being processed.
With a file input you read the events, write them into the page file, and then read them back from the page file before they pass through the filters and outputs.
That extra time may have been enough for the events to be grouped into a single batch.
What changes did you make to those settings? As you discovered, pipeline.batch.size only sets the maximum size of the batch; I don't think there is anything else missing.
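For reference, these are the defaults those settings start from, with my rough reading of how they interact at your rate (the exact batch sizes also depend on how the file input picks up lines):

```
pipeline.batch.size: 125   # per worker; a maximum, not a minimum
pipeline.batch.delay: 50   # ms to wait for more events before dispatching an undersized batch
# At ~100 events/second, only a handful of events arrive within 50 ms, so the
# memory queue tends to hand workers small batches; a larger pipeline.batch.delay
# (with a single worker) gives events more time to accumulate into one batch.
```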
First, thanks everyone for your replies, they were very helpful.