Logstash file input performance

seadub · June 15, 2021, 7:35pm

Continuing the discussion from Tuning to handle extreme initial ingestion conditions (with logstash):

@Christian_Dahlqvist - I know I'm resurrecting an old thread here but I'd love to confirm a point you'd raised. I hope I can get input from anyone with knowledge on this issue.

For context, I'm currently supporting a deployment of 6.8.x infrastructure. I have a single Logstash instance with 8 cores and 24 GB of memory which has been struggling to keep up with the incoming volume of data. CPU and memory usage are acceptable; nothing is overloaded on that front. The instance has a pipeline which takes incoming Beats events and dumps them to files (for backups). Various input-file -> elasticsearch pipelines then process those files from the host's file system.

If each pipeline using the logstash-input-file plugin is configured to have multiple worker threads, will Logstash spawn separate threads, each with its own single-threaded instance of the plugin?

Thanks.

Badger · June 15, 2021, 7:51pm

For the filter plugins, yes.

Note that the thread you linked to was a problem that should have been resolved by this commit.

seadub · June 15, 2021, 8:55pm

Hi Badger. Thanks for the quick reply.

"For the filter plugins, yes."

But not the actual input plugins?

Perhaps a related question then, with more helpful data.

From the Execution Model docs:

"Each input stage in the Logstash pipeline runs in its own thread. Inputs write events to a central queue that is either in memory (default) or on disk."

If we have 22 pipelines, 21 of which are operating on the input-file plugin, will the overarching logstash process have 21 single-threaded instances of the input-file plugin running?

Badger · June 15, 2021, 9:32pm

If you have 21 file inputs configured, there will be 21 single-threaded instances running. You can check by getting a thread dump. In a Logstash instance with five workers and three file inputs, the dump will include:

"[main]>worker0" #34 daemon prio=5 os_prio=0 cpu=13.36ms elapsed=2.62s tid=0x00007f0314021000 nid=0xe61 waiting on condition  [0x00007f030828e000]
"[main]>worker1" #35 daemon prio=5 os_prio=0 cpu=4.86ms elapsed=2.61s tid=0x00007f031401f800 nid=0xe62 waiting on condition  [0x00007f0308490000]
"[main]>worker2" #36 daemon prio=5 os_prio=0 cpu=6.08ms elapsed=2.61s tid=0x00007f031402d800 nid=0xe63 waiting on condition  [0x00007f0303ffd000]
"[main]>worker3" #37 daemon prio=5 os_prio=0 cpu=19.30ms elapsed=2.61s tid=0x00007f0314027000 nid=0xe64 waiting on condition  [0x00007f030838f000]
"[main]>worker4" #38 daemon prio=5 os_prio=0 cpu=166.16ms elapsed=2.60s tid=0x00007f0314029000 nid=0xe65 waiting on condition  [0x00007f030818d000]
"[main]<file" #39 daemon prio=5 os_prio=0 cpu=38.83ms elapsed=2.17s tid=0x00007f0314064800 nid=0xe66 waiting on condition  [0x00007f03033f7000]
"[main]<file" #40 daemon prio=5 os_prio=0 cpu=254.40ms elapsed=2.17s tid=0x00007f0314050000 nid=0xe67 waiting on condition  [0x00007f03032f6000]
"[main]<file" #41 daemon prio=5 os_prio=0 cpu=36.31ms elapsed=2.16s tid=0x00007f0314070800 nid=0xe68 waiting on condition  [0x00007f03031f5000]

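As a rough illustration, the counting can be automated for larger dumps. The Python sketch below is not from the thread. It assumes the thread naming seen in the dump above, where `[pipeline]>workerN` is a pipeline worker and `[pipeline]<plugin` is an input thread. The names `THREAD_NAME` and `count_pipeline_threads` are hypothetical, introduced only for this example.

```python
import re
from collections import defaultdict

# Assumption: thread names follow the pattern seen in the dump above,
# e.g. "[main]>worker0" (pipeline worker) and "[main]<file" (input thread).
THREAD_NAME = re.compile(
    r'^"\[(?P<pipeline>[^\]]+)\](?P<kind>[<>])(?P<name>[^"]+)"'
)


def count_pipeline_threads(dump_text):
    """Count worker and input threads per pipeline in a JVM thread dump.

    Returns a dict shaped like:
        {pipeline_id: {"workers": int, "inputs": {plugin_name: int}}}
    Lines that do not match the assumed naming pattern are ignored.
    """
    counts = defaultdict(lambda: {"workers": 0, "inputs": defaultdict(int)})
    for line in dump_text.splitlines():
        match = THREAD_NAME.match(line.strip())
        if match is None:
            continue
        entry = counts[match.group("pipeline")]
        if match.group("kind") == ">":
            # '>' marks a pipeline worker thread (filters and outputs).
            entry["workers"] += 1
        else:
            # '<' marks an input thread; the name is the input plugin.
            entry["inputs"][match.group("name")] += 1
    # Convert nested defaultdicts to plain dicts for easier printing.
    return {
        pipeline: {"workers": e["workers"], "inputs": dict(e["inputs"])}
        for pipeline, e in counts.items()
    }
```

Fed the eight lines shown above, this should report five workers and three `file` inputs for the `main` pipeline. With 21 file-input pipelines, each pipeline ID would be expected to show its own `file` entry.
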
seadub · June 17, 2021, 2:03pm

Thanks, that does match what I'm seeing. I appreciate your help!
