Logstash "input" performance?

MarcelHallmann · October 8, 2015, 9:16am

Hi,

We have one Logstash instance that monitors several application log folders (about 10 folders).
The configuration looks like this:

 input {
     file {
         codec => json {
             charset => "UTF-8"
         }
         path =>  ["/log/logfileA.log"]
         sincedb_path => "/var/logfileA.sincedb"
         start_position => beginning
     }

     file {
         codec => json {
             charset => "UTF-8"
         }
         path =>  ["/log/logfileB.log"]
         sincedb_path => "/var/logfileB.sincedb"
         start_position => beginning
     }
 ....

 }

We ran a load test on the applications that produced millions of log entries, and I could see that the Logstash input seems to slow down over time.
(The generated log files are several hundred MB in size.)

At first all log files are read quickly, but after a few minutes it seems to get stuck on certain log files.
In Kibana we can see that the number of events from logfileA is still increasing, but the number from logfileB is not. After restarting the Logstash process it is fast at the beginning again, but then the same behavior occurs.

To me it seems that the input thread doesn't poll the log folders in a round-robin manner.

What are your ideas for handling this?
Should we have one Logstash instance per, e.g., 2-3 log folders?
Or are there other ways we can influence this behavior?

I also played with the -w flag but that didn't solve the problem.

warkolm · October 8, 2015, 9:55pm

Each of the inputs gets their own thread.
What's the general load on the system like, how much heap did you give it, what are the outputs, how many workers did you try?

MarcelHallmann · October 9, 2015, 6:37am

Thanks for your reply.

The general system load is OK, I think. The CPU is busy, of course, and memory usage is at about 15%.
I started the Logstash process with -Xmx500m. Is this OK?
LS_HEAP_SIZE isn't set explicitly. What is a good value?
The output is Elasticsearch (I'm sure this is not the problem).
I tried with 1 filter worker and with 4, but there was absolutely no difference.

I also tried another setup to work around the issue:
one Logstash process for the largest log file, and another Logstash process for all other log folders.
That did work around the problem, but it would be much nicer to have only one Logstash process.
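
A rough sketch of that split, with hypothetical file names and assuming purely for illustration that logfileA is the largest file. Only the input sections are shown; the filter and output sections would be the same in both files.

```
# big.conf (hypothetical name): first Logstash process, largest log file only.
# Assumption: logfileA is the largest file.
input {
    file {
        codec => json {
            charset => "UTF-8"
        }
        path => ["/log/logfileA.log"]
        sincedb_path => "/var/logfileA.sincedb"
        start_position => beginning
    }
}
```

```
# rest.conf (hypothetical name): second Logstash process, all other log files.
input {
    file {
        codec => json {
            charset => "UTF-8"
        }
        path => ["/log/logfileB.log"]
        sincedb_path => "/var/logfileB.sincedb"
        start_position => beginning
    }
    # The remaining file inputs from the original config go here, unchanged.
}
```

Each file keeps its own sincedb_path, so the two processes do not share read-position state.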

[edit] Forgot to say that we are using logstash 1.4.2

warkolm · October 9, 2015, 6:47am

I'd upgrade to 1.5.2, there are some very good improvements there.

Also, have you tried testing the inputs with no filters and an output to /dev/null (or whatever) to make sure that the bottleneck is not elsewhere in the message flow?
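
A minimal sketch of such a test config, reusing the logfileB path from the config above. It assumes the `null` output plugin is available in your install, and it points the sincedb at /dev/null (an assumption for this test only) so the file is re-read from the start without touching the production sincedb.

```
input {
    file {
        codec => json {
            charset => "UTF-8"
        }
        path => ["/log/logfileB.log"]
        # Assumption: throwaway sincedb so every test run starts at the top
        # of the file and the real sincedb is left alone.
        sincedb_path => "/dev/null"
        start_position => beginning
    }
}

# No filter section on purpose: only the input is being tested.

output {
    # Discards every event. To watch progress instead, an alternative is
    # stdout { codec => dots }, assuming the dots codec is installed.
    null {}
}
```

If the input keeps reading at a steady pace with this config, the slowdown is more likely in the filters or the output than in the file input itself.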

MarcelHallmann · October 9, 2015, 7:03am

At the moment we can't upgrade to the 1.5.x version due to this bug: https://github.com/elastic/logstash/issues/3641

What values for the Xmx and LS_HEAP would you recommend?

warkolm · October 9, 2015, 7:20am

You shouldn't really need a lot for heap, a gig at most.

alexolivan · October 9, 2015, 8:00am

Hi...

We have the same impression: over time, it seems like Logstash stops working altogether.
I'm on the latest 1.4.5.

I'm afraid to upgrade to Logstash 1.5.2 because of incompatibility between versions, which is why I'm sticking with 1.4.

Is there any tweak in Logstash to increase the number of threads, a thread limit, or similar?

I had not thought of the possibility of launching parallel Logstash instances! Cool!

best regards....

warkolm · October 9, 2015, 9:24pm

1.5.2 is latest.
But it might be better to start a new thread about your incompatibility concerns.

alexolivan · October 12, 2015, 10:48pm

I agree...

Best regards.
