Logstash "input" performance?

Hi,

We have one Logstash instance monitoring several application log folders (about 10 folders).
The configuration looks like this:

 input {
     file {
         codec => json {
             charset => "UTF-8"
         }
         path => ["/log/logfileA.log"]
         sincedb_path => "/var/logfileA.sincedb"
         start_position => "beginning"
     }

     file {
         codec => json {
             charset => "UTF-8"
         }
         path => ["/log/logfileB.log"]
         sincedb_path => "/var/logfileB.sincedb"
         start_position => "beginning"
     }
 ....

 }

Now we ran a load test on the applications with millions of log entries, and I could see that the Logstash input seems to get slower over time.
(The generated log files are several hundred MB in size.)

At first all log files are read in quickly, but after a few minutes it seems to get stuck on certain log files.
In Kibana we can see that the amount of data from logfileA is still increasing, but not from logfileB. After restarting the Logstash process it is fast at the beginning again, but then the same behavior occurs.

To me it seems that the input threads don't poll the log folders in a round-robin manner.

What are your ideas for handling this?
Should we have one Logstash instance for, say, 2-3 log folders each?
Or are there other ways to influence this behavior?

I also played with the -w flag but that didn't solve the problem.
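For reference, this is how I started it (the config path and worker count here are just examples):

 # example invocation; config path and worker count are illustrative
 bin/logstash -f /etc/logstash/logstash.conf -w 4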

Each of the inputs gets its own thread.
What's the general load on the system like, how much heap did you give it, what are the outputs, and how many workers did you try?
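If you want to see what the threads are actually doing, a thread dump is a quick way to peek (this assumes the JDK's jstack is on the PATH, and <logstash-pid> is a placeholder for your process id); if I remember the 1.4 thread naming correctly, inputs show up as "<file", filter workers as "|worker", and outputs as ">elasticsearch":

 # dump the JVM threads and look at the file input threads
 # (jstack availability and the thread names are assumptions on my part)
 jstack <logstash-pid> | grep -A 5 '"<file'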

Thanks for your reply.

The general system load is OK, I think. The CPU is used, of course, and memory usage is around 15%.
I started the Logstash process with -Xmx500m - is this OK?
LS_HEAP_SIZE isn't set explicitly - what would be a good value?
The output is Elasticsearch (I'm sure this is not the problem).
I tried with 1 filter worker and with 4, but there was absolutely no difference.

I also tried another setup to work around the issue:
one Logstash process for the largest log file, and another Logstash process for all the other log folders.
That did work around the problem, but it would be much nicer to have only one Logstash process.
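Roughly, the workaround looks like this (the config file names are just placeholders I picked for this example):

 # two independent Logstash instances, each with its own config and sincedb files
 bin/logstash -f /etc/logstash/largest-logfile.conf &
 bin/logstash -f /etc/logstash/other-logfiles.conf &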

[edit] Forgot to say that we are using Logstash 1.4.2.

I'd upgrade to 1.5.2; there are some very good improvements there.

Also, have you tried testing the inputs with no filters and an output to /dev/null (or whatever) to make sure the problem isn't somewhere else in the message flow?
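Something along these lines should do for such a test: one of the existing file inputs, no filter block, and a throwaway output. Rather than literally /dev/null I'd use the dots codec here (assuming it's available in your version), which prints one character per event so you can eyeball throughput; pointing sincedb_path at /dev/null makes every test run re-read the file from the beginning:

 input {
     file {
         codec => json {
             charset => "UTF-8"
         }
         path => ["/log/logfileA.log"]
         sincedb_path => "/dev/null"
         start_position => "beginning"
     }
     # ... the other file inputs unchanged ...
 }
 # no filter block at all
 output {
     stdout { codec => dots }
 }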

At the moment we can't upgrade to the 1.5.x version due to this bug: https://github.com/elastic/logstash/issues/3641

What values for Xmx and LS_HEAP_SIZE would you recommend?

You shouldn't really need a lot of heap; a gig at most.
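Assuming you start Logstash via the stock bin/logstash script, the heap is usually set through the environment rather than a raw -Xmx flag, e.g.:

 # the stock startup script reads LS_HEAP_SIZE (config path is illustrative)
 LS_HEAP_SIZE=1g bin/logstash -f /etc/logstash/logstash.conf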

Hi...

We are experiencing the same thing... over time, it seems like Logstash doesn't work at all.
I'm on the latest version, 1.4.5.

I'm afraid to upgrade to Logstash 1.5.2 because of incompatibility between versions... that's why I'm sticking with 1.4.

Is there any tweak in Logstash to increase the threads, a thread limit, or something like that?

...I had not thought of the possibility of launching parallel Logstash instances! Cool!

Best regards...

1.5.2 is the latest.
But it might be better to start a new thread about your incompatibility concerns.

I agree...

Best regards.