Elapsed and aggregate filter with multiple workers


(ssh) #1

The elapsed and aggregate filters do not work properly when multiple Logstash workers are used. How can we solve this problem in that case? Thank you.

This is the same question as https://stackoverflow.com/questions/37353365/calculating-time-between-events/37359000#37359000


(Christian Dahlqvist) #2

Both of these plugins require that all related events pass through a single thread so that they are processed in order, and as a side effect they tend to scale and perform badly.

As far as I can tell, you need to process all data in a single thread up until the point where you have managed to extract the identifier to use in these filters into a separate field. From that point on you just need to make sure that all events with the same identifier get processed by the same thread.

You could at this point calculate a Murmur hash of the identifier and send each event to one of a number of pipelines (hash modulo the number of pipelines) using the new pipeline-to-pipeline communication. Depending on where the bulk of your processing takes place, this may or may not make a big difference.
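
As a rough illustration of the routing idea only (this is Python, not Logstash configuration), the sketch below maps each identifier to a pipeline index. The field names `uniqueid` and `tag`, the sample events, and the pipeline count are assumptions made for the example, and CRC32 is swapped in for a Murmur hash because it ships with Python. Any hash that is stable across processes gives the same guarantee.

```python
import zlib

NUM_PIPELINES = 2  # assumption: number of downstream single-worker pipelines


def pipeline_for(identifier: str, num_pipelines: int = NUM_PIPELINES) -> int:
    """Map an identifier to a pipeline index in [0, num_pipelines)."""
    # CRC32 stands in for a Murmur hash here; what matters is that the
    # same identifier always produces the same number.
    digest = zlib.crc32(identifier.encode("utf-8"))
    return digest % num_pipelines


# Hypothetical events: two of them share the identifier "abc".
events = [
    {"uniqueid": "abc", "tag": "start"},
    {"uniqueid": "xyz", "tag": "start"},
    {"uniqueid": "abc", "tag": "end"},
]

# Events with the same identifier always get the same pipeline index,
# so a single-threaded pipeline sees both halves of each pair.
routes = [pipeline_for(event["uniqueid"]) for event in events]
```

Because the index depends only on the identifier, the start and end events for "abc" land in the same pipeline no matter which worker handled them upstream.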

The other way is to instead implement this matching as a batch job that runs periodically against the raw data that has been inserted into Elasticsearch. This requires more work and is not real time, but should scale better as it does not restrict the flow of data through Logstash.
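
A minimal sketch of what the matching step of such a batch job might look like, assuming the raw events have already been fetched from Elasticsearch into a list of dictionaries. The field names `uniqueid`, `tag`, and `timestamp`, the `start`/`end` tag values, and the sample data are all hypothetical; the query that retrieves the events is left out.

```python
from datetime import datetime


def elapsed_by_id(events):
    """Return {uniqueid: seconds between its 'start' and 'end' events}."""
    starts, ends = {}, {}
    for event in events:
        ts = datetime.fromisoformat(event["timestamp"])
        if event["tag"] == "start":
            starts[event["uniqueid"]] = ts
        elif event["tag"] == "end":
            ends[event["uniqueid"]] = ts
    # Only identifiers with both events produce a result. Arrival order
    # does not matter because matching happens after all events are read.
    return {
        uid: (ends[uid] - starts[uid]).total_seconds()
        for uid in starts
        if uid in ends
    }


# Hypothetical raw events, deliberately out of order and with one
# identifier ("b2") that has no end event yet.
raw_events = [
    {"uniqueid": "a1", "tag": "end", "timestamp": "2019-01-01T10:00:05"},
    {"uniqueid": "a1", "tag": "start", "timestamp": "2019-01-01T10:00:00"},
    {"uniqueid": "b2", "tag": "start", "timestamp": "2019-01-01T10:00:01"},
]
durations = elapsed_by_id(raw_events)
```

Since the job reads the stored events after the fact, Logstash can keep running with as many workers as the host allows; unmatched identifiers are simply picked up on a later run.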


(ssh) #3

Thanks so much for the reply, but in my case I don't use multiple pipelines. I use:

  • streaming (continuous logs to logstash)
  • one pipeline
  • 2 pipeline.workers (host's CPU cores)

I want to calculate the time difference between two logs with the same uniqueid. But if the elapsed and aggregate filters will not work properly, could you suggest what else I can use?

Much appreciated!


(Christian Dahlqvist) #4

For those filters you have to use a single pipeline worker (pipeline.workers set to 1), as all related events need to go through the same thread.


(ssh) #5

But I'm worried about the performance impact. This setup has been live for almost a year.


(Christian Dahlqvist) #6

Those filters have that limitation, which restricts throughput and means they do not scale well. There is no easy solution.

This is why I outlined some possible alternatives above.

