The elapsed and aggregate filters will not work properly when multiple Logstash pipeline workers are used. How do we solve the same problem in that case? Thank you.
It is the same question as this one: https://stackoverflow.com/questions/37353365/calculating-time-between-events/37359000#37359000
Both of these plugins require that all related events pass through a single thread so that they are processed in order, and as a side effect they tend to scale and perform badly.
As far as I can tell, you need to process all data in a single thread up until the point where you have extracted the identifier these filters key on into a separate field. From that point on you just need to make sure that all events with the same identifier get processed by the same thread.
At that point you could calculate a murmur hash of the identifier and send each event to one of a number of pipelines (hash % number of pipelines) using the new pipeline-to-pipeline communication. Depending on where the bulk of your processing takes place, this may or may not make a big difference.
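For illustration, here is a minimal sketch of that routing, assuming a Logstash version that supports pipeline-to-pipeline communication, that the identifier has already been parsed into a field called uniqueid, and that two single-worker pipelines handle the matching. All pipeline names, paths, field names, and tags below are made up for the example.

```
# pipelines.yml — one distributor pipeline plus two single-worker pipelines
- pipeline.id: distributor
  path.config: "/etc/logstash/distributor.conf"
- pipeline.id: elapsed_0
  path.config: "/etc/logstash/elapsed_0.conf"
  pipeline.workers: 1
- pipeline.id: elapsed_1
  path.config: "/etc/logstash/elapsed_1.conf"
  pipeline.workers: 1

# distributor.conf — your existing inputs and parsing filters stay here;
# it can keep multiple workers because it does not do the elapsed/aggregate matching
filter {
  # MURMUR3 produces an integer fingerprint of the identifier
  fingerprint {
    source => "uniqueid"
    target => "[@metadata][hash]"
    method => "MURMUR3"
  }
  # hash % 2 picks one of the two downstream pipelines
  ruby {
    code => 'event.set("[@metadata][shard]", event.get("[@metadata][hash]") % 2)'
  }
}
output {
  if [@metadata][shard] == 0 {
    pipeline { send_to => ["elapsed_0"] }
  } else {
    pipeline { send_to => ["elapsed_1"] }
  }
}

# elapsed_0.conf — elapsed_1.conf is identical apart from the address
input {
  pipeline { address => "elapsed_0" }
}
filter {
  # safe here: this pipeline runs with a single worker, and all events
  # with the same uniqueid arrive at the same pipeline
  elapsed {
    unique_id_field => "uniqueid"
    start_tag => "elapsed_start"   # assumes start/end events are tagged upstream
    end_tag => "elapsed_end"
  }
}
```

All events with a given uniqueid then pass through exactly one single-worker pipeline, which is what elapsed and aggregate need, while the distributor and any additional downstream pipelines give back some parallelism.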
The other option is to instead implement this matching as a batch job that runs periodically against the raw data that has been indexed into Elasticsearch. This requires more work and is not real time, but it should scale better as it does not restrict the flow of data through Logstash.
Thanks so much for the reply.
In my case, though, I don't use multiple pipelines. I use:
streaming (continuous logs to Logstash)
one pipeline
2 pipeline.workers (the host's CPU cores)
I want to calculate the time difference between two logs that share a uniqueid,
but if the elapsed and aggregate filters will not work properly, could you suggest what else I can use?