Elapsed filter with multiple workers: does it work or not?

Nikhil_Utane · November 21, 2017, 9:26am

Hi,

I tried searching but haven't found a definitive answer.

The aggregate filter documentation clearly states:
"You should be very careful to set Logstash filter workers to 1 (-w 1 flag) for this filter to work correctly otherwise events may be processed out of sequence and unexpected results will occur."

But the same is not mentioned for the elapsed filter. Fundamentally, I would expect the same limitation to apply to the elapsed filter as well.

So is my assumption correct? (Q1)
If so, how can we reconcile the conflicting goals of faster log ingestion and using the elapsed filter plugin? (Q2)
Also, does using multiple threads in Filebeat matter at all? (Q3)
Right now I have set "pipeline.workers: 8" in logstash.yml. The help says this defaults to the number of cores.
I haven't tested the elapsed filter with 8 workers, but with the default value of 4 (the number of cores) it did seem to work fine.

-Thanks
Nikhil

Nikhil_Utane · November 22, 2017, 3:49am

I did some testing and see that with 8 worker threads in Logstash and 3 load-balanced Logstash nodes, I miss about 15-20% of events.
I am now checking whether I can do something with the data that is already sitting in ES.

-Nikhil

Nikhil_Utane · November 22, 2017, 4:33am

Perhaps the best option for me is to route all events that need multiline, aggregate, or elapsed filter processing to a separate Logstash instance that runs only 1 worker thread. Since these will be a fraction of the overall events, I should be able to meet both of my objectives.

Comments?

-Nikhil

guyboertje · November 25, 2017, 5:01pm

You should use multiline in Filebeat, i.e. as close to the source as possible.

We are moving more and more towards stateless plugins (no local state), but we are also building a shared-state solution that stores the state in Elasticsearch, so that plugins can share state across LS instances as well as worker threads. This solution will not be ready for some time.

If possible, you should try using Elasticsearch to do the aggregations and elapsed-time calculations, by rereading events from Elasticsearch with a suitable query and the elasticsearch input.
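
As a rough illustration of the "compute it after indexing" idea, here is a minimal Python sketch rather than Logstash configuration. It assumes the events have already been fetched from Elasticsearch as plain dicts. The field names `task_id` and `phase`, and the function name `elapsed_times`, are hypothetical and only there for the example. Because the events are sorted by timestamp before pairing, the order in which they were originally ingested no longer matters.

```python
from datetime import datetime


def elapsed_times(events, id_field="task_id"):
    """Pair start/end events by ID and return {id: elapsed seconds}.

    `events` is a list of dicts with "@timestamp" (ISO 8601 string),
    a hypothetical "phase" field ("start" or "end") and an ID field.
    """
    def ts(ev):
        return datetime.fromisoformat(ev["@timestamp"])

    starts = {}   # id -> timestamp of the pending start event
    result = {}   # id -> elapsed seconds for completed pairs
    # Sort by event time so ingestion order is irrelevant.
    for ev in sorted(events, key=ts):
        key = ev[id_field]
        if ev["phase"] == "start":
            starts[key] = ts(ev)
        elif ev["phase"] == "end" and key in starts:
            result[key] = (ts(ev) - starts.pop(key)).total_seconds()
        # An end without a start is skipped here; you may prefer to report it.
    return result


events = [
    {"@timestamp": "2017-11-22T03:49:05", "task_id": "a", "phase": "end"},
    {"@timestamp": "2017-11-22T03:49:00", "task_id": "a", "phase": "start"},
    {"@timestamp": "2017-11-22T03:49:02", "task_id": "b", "phase": "end"},
]
print(elapsed_times(events))
```

Here the end event for "a" arrives before its start event but is still paired correctly, while "b" has no start event and is left out.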

Nikhil_Utane · November 27, 2017, 6:22am

Thank you, Guy, for your response.

-Regards
Nikhil

joconner · November 30, 2017, 10:03pm

Is this a suggestion to not use the elapsed filter?

I have noticed that my elapsed plugin works most of the time. However, when handling several thousand rapidly logged BEGIN/END tag pairs, it can miss 10% of the pairs and just report an elapsed_end_without_start tag on the END event.
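
A toy Python model of how I assume this happens (this is not the plugin's actual code, and the function name `match_pairs` and the "matched" label are made up): if the END event of a pair gets processed before its BEGIN event, for example because two workers handle them concurrently, there is no stored start to match against.

```python
def match_pairs(events):
    """Process (kind, pair_id, ts) tuples in the given order.

    Returns one (pair_id, outcome, elapsed) tuple per END event.
    """
    pending = {}    # pair_id -> timestamp of the BEGIN event seen so far
    outcomes = []
    for kind, pair_id, ts in events:
        if kind == "BEGIN":
            pending[pair_id] = ts
        elif kind == "END":
            if pair_id in pending:
                outcomes.append((pair_id, "matched", ts - pending.pop(pair_id)))
            else:
                # END was processed before its BEGIN: nothing to match against.
                outcomes.append((pair_id, "elapsed_end_without_start", None))
    return outcomes


in_order = [("BEGIN", 1, 0.0), ("END", 1, 0.25)]
reordered = [("END", 1, 0.25), ("BEGIN", 1, 0.0)]

print(match_pairs(in_order))
print(match_pairs(reordered))
```

The same two events give a match in the first case and an unmatched END in the second, purely because of processing order.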

Nikhil_Utane · December 1, 2017, 6:05am

Same here, John. In my case I saw about 15-20% of the events missing. Given that limitation, it is for each individual to decide whether that is acceptable or not.

Cheers
Nikhil

joconner · December 1, 2017, 6:40pm

Reducing the worker threads to 1 seems to work for some people. I may try that option. Thanks for the response.

system · December 29, 2017, 6:40pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
