How to use aggregate filter with multiple workers


(Manoj Hettiarachchi) #1

I am using the Logstash aggregate filter to aggregate two log lines with the same "uuid".
But the documentation says: "You should be very careful to set Logstash filter workers to 1 (-w 1 flag) for this filter to work correctly otherwise events may be processed out of sequence and unexpected results will occur."

Since my system handles considerable traffic, I am using the default number of workers ("Number of the host’s CPU cores").

Because of this, I found that most of the logs were not aggregated properly.
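The failure mode described in the documentation can be reproduced with a toy model. The Python below is not Logstash's implementation; it only mimics the common pattern where a "start" line creates a map entry for a uuid and an "end" line updates and closes it. The field names (`uuid`, `type`, `ts`) are hypothetical.

```python
# Toy model (not Logstash's code): an order-dependent aggregator in the spirit
# of the "create on start, update on end" aggregate pattern.
def aggregate_in_order(events):
    maps = {}       # uuid -> partial aggregation, created by a "start" event
    completed = []  # finished aggregations
    for event in events:
        uuid = event["uuid"]
        if event["type"] == "start":
            maps[uuid] = {"uuid": uuid, "start": event["ts"]}
        elif event["type"] == "end":
            partial = maps.pop(uuid, None)
            if partial is None:
                # End arrived before its start: there is nothing to update,
                # and the later start leaves an orphaned, never-completed map.
                continue
            partial["end"] = event["ts"]
            completed.append(partial)
    return completed


events = [
    {"uuid": "a", "type": "start", "ts": 1},
    {"uuid": "a", "type": "end", "ts": 5},
]
print(len(aggregate_in_order(events)))                  # 1
print(len(aggregate_in_order(list(reversed(events)))))  # 0
```

With a single worker the two lines reach the filter in file order. With several workers they can be handled in either order, and in the reversed case the end line finds nothing to update.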

Is there an alternative way to do the aggregation while keeping multiple workers?

Please advise.


(Christian Dahlqvist) #2

The aggregate filter does have this limitation, which limits performance considerably and prevents scaling to multiple threads and Logstash instances. For a solution that scales, it is probably better not to rely on the ingest layer to handle this.

One option could be a batch process that periodically queries new data and updates documents where needed. This would typically run outside Elasticsearch and be implemented using one of the language clients.
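A minimal sketch of the merge step such a batch job might run, in Python. It assumes each raw document has a `uuid` and a numeric `ts` field (hypothetical names) and leaves out the actual query and update calls, which would go through whichever Elasticsearch language client you use. Because each group is sorted before merging, arrival order no longer matters.

```python
from collections import defaultdict


def merge_by_uuid(docs):
    """Group raw log documents by uuid and merge each group into one summary.

    Each group is sorted by the (assumed) "ts" field before merging, so the
    result does not depend on the order in which documents were fetched.
    Ties on "ts" keep input order (stable sort), so conflicting fields with
    identical timestamps would still be order-dependent.
    """
    groups = defaultdict(list)
    for doc in docs:
        groups[doc["uuid"]].append(doc)

    summaries = {}
    for uuid, group in groups.items():
        group.sort(key=lambda d: d["ts"])
        fields = {}
        for doc in group:
            # Later lines overwrite earlier ones for shared field names.
            fields.update({k: v for k, v in doc.items() if k not in ("uuid", "ts")})
        summaries[uuid] = {
            **fields,
            "uuid": uuid,
            "first_ts": group[0]["ts"],
            "last_ts": group[-1]["ts"],
            "line_count": len(group),
        }
    return summaries


docs = [
    {"uuid": "a", "ts": 2, "status": "done"},
    {"uuid": "a", "ts": 1, "request": "GET /"},
    {"uuid": "b", "ts": 3, "request": "GET /x"},
]
summaries = merge_by_uuid(docs)
```

Each summary would then be written back, for example keyed by uuid, by the client code that is omitted here.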

You could also create an entity-centric index where you store a single document per UUID (using the UUID as the document ID). When you find a document that should be aggregated, you update this document (the first time, it is indexed instead) while also writing the document to the standard index.
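A sketch of what that dual write could look like as an Elasticsearch `_bulk` request body, built in Python. The index names `logs` and `logs-by-uuid` are hypothetical. Each event produces an `index` action for the standard index plus an `update` action on the entity document whose `_id` is the uuid; `doc_as_upsert` creates that document the first time the uuid is seen, and `retry_on_conflict` retries when two writers update the same uuid concurrently.

```python
import json


def build_bulk_body(events, raw_index="logs", entity_index="logs-by-uuid"):
    """Build an NDJSON _bulk body: raw event + per-uuid upsert for each event.

    Index names are hypothetical defaults.
    """
    lines = []
    for event in events:
        # 1) The original event goes to the standard index as usual.
        lines.append(json.dumps({"index": {"_index": raw_index}}))
        lines.append(json.dumps(event))
        # 2) Partial update of the entity document keyed by uuid;
        #    doc_as_upsert indexes it as a new document the first time.
        lines.append(json.dumps({"update": {"_index": entity_index,
                                            "_id": event["uuid"],
                                            "retry_on_conflict": 3}}))
        lines.append(json.dumps({"doc": event, "doc_as_upsert": True}))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body([{"uuid": "a", "start_ts": 1}, {"uuid": "a", "end_ts": 5}])
```

A partial update merges top-level fields, so values the two lines should both contribute need distinct names (here `start_ts` and `end_ts`); otherwise the last update wins. Because each line is applied as an independent upsert, the result does not depend on which worker handles which line first. In a Logstash pipeline the same effect would typically come from a second `elasticsearch` output with `action => "update"`, `document_id => "%{uuid}"` and `doc_as_upsert => true`.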


(Manoj Hettiarachchi) #3

@Christian_Dahlqvist,

Thanks for the quick response.
I will try the options that you have mentioned and update my results here.