The elapsed and aggregate filters will not work properly when multiple Logstash pipeline workers are used. How do we solve the same problem in that case? Thank you.
It is the same question as this one: https://stackoverflow.com/questions/37353365/calculating-time-between-events/37359000#37359000
Both of these plugins require that all related events pass through a single thread so that they are processed in order, and as a side effect they tend to scale and perform badly.
As far as I can tell, you need to process all data in a single thread up until the point where you have extracted the identifier these filters key on into a separate field. From that point on you just need to make sure that all events with the same identifier get processed by the same thread.
At that point you could calculate a murmur hash of the identifier and send each event to one of a number of pipelines (hash % number of pipelines) using the new pipeline-to-pipeline communication. Depending on where the bulk of your processing takes place, this may or may not make a big difference.
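For illustration, here is a minimal sketch of that routing, assuming a Logstash version that supports pipeline-to-pipeline communication, that the identifier has already been parsed into a field called uniqueid, and that two single-worker pipelines handle the matching. All pipeline names, paths, field names, and tags below are made up for the example.

```
# pipelines.yml — one distributor pipeline plus two single-worker pipelines
- pipeline.id: distributor
  path.config: "/etc/logstash/distributor.conf"
- pipeline.id: elapsed_0
  path.config: "/etc/logstash/elapsed_0.conf"
  pipeline.workers: 1
- pipeline.id: elapsed_1
  path.config: "/etc/logstash/elapsed_1.conf"
  pipeline.workers: 1

# distributor.conf — your existing inputs and parsing filters stay here;
# it can keep multiple workers because it does not do the elapsed/aggregate matching
filter {
  # MURMUR3 produces an integer fingerprint of the identifier
  fingerprint {
    source => "uniqueid"
    target => "[@metadata][hash]"
    method => "MURMUR3"
  }
  # hash % 2 picks one of the two downstream pipelines
  ruby {
    code => 'event.set("[@metadata][shard]", event.get("[@metadata][hash]") % 2)'
  }
}
output {
  if [@metadata][shard] == 0 {
    pipeline { send_to => ["elapsed_0"] }
  } else {
    pipeline { send_to => ["elapsed_1"] }
  }
}

# elapsed_0.conf — elapsed_1.conf is identical apart from the address
input {
  pipeline { address => "elapsed_0" }
}
filter {
  # safe here: this pipeline runs with a single worker, and all events
  # with the same uniqueid arrive at the same pipeline
  elapsed {
    unique_id_field => "uniqueid"
    start_tag => "elapsed_start"   # assumes start/end events are tagged upstream
    end_tag => "elapsed_end"
  }
}
```

All events with a given uniqueid then pass through exactly one single-worker pipeline, which is what elapsed and aggregate need, while the distributor and any additional downstream pipelines give back some parallelism.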
The other option is to instead implement this matching as a batch job that runs periodically against the raw data that has been indexed into Elasticsearch. This requires more work and is not real time, but it should scale better as it does not restrict the flow of data through Logstash.
Thanks so much for the reply.
In my case, though, I don't use multiple pipelines. I use:
streaming (continuous logs to Logstash)
one pipeline
2 pipeline.workers (the host's CPU cores)
I want to calculate the time difference between two logs that share a uniqueid,
but if the elapsed and aggregate filters will not work properly, could you suggest what else I can use?