Removing ongoing duplicates

cyberzlo · December 19, 2020, 11:37am

This is what I need, but I would like to do it at input time, while parsing the JSON. The tutorial shows how to do it after the fact, whereas I would like to do it every time I receive new data, based for example on the data from the last minute or so. Is this possible, and can it be done without generating high load? The duplicates reach Logstash back to back (in pairs, or like system logs where the same line repeats 10 times in a row), so there is no need to check the whole index, only the most recent records, even just the last 10. What I need is to remove ongoing duplicates.

Badger · December 19, 2020, 3:36pm

You could use a ruby filter. Keep the most recent messages (or a hash of them) in an array and test whether the array .include? the current message. Then .shift to remove the first entry and .push to add the current message as the last entry. The cost grows with the number of entries in the array, since Ruby will iterate over each entry in .include?
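A minimal sketch of that idea in plain Ruby, so it can be tried outside Logstash first. The class name `RecentDuplicateWindow`, the window size, and the choice of SHA1 digests are assumptions made for illustration, not something taken from the thread or from a plugin. One small deviation from the description above: a duplicate is not pushed again, so a run of repeats does not crowd other messages out of the window.

```ruby
require "digest"

# Hypothetical helper (name and defaults are illustrative): remembers
# digests of the most recent distinct messages and reports repeats.
class RecentDuplicateWindow
  def initialize(size = 10)
    @size = size     # how many recent messages to remember
    @recent = []     # oldest digest first, newest last
  end

  # Returns true when the message matches one still held in the window.
  def duplicate?(message)
    digest = Digest::SHA1.hexdigest(message)
    # Linear scan: cost grows with the window size.
    return true if @recent.include?(digest)

    @recent.push(digest)                      # newest entry goes last
    @recent.shift if @recent.length > @size   # drop the oldest entry
    false
  end
end

# Stand-alone demonstration with a window of 3 messages.
window = RecentDuplicateWindow.new(3)
lines = ["a", "a", "b", "a", "c", "d", "a"]
kept = lines.reject { |line| window.duplicate?(line) }
puts kept.inspect
```

In a ruby filter, the window would presumably be created once in the `init` option, and the `code` option would call something like `event.cancel if @window.duplicate?(event.get("message"))`. The field name `message` is an assumption about the events. Because the window depends on the order in which events arrive, this likely only behaves as intended with a single pipeline worker.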
