Hi
I have a file which is rewritten daily with new data.
Everything was working fine until I forgot to update the file with new data.
As a result, Logstash processed the old data again, generating duplicates.
Is there a way I can correct that (remove the duplicate data)?
Is there a way to avoid this issue in the future?
Yes, there is a way.
Have a look at the fingerprint filter and MD5 hash calculation.
You calculate an MD5 hash of your message and then use it as the document ID in the Elasticsearch output when loading the data. Since the same message always produces the same ID, re-processing the file overwrites existing documents instead of creating duplicates.
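A minimal sketch of such a pipeline (the hosts, index name, and source field are placeholders; exact option names can vary between versions of the fingerprint plugin):

```
filter {
  # Hash the whole message; the same line always yields the same fingerprint.
  fingerprint {
    source => "message"
    target => "[@metadata][fingerprint]"
    method => "MD5"
  }
}

output {
  elasticsearch {
    hosts       => ["localhost:9200"]               # placeholder
    index       => "my-index"                       # placeholder
    document_id => "%{[@metadata][fingerprint]}"    # duplicate events overwrite, not duplicate
  }
}
```

Storing the hash under `[@metadata]` keeps it out of the indexed document itself.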
If you want to remove the duplicate data that is already indexed (see the sketch after this list):

1. read the data back from Elasticsearch in Logstash (elasticsearch input),
2. in the filter section, calculate an MD5 hash of the fields that should form the unique ID,
3. output to Elasticsearch with the document ID taken from the fingerprint filter.
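A rough sketch of that cleanup pipeline, assuming hypothetical index names `my-index` and `my-index-deduped` and hypothetical fields `field1`/`field2` as the uniqueness key:

```
input {
  elasticsearch {
    hosts => ["localhost:9200"]                       # placeholder
    index => "my-index"                               # index that currently holds duplicates
    query => '{ "query": { "match_all": {} } }'
  }
}

filter {
  # Build one hash over all fields that together identify a unique event.
  fingerprint {
    source              => ["field1", "field2"]       # hypothetical fields
    concatenate_sources => true
    target              => "[@metadata][fingerprint]"
    method              => "MD5"
  }
}

output {
  elasticsearch {
    hosts       => ["localhost:9200"]
    index       => "my-index-deduped"                 # write the deduplicated copy here
    document_id => "%{[@metadata][fingerprint]}"
  }
}
```

Duplicates collapse onto the same document ID, so the target index ends up with a single copy of each event; writing to a new index rather than back into the one you are reading from is the safer choice, and you can delete or alias away the old index afterwards.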