How not to duplicate data in Elasticsearch

Daniel_Oliveira · May 23, 2022, 7:21pm

Hello gentlemen.

I have a Logstash pipeline that processes data from a .csv file. The file has a column called id.

How can I avoid duplicating documents with the same id if I reprocess the file, or process another file that contains the same id?

Here is my current filter:

filter {
    csv {
        separator => ","
        skip_header => true
        columns => ["id", "product", "job description", "uuid Value"]
    }
}

leandrojmp · May 23, 2022, 7:45pm

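You should use the document_id option in the elasticsearch output. If you build the document ID from your id column, reprocessing a row with the same id overwrites the existing document instead of creating a new one.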

Something like this:

output {
    elasticsearch {
        hosts => ["hosts"]
        index => "index-name"
        document_id => "%{id}"
    }
}
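
For context, here is a rough sketch of how this output could sit next to the filter from the question. The file input, its path, and the sincedb_path setting are assumptions for illustration (the original post does not show the input), and "hosts" and "index-name" are still placeholders.

```
input {
    file {
        # Hypothetical path; point this at your own CSV file.
        path => "/path/to/products.csv"
        start_position => "beginning"
        # Assumption, for testing only: do not persist the read position,
        # so the file is read again on every restart.
        sincedb_path => "/dev/null"
    }
}

filter {
    csv {
        separator => ","
        skip_header => true
        columns => ["id", "product", "job description", "uuid Value"]
    }
}

output {
    elasticsearch {
        hosts => ["hosts"]
        index => "index-name"
        # Use the CSV id column as the Elasticsearch _id.
        document_id => "%{id}"
        # "index" is the default action: a document with the same _id is replaced.
        # With action => "create", an event whose _id already exists is rejected instead.
        action => "index"
    }
}
```

Note that if an event has no id field, the reference is not resolved and the literal string %{id} is used as the document ID, so it may be worth checking that the column was actually parsed.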

