Logstash doesn't detect duplicate documents


(Chrigui Mohamed) #1

I have to dynamically index multiple .json files from different folders, but if I add a JSON file that already exists, Logstash + Elasticsearch indexes it again as a new file with the same documents. I need a way to detect documents that already exist and avoid adding them a second time.
Here is my input block:
input {
  file {
    type => "solver"
    path => ["D:/Users/G361164/Desktop/test_logstash0406/urban/indoor/*.json"]
    start_position => "beginning"  # read each file from the beginning
    sincedb_path => "/dev/null"    # disable the sincedb; on Windows use "NUL" instead of "/dev/null"
    codec => "json"
  }
}


(Magnus Bäck) #2

Logstash doesn't detect duplicates, but if you configure it to explicitly set the document id when indexing to ES (rather than having ES pick a random document id), the second time a document is indexed it will simply overwrite the old (identical) document. Use a fingerprint filter to compute the document id from the fields and their values, then reference the new field containing the hash in the document_id option of your elasticsearch output.
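
Something along these lines should work (a minimal sketch: the source field names, index name, and hosts are assumptions you need to adapt to your own documents):

filter {
  fingerprint {
    # hash the fields that uniquely identify a document (assumed names)
    source => ["field_a", "field_b"]
    concatenate_sources => true
    method => "SHA256"
    # any constant string; some versions of the filter require a key for SHA methods (HMAC)
    key => "fingerprint-key"
    # store the hash under @metadata so it isn't indexed as part of the document
    target => "[@metadata][fingerprint]"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "solver"
    # reuse the fingerprint as the document id, so re-indexing the same
    # document overwrites the existing one instead of creating a duplicate
    document_id => "%{[@metadata][fingerprint]}"
  }
}

Because the hash is kept in @metadata, it is available to the output for the document id but is not stored in the document itself.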

This isn't the first time this has come up here, so you should be able to find more details in the archives.


(system) #3

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.