Logstash doesn't detect duplicate documents

(Chrigui Mohamed) #1

I have to dynamically index multiple .json files from different folders, but if I add a json file that already exists, Logstash + Elasticsearch index it again as a new file with the same documents. I need a way to detect existing documents and avoid adding them.
Here is my input block:
input {
  file {
    type => "solver"
    path => ["D:/Users/G361164/Desktop/test_logstash0406/urban/indoor/*.json"]
    start_position => "beginning"  # read from the beginning of the file
    sincedb_path => "NUL"          # on Windows use "NUL"; "/dev/null" only exists on Unix-like systems
    codec => "json"
  }
}

(Magnus Bäck) #2

Logstash doesn't detect duplicates, but if you configure it to explicitly set the document id when indexing to ES (rather than having ES pick a random document id), you can make sure that the second time a document is indexed it just overwrites the old (identical) document. Use a fingerprint filter to compute the document id based on the fields and their values, then reference the new field containing the hash with the document_id option of your elasticsearch output.
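A minimal sketch of that approach, assuming the event has a field named "message" to hash and an index named "myindex" (adapt both to your data; with your json codec you may want to list the parsed JSON fields as sources instead):

filter {
  fingerprint {
    source => "message"                    # field(s) whose values identify the document
    target => "[@metadata][fingerprint]"   # @metadata fields are not indexed
    method => "SHA256"
    # for several fields: source => ["field1", "field2"] with concatenate_sources => true
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "myindex"
    document_id => "%{[@metadata][fingerprint]}"
  }
}

Re-indexing the same file then produces the same hashes, so the documents are overwritten in place instead of duplicated.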

This isn't the first time this has come up here, so you should be able to find more details in the archives.

(system) #3

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.