Logstash doesn't detect duplicate documents


(Chrigui Mohamed) #1

I have to dynamically index multiple .json files from different folders, but if I add a JSON file that already exists, Logstash + Elasticsearch indexes it again as a new file with the same documents. I need a way to detect documents that already exist and avoid adding them a second time.
Here is my input block:
input {
  file {
    type => "solver"
    path => ["D:/Users/G361164/Desktop/test_logstash0406/urban/indoor/*.json"]
    start_position => "beginning"  # read each file from the beginning
    sincedb_path => "/dev/null"    # disable the sincedb; on Windows use "NUL" instead of "/dev/null"
    codec => "json"
  }
}


(Magnus Bäck) #2

Logstash doesn't detect duplicates, but if you configure it to explicitly set the document id when indexing to ES (rather than having ES pick a random document id), the second time a document is indexed it will simply overwrite the old (identical) document. Use a fingerprint filter to compute the document id from the fields and their values, then reference the new field containing the hash in the document_id option of your elasticsearch output.
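
Something along these lines should work (a minimal sketch: the source field names, index name, and hosts are assumptions you need to adapt to your own documents):

filter {
  fingerprint {
    # hash the fields that uniquely identify a document (assumed names)
    source => ["field_a", "field_b"]
    concatenate_sources => true
    method => "SHA256"
    # any constant string; some versions of the filter require a key for SHA methods (HMAC)
    key => "fingerprint-key"
    # store the hash under @metadata so it isn't indexed as part of the document
    target => "[@metadata][fingerprint]"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "solver"
    # reuse the fingerprint as the document id, so re-indexing the same
    # document overwrites the existing one instead of creating a duplicate
    document_id => "%{[@metadata][fingerprint]}"
  }
}

Because the hash is kept in @metadata, it is available to the output for the document id but is not stored in the document itself.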

This isn't the first time this has come up here, so you should be able to find more details in the archives.


(system) #3

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.