Avoiding duplicate records


(Boris Goldowsky) #1

I am using the Logstash JDBC input plugin to push new rows from a database query into Elasticsearch and to update existing documents whose rows have changed. To avoid duplicates, I tried an "upsert" pattern:

output {
  elasticsearch {
    hosts => "http://....:9200"
    index => "snudle-qa-event-%{+YYYY.MM.dd}"
    action => "update"
    doc_as_upsert => true
    document_id => "%{id}"
  }
}

However, I am still seeing duplicates: in addition to the records with my database IDs, there are some additional records with random-looking ID values, e.g. AV7dtU2XQygf4KBYvrIq.

Any clues what I am doing wrong?


(Magnus Bäck) #2

Wild guess: Do you have any extra files in /etc/logstash/conf.d (or wherever your configuration files are stored)?

With the configuration you've shown, the document id will always be identical to the contents of the id field.
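
To illustrate why a fixed document id rules out duplicates from this particular output, and why random-looking ids hint at a second output with no document_id, here is a toy in-memory model in Python. It is only a sketch and not the Elasticsearch API: index_event and store are hypothetical names, and the random hex string merely stands in for an auto-generated id.

```python
import uuid


def index_event(store, event, document_id=None):
    """Toy upsert into a dict that stands in for an index (hypothetical helper)."""
    # Assumption: with no explicit id, a fresh random id is generated,
    # mimicking an auto-generated document id.
    doc_id = document_id if document_id is not None else uuid.uuid4().hex
    # Upsert: merge into the existing document, or create it if missing.
    store.setdefault(doc_id, {}).update(event)
    return doc_id


store = {}

# The same database row arrives twice, the second time with a changed field.
index_event(store, {"id": "42", "status": "new"}, document_id="42")
index_event(store, {"id": "42", "status": "changed"}, document_id="42")
count_with_fixed_id = len(store)  # still one document, updated in place

# A second output without document_id stores the same event under a random id.
index_event(store, {"id": "42", "status": "changed"})
count_after_stray_output = len(store)  # now there is a duplicate
```

In this model the only way to end up with a random-looking id is for some output to run without an explicit document id, which is what a leftover configuration file could cause.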


(Boris Goldowsky) #3

Hm, there's my project.conf file, plus project.conf~ and project.conf.bak, which are previous versions. Would those get read? I'm used to Apache, which only considers files with the expected .conf suffix.


(Magnus Bäck) #4

> Would those get read?

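Yes. When Logstash is pointed at a directory, it reads every file in it regardless of suffix and concatenates them into a single configuration, so the outputs in the backup files run as well. I think Logstash 6.0 only reads *.conf though.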
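
One way to check for leftover files is to list everything in the configuration directory that lacks the .conf suffix. The following is a minimal Python sketch; the function name stray_config_files is hypothetical, and the "*.conf" pattern is an assumption about which files are meant to be loaded.

```python
from fnmatch import fnmatchcase
from pathlib import Path


def stray_config_files(conf_dir, pattern="*.conf"):
    """Return sorted names of regular files in conf_dir that do not match pattern."""
    return sorted(
        entry.name
        for entry in Path(conf_dir).iterdir()
        # Editor backups such as project.conf~ or project.conf.bak fail the
        # match because the pattern has to cover the whole file name.
        if entry.is_file() and not fnmatchcase(entry.name, pattern)
    )
```

Calling stray_config_files("/etc/logstash/conf.d") would report any editor backups or .bak copies in that directory, which can then be moved elsewhere before restarting Logstash.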

