I'm reading this guide: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-deduplication.html. It discusses deduplication in both Filebeat and Logstash, and recommends setting the document id before sending the event to Elasticsearch, so that the document gets updated if it already exists.
But I've tried this using the Filebeat fingerprint processor, and Filebeat fails to update the document in Elasticsearch when one with the same id already exists, while Logstash updates the document successfully.
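For reference, the kind of configuration the guide describes looks roughly like this (the field name "myid" is illustrative, not taken from the guide):

```yaml
processors:
  - fingerprint:
      # Hash only the field(s) that identify the event. If a field whose
      # value changes between revisions is included, every revision gets
      # a new _id and the document is never updated in place.
      fields: ["myid"]
      target_field: "@metadata._id"
```

The fingerprint processor writes the hash into `@metadata._id`, which the Elasticsearch output then uses as the document id.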
I've read some comments in the discussion forums here suggesting to use Logstash for this purpose because Filebeat is not designed for it, but that's not what the Elastic guide says!
What does your config look like? Can you show two documents that should have been deduplicated but were not?
As I understand it, if two documents have the same "_id" they are considered duplicates, and the last document indexed should overwrite the previous one.
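That is how Elasticsearch behaves when the write uses the default index op type: a second write with the same _id replaces the first and increments the document version. A Console-style sketch against a hypothetical index:

```
PUT logs-dedup/_doc/1
{ "myid": "1", "status": "open" }

PUT logs-dedup/_doc/1
{ "myid": "1", "status": "closed" }
```

The second request returns "result": "updated" and "_version": 2. Note the behavior is different with op_type create, where a second write to an existing _id is rejected with a version conflict instead of overwriting.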
I'm running this in my lab for now, and the configuration is very simple: Filebeat reads a .csv file and indexes documents successfully with no issues. It's only when I insert a new line with the same "myid" but slightly different values in the other fields that it fails to update the existing document.
What would you suggest as a configuration?
Can you show two events that were indexed when an update/overwrite should have occurred? How are you parsing out the field you use as the id?
This is the CSV file Filebeat is reading from; these two events should have updated each other. I pushed each line one at a time to guarantee the order in which Filebeat reads and indexes the data:
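For illustration only (the column names and values here are hypothetical, not the poster's actual data), two such rows would share the key column while differing elsewhere:

```
myid,status,updated_at
42,open,2020-05-01
42,closed,2020-05-02
```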
This is part of my filebeat.yml config. Again, it's very simple, nothing complicated; I just copied the sample config from the Elastic docs.
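A minimal sketch of what such a filebeat.yml could look like (the path, column layout, and field names are assumptions, not the poster's actual config): split the CSV line, extract the key column, and fingerprint only that column so the resulting _id stays the same when the other columns change.

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/mydata.csv   # hypothetical path

processors:
  # Split the raw CSV line into an array of columns
  - decode_csv_fields:
      fields:
        message: decoded.csv
      separator: ","
  # Pull the first column (assumed to be the key) into its own field
  - extract_array:
      field: decoded.csv
      mappings:
        myid: 0
  # Derive the document id from the key column only, so a revised
  # row maps to the same _id as the original
  - fingerprint:
      fields: ["myid"]
      target_field: "@metadata._id"

output.elasticsearch:
  hosts: ["localhost:9200"]
```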
Which version of Filebeat are you using?
filebeat version 7.6.2 (amd64)