Filebeat deduplication fails to update index

Hi,

I'm reading this guide: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-deduplication.html, which discusses deduplication in both Filebeat and Logstash and recommends setting the document ID before sending the event to Elasticsearch, so that the document gets updated if it already exists.
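The Logstash side of that guide boils down to computing a fingerprint of the event and using it as the document ID, roughly like this (a sketch of the guide's example; details such as the hash method may differ):

    filter {
      fingerprint {
        source => "message"
        target => "[@metadata][fingerprint]"
        method => "SHA256"
      }
    }
    output {
      elasticsearch {
        hosts => "localhost:9200"
        document_id => "%{[@metadata][fingerprint]}"
      }
    }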

But when I tried this using Filebeat's fingerprint processor, Filebeat fails to update the document in Elasticsearch when one with the same ID already exists, while Logstash updates the document successfully.

I've read some comments in the discussion forums here about using Logstash for this purpose because Filebeat is not designed for it, but that's not what the Elastic guide says!

Any ideas?

What does your config look like? Can you show two documents that should have been deduplicated but were not?

As I understand it, if two documents have the same "_id", they are considered duplicates, and the last document inserted should overwrite the previous one.
I'm running this in my lab for now, and the configuration is very simple: reading a .csv file, so it's hard to get wrong. Filebeat is inserting documents successfully with no issues; it's only when I insert a new line with the same "myid" but slightly different values in the other fields that it fails to update.

    processors:
      - fingerprint:
          fields: ["myid"]
          target_field: "@metadata._id"

    output.elasticsearch:
      hosts: ["http://localhost:9200"]
      index: my_index
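Just to be clear about the behaviour I expect: indexing twice with the same _id through the plain index API does overwrite the document. A minimal sketch (field values made up for illustration):

    PUT my_index/_doc/1001
    { "myid": "1001", "note": "first version" }

    PUT my_index/_doc/1001
    { "myid": "1001", "note": "second version" }

    # the second PUT replaces the first document; GET my_index/_doc/1001
    # now returns "second version" with "_version": 2

That is what I expected Filebeat to do via @metadata._id.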

What would you suggest as a configuration?

Can you show two events that were inserted when an update/overwrite should have occurred? How are you parsing out the field you use?

Hi Christian,

This is the CSV file Filebeat is reading data from; the second of these two events should have overwritten the first. I pushed each line by itself to guarantee the order in which Filebeat reads and indexes the data:

myid,firstName,lastName
1001,mustapha,elastic
1001,mustapha,arakji
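So after both lines are ingested, I expect a single document in my_index along the lines of:

    {
      "myid": "1001",
      "firstName": "mustapha",
      "lastName": "arakji"
    }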

This is part of my filebeat.yml config; again, very simple stuff, nothing complicated, just sample config copied from the Elastic docs.

    processors:
      - decode_csv_fields:
          fields:
            message: decoded_message.csv
          separator: ","
          ignore_missing: false
          overwrite_keys: true
          trim_leading_space: false
          fail_on_error: true

      - extract_array:
          field: decoded_message.csv
          mappings:
            myid: 0
            firstName: 1
            lastName: 2

      - fingerprint:
          fields: ["myid"]
          target_field: "@metadata._id"

    output.elasticsearch:
      hosts: ["http://localhost:9200"]
      index: my_index
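For what it's worth, checking what actually lands in the index is easy with the standard search API; the _id of each hit shows whether the fingerprint was applied:

    GET my_index/_search

    # inspect hits.hits._id: if the fingerprint processor worked, both CSV
    # lines map to the same _id value rather than auto-generated IDs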

Which version of Filebeat are you using?

filebeat version 7.6.2 (amd64)

Anyone?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.