Filebeat deduplication fails to update index

mustapha.arakji · May 10, 2020, 5:49am

Hi,

I'm reading this guide: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-deduplication.html, which discusses deduplication in both Filebeat and Logstash and recommends setting the document ID before sending events to Elasticsearch, so that an existing document gets updated instead of duplicated.

I tried this using the Filebeat fingerprint processor, but Filebeat fails to update the document in Elasticsearch when a document with the same ID already exists, whereas Logstash updates it successfully.

I've read some comments on the discussion forums here recommending Logstash for this purpose because Filebeat is not designed for it, but that's not what the Elastic guide says!

Any ideas?

Christian_Dahlqvist · May 10, 2020, 9:58am

What does your config look like? Can you show two documents that should have been deduplicated but were not?

mustapha.arakji · May 11, 2020, 12:23am

As I understand it, if two documents have the same "_id", they are considered duplicates, and the last document inserted should overwrite the previous one.
I'm running this in my lab for now. The configuration is very simple, just reading a .csv file; you can't get it wrong. Filebeat inserts documents successfully, no issues. It's only when I add a new line with the same "myid" but slightly different values in the other fields that the existing document is not updated.
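The expectation above holds for an Elasticsearch `index` action, which replaces an existing document with the same `_id`. A `create` action behaves differently: it fails with a 409 version conflict when the `_id` already exists and leaves the stored document untouched. The sketch below models both behaviours with an in-memory dict; it is not Elasticsearch or Beats code, and the `bulk_write` helper is hypothetical. Which action Filebeat 7.6.2 sends for events carrying `@metadata._id` is an assumption here and worth checking against the Beats documentation for that version.

```python
from typing import Dict

# Hypothetical in-memory stand-in for an index: _id -> document source.
Store = Dict[str, dict]


def bulk_write(store: Store, action: str, doc_id: str, source: dict) -> int:
    """Return an HTTP-like status code for one bulk item.

    'index'  -> create or overwrite the document (201 new, 200 replaced).
    'create' -> only create; reject with 409 if the _id already exists.
    """
    if action == "create":
        if doc_id in store:
            return 409  # version conflict: the existing document is kept
        store[doc_id] = source
        return 201
    if action == "index":
        status = 200 if doc_id in store else 201
        store[doc_id] = source  # last write wins
        return status
    raise ValueError(f"unsupported action: {action}")


first = {"myid": "1001", "firstName": "mustapha", "lastName": "elastic"}
second = {"myid": "1001", "firstName": "mustapha", "lastName": "arakji"}

for action in ("index", "create"):
    store: Store = {}
    statuses = [bulk_write(store, action, "1001", doc) for doc in (first, second)]
    print(action, statuses, store["1001"]["lastName"])
```

With `index` the second row wins (`[201, 200] arakji`); with `create` the first row is kept and the second is rejected (`[201, 409] elastic`), which would match the symptom described in this thread.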

    processors:
    - fingerprint:
        fields: ["myid"]
        target_field: "@metadata._id"
    output.elasticsearch:
      hosts: ["http://localhost:9200"] 
      index: my_index

What would you suggest as a configuration?
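The fingerprint processor in the config above hashes the listed field values (SHA-256, hex-encoded, by default), so every event with the same `myid` gets the same `@metadata._id`. The sketch below reproduces that idea with `hashlib`. The exact string Filebeat feeds to the hash is an assumption here (a simple `|field|value` framing), so the digests will not necessarily match the ids Filebeat writes; the `fingerprint_id` function is hypothetical.

```python
import hashlib
from typing import Iterable, Mapping


def fingerprint_id(event: Mapping[str, object], fields: Iterable[str]) -> str:
    """Hash the named fields of an event into a hex SHA-256 document id.

    The '|field|value' framing is an assumption for illustration; including
    the field names keeps values of different fields from running together.
    """
    h = hashlib.sha256()
    for name in fields:
        if name not in event:
            # Treat a missing field as an error in this sketch.
            raise KeyError(f"field {name!r} not found in event")
        h.update(f"|{name}|{event[name]}".encode("utf-8"))
    h.update(b"|")
    return h.hexdigest()


a = {"myid": "1001", "firstName": "mustapha", "lastName": "elastic"}
b = {"myid": "1001", "firstName": "mustapha", "lastName": "arakji"}
c = {"myid": "1002", "firstName": "mustapha", "lastName": "arakji"}

print(fingerprint_id(a, ["myid"]) == fingerprint_id(b, ["myid"]))  # True
print(fingerprint_id(a, ["myid"]) == fingerprint_id(c, ["myid"]))  # False
```

Field names are case-sensitive in the sketch: `fingerprint_id(a, ["myId"])` raises `KeyError`, so the name listed under `fields` has to match the event's field exactly.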

Christian_Dahlqvist · May 11, 2020, 5:11am

Can you show two events that were inserted when an update/overwrite should have occurred? How are you parsing out the field you use?

mustapha.arakji · May 11, 2020, 9:46am

Hi Christian,

This is the CSV file Filebeat reads data from; the second of these two events should have overwritten the first. I pushed each line on its own to guarantee the order in which Filebeat reads and indexes the data:

    myid,firstName,lastName
    1001,mustapha,elastic
    1001,mustapha,arakji

This is part of my filebeat.yml config. Again, it's very simple, nothing complicated; I just copied the sample config from the Elastic docs.

    processors:
    - decode_csv_fields:
        fields:
          message: decoded_message.csv
        separator: ","
        ignore_missing: false
        overwrite_keys: true
        trim_leading_space: false
        fail_on_error: true

    - extract_array:
        field: decoded_message.csv
        mappings:
          myid: 0
          firstName: 1
          lastName: 2

    - fingerprint:
        fields: ["myid"]
        target_field: "@metadata._id"

    output.elasticsearch:
      hosts: ["http://localhost:9200"]
      index: my_index
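To check that both data rows really map to the same id, the processor chain in the config above can be roughly approximated in Python: `csv.reader` stands in for `decode_csv_fields`, a dict of column positions for the `extract_array` mappings, and a SHA-256 of `myid` for `fingerprint`. This is a model under those assumptions, not Filebeat's implementation; the helper names are hypothetical, and it assumes each line of the file, header included, becomes its own event.

```python
import csv
import hashlib
from typing import Dict, List

# Column positions, mirroring the extract_array mappings in the config.
MAPPINGS: Dict[str, int] = {"myid": 0, "firstName": 1, "lastName": 2}


def decode_csv_line(message: str) -> List[str]:
    """Split one log line into CSV values (stand-in for decode_csv_fields)."""
    return next(csv.reader([message], delimiter=","))


def extract(values: List[str]) -> Dict[str, str]:
    """Copy array elements into named fields (stand-in for extract_array)."""
    return {name: values[pos] for name, pos in MAPPINGS.items()}


def doc_id(event: Dict[str, str]) -> str:
    """Deterministic id from myid (stand-in for the fingerprint processor)."""
    return hashlib.sha256(event["myid"].encode("utf-8")).hexdigest()


lines = [
    "myid,firstName,lastName",  # assumed to be read like any other line
    "1001,mustapha,elastic",
    "1001,mustapha,arakji",
]

events = [extract(decode_csv_line(line)) for line in lines]
ids = [doc_id(e) for e in events]
print(ids[1] == ids[2])  # True: both 1001 rows target the same _id
print(events[0]["myid"])  # myid: the header row is parsed as an ordinary row
```

If the ids match like this, the processors are doing their part, and whether the second row replaces the first comes down to how Elasticsearch handles the bulk action sent for the second event.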

Christian_Dahlqvist · May 11, 2020, 4:34pm

Which version of Filebeat are you using?

mustapha.arakji · May 15, 2020, 11:15am

filebeat version 7.6.2 (amd64)

mustapha.arakji · May 21, 2020, 10:03am

Anyone?

system · June 18, 2020, 10:04am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
