Duplicated data in my Elasticsearch

Hello Team,
Greetings,

I have a situation where data is being pushed into my Elasticsearch twice, i.e. duplicated data.
Both copies get different ids, but everything else is the same. How do I get around this?

My Filebeat configuration:

filebeat.inputs:
- type: log
  fields_under_root: true
  exclude_lines:     ["^$"]
  tail_files:        true
  paths:
  - '/var/www/*.log'

processors:
  - add_id: ~
  - add_tags:
      tags: [XXX]
      target: "application"

And in my Logstash output I have:

else if [@metadata][_id] and "XXX" in [application] {
    elasticsearch {
        hosts => ["https://es:9200"]
        index => "write-alias"
        user => "log-user"
    }
}

Can you guys please help me on this?

You need to share your entire Logstash configuration; it is not possible to know what the issue could be from only this part.

The add_id processor in filebeat adds a unique value for [@metadata][_id]. You want duplicate events to have the same id, not a unique one. I would suggest a fingerprint filter, and configuring it to use a hash (not MAC) of whatever set of fields you think make an event unique.
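
For example (a sketch; reqid and message are placeholders for whatever combination of fields identifies an event in your data):

filter {
    fingerprint {
        # hash the fields that identify an event, so identical
        # events always produce the same id
        source              => ["reqid", "message"]
        concatenate_sources => true
        method              => "SHA256"
        target              => "[@metadata][id]"
    }
}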

Hello Badger, thanks for the response.
Here is what I have right now. When a log is generated, a specific request id is created and associated with several log lines, something like this:

abc123xyz [time] Started......
abc123xyz [time] Processing......
abc123xyz [time] some_application_log......
abc123xyz [time] Completed......

The time is the same, the reqid is the same, and each line has different fields, none of them unique.
How do you think I should proceed in this case?

Thanks

You can fingerprint the entire [message] field, which is what the filter does by default.

fingerprint { target => "[@metadata][id]" method => "SHA256" }

Then use document_id => "%{[@metadata][id]}" in the elasticsearch output.
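
Applied to the output from your first post, that could look like this (a sketch; hosts, index, and user are copied from your snippet):

elasticsearch {
    hosts       => ["https://es:9200"]
    index       => "write-alias"
    user        => "log-user"
    # with a deterministic id, re-ingesting the same line overwrites
    # the existing document instead of creating a duplicate
    document_id => "%{[@metadata][id]}"
}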

Hello Badger, thanks for the response. Can you help me with this?

1d67c3b0 I, [2022-10-04T22:46:46.898239 #3722]  INFO -- : Log-MESSAGE

After observing the logs, the only unique field I could identify is "2022-10-04T22:46:46.898239", with all 6 decimal places.
How do I extract that field to use as a unique id? Here is what the documentation says about applying the fingerprint processor in Filebeat:

processors:
  - fingerprint:
      fields: ["field1", "field2"]
      target_field: "@metadata._id"

I guess it's not as simple as just writing something like:

processors:
  - fingerprint:
      fields: ["timestamp"]
      target_field: "@metadata._id"

We might end up using dissect or something similar first. Can you please help me here? Below is a rough, untested sketch of what I am thinking: dissect the line, then fingerprint the extracted timestamp (reqid, level, pid, severity, and msg are just names I made up):
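
processors:
  - dissect:
      # matches: 1d67c3b0 I, [2022-10-04T22:46:46.898239 #3722]  INFO -- : Log-MESSAGE
      # note the two literal spaces between "]" and the severity
      tokenizer: '%{reqid} %{level}, [%{timestamp} #%{pid}]  %{severity} -- : %{msg}'
      field: "message"
      target_prefix: ""
  - fingerprint:
      fields: ["timestamp"]
      target_field: "@metadata._id"

Thanks in advance.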

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.