Deduplicate data

So I have data that is exported to a JSON file, which is then uploaded to Elastic using Filebeat.
One of the fields can change, and when that happens I would like the existing record to be updated in Elastic instead of a new entry being added, which is what happens now.

In this topic I found some information on this: it's called deduplication, and you can achieve it by giving the two records the same document ID.

I added this to my filebeat.yml:

filebeat.inputs:
- type: filestream
  json.document_id: "AlertId"

AlertId is a unique ID within the data, so I would like to use that as the document ID.
But this does not seem to work; Elastic still generates its own ID. Can anyone explain what I have to do to get this working?

With the filestream input you have to use the ndjson parser to set the ID:

filebeat.inputs:
  - type: filestream
    id: "my-unique-filestream-id"
    paths:
      - /tmp/flog/*.log
    parsers:
      - ndjson:
          document_id: "my-id-field"

or the decode_json_fields processor:

filebeat.inputs:
  - type: filestream
    id: "my-other-unique-filestream-id"
    paths:
      - /tmp/flog/*.log
    processors:
      - decode_json_fields:
          document_id: "my-id-field"
          fields: ["message"]
          max_depth: 1
          target: ""

Thank you! Works perfectly now.

I spoke a little too soon. The document ID is now correct, but what I expected to happen does not happen. When a field changes, the document does not get updated in Elastic, but there is also no new entry. It looks like Filebeat/Elastic is somehow ignoring the updated record now.
