How to prevent duplicates in Filebeat

My input data (which is JSON-formatted) has a unique field.
According to the documentation, the following should use it as the document's unique identifier, and thus prevent duplicates:

processors:
  - fingerprint:
      fields: ["tx"]
      target_field: "@metadata._id"

However, it doesn't help.
I did verify (in debug mode) that a unique value is set in @metadata: { _id: "xxx" }
(it is not a copy of my field, but probably a hash of it: I verified that sending the same record multiple times generates the same ID).
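
That would match how the fingerprint processor is documented to work: it hashes the listed fields (sha256 by default). The method option can make that explicit; a sketch using the same tx field:

  - fingerprint:
      fields: ["tx"]
      target_field: "@metadata._id"
      method: "sha256"   # the default; listed only to show where the hash comes from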

When processing copies of a file, new records are created instead of being de-duplicated.

What method / tool are you using to ingest / index your documents?

And are you saying you are seeing multiple documents in the same index with the same _id?

If so please show us that.

From the docs, it looks like you're doing the right thing.

EDIT: I just ran a file through Filebeat directly to Elasticsearch, with duplicate values in my fingerprint field and this config, and it worked as expected: no duplicates ... so perhaps tell us more about the rest of your ingestion pipeline.

Just as an example, I put these lines in a "log" file:

Apple
Dell
Google
Apple
Amazon
Amazon
Amazon
Google
Dell
Google
Google
Google

Each of these lines gets ingested as the message field of a separate event.

I used this

  - fingerprint:
      fields: ["message"]
      target_field: "@metadata._id"

and this is my result. Notice there are only 4 documents, and each has the fingerprint as its _id:

GET filebeat-7.16.3-2022.02.07-000001/_search

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "filebeat-7.16.3-2022.02.07-000001",
        "_type" : "_doc",
        "_id" : "f0aa2b6382e5f554aeb83a966a6148c988a9a0553d04d4bcd06747b2bac21d74",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2022-02-07T01:24:42.121Z",
          "ecs" : {
            "version" : "1.12.0"
          },
          "container" : {
            "id" : "7.16.3"
          },
          "log" : {
            "offset" : 6,
            "file" : {
              "path" : "/Users/sbrown/workspace/elastic-install/7.16.3/filebeat-7.16.3-darwin-x86_64/test.log"
            }
          },
          "message" : "Apple",
          "input" : {
            "type" : "filestream"
          },
          "agent" : {
            "id" : "04373b11-e7ab-4632-8e54-e30a2d3f8227",
            "name" : "hyperion",
            "type" : "filebeat",
            "version" : "7.16.3",
            "hostname" : "hyperion",
            "ephemeral_id" : "fa7650a3-4e5d-44b6-b408-c57402a6c93f"
          }
        }
      },
      {
        "_index" : "filebeat-7.16.3-2022.02.07-000001",
        "_type" : "_doc",
        "_id" : "52fadd3719f7ae3eb341061bfedc2653118c464cd75aff2bf10d6e42d9b32f92",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2022-02-07T01:24:42.121Z",
          "ecs" : {
            "version" : "1.12.0"
          },
          "agent" : {
            "ephemeral_id" : "fa7650a3-4e5d-44b6-b408-c57402a6c93f",
            "id" : "04373b11-e7ab-4632-8e54-e30a2d3f8227",
            "name" : "hyperion",
            "type" : "filebeat",
            "version" : "7.16.3",
            "hostname" : "hyperion"
          },
          "container" : {
            "id" : "7.16.3"
          },
          "log" : {
            "file" : {
              "path" : "/Users/sbrown/workspace/elastic-install/7.16.3/filebeat-7.16.3-darwin-x86_64/test.log"
            },
            "offset" : 11
          },
          "message" : "Dell",
          "input" : {
            "type" : "filestream"
          }
        }
      },
      {
        "_index" : "filebeat-7.16.3-2022.02.07-000001",
        "_type" : "_doc",
        "_id" : "29aea56777a90acd076e75cf91aef1fe4c5d8bd2b5786194fc7b5ef7dd9a76c7",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2022-02-07T01:24:42.121Z",
          "input" : {
            "type" : "filestream"
          },
          "ecs" : {
            "version" : "1.12.0"
          },
          "agent" : {
            "version" : "7.16.3",
            "hostname" : "hyperion",
            "ephemeral_id" : "fa7650a3-4e5d-44b6-b408-c57402a6c93f",
            "id" : "04373b11-e7ab-4632-8e54-e30a2d3f8227",
            "name" : "hyperion",
            "type" : "filebeat"
          },
          "container" : {
            "id" : "7.16.3"
          },
          "log" : {
            "file" : {
              "path" : "/Users/sbrown/workspace/elastic-install/7.16.3/filebeat-7.16.3-darwin-x86_64/test.log"
            },
            "offset" : 18
          },
          "message" : "Google"
        }
      },
      {
        "_index" : "filebeat-7.16.3-2022.02.07-000001",
        "_type" : "_doc",
        "_id" : "c5a60e4fed8743f73ec171491c06f1e991b20880e57ced4f0d8de0cc2106f692",
        "_score" : 1.0,
        "_source" : {
          "@timestamp" : "2022-02-07T01:24:42.121Z",
          "container" : {
            "id" : "7.16.3"
          },
          "log" : {
            "file" : {
              "path" : "/Users/sbrown/workspace/elastic-install/7.16.3/filebeat-7.16.3-darwin-x86_64/test.log"
            },
            "offset" : 31
          },
          "message" : "Amazon",
          "input" : {
            "type" : "filestream"
          },
          "ecs" : {
            "version" : "1.12.0"
          },
          "agent" : {
            "ephemeral_id" : "fa7650a3-4e5d-44b6-b408-c57402a6c93f",
            "id" : "04373b11-e7ab-4632-8e54-e30a2d3f8227",
            "name" : "hyperion",
            "type" : "filebeat",
            "version" : "7.16.3",
            "hostname" : "hyperion"
          }
        }
      }
    ]
  }
}
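
If I understand the mechanics right, the de-duplication happens on the Elasticsearch side: once the event carries an _id, a later document with the same _id replaces the earlier one instead of creating a new document. You can reproduce that behavior with a quick console test (test-dedup is just a throwaway index name):

PUT test-dedup/_doc/abc123
{
  "message": "Apple"
}

PUT test-dedup/_doc/abc123
{
  "message": "Apple"
}

GET test-dedup/_count

After the index refreshes, the count comes back as 1; the second PUT only bumps the document's _version.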

Not sure why, but now it works, and entries are de-duplicated correctly with the same configuration.
Thanks anyway.

