Fingerprint duplicate hash for different events

Hi,

I have a strange issue that came to light after we started using data streams (and therefore send create actions instead of updates).
We have the following fingerprint config in Logstash:

    ### Add a fingerprint to prevent duplicate log events.
    fingerprint {
        concatenate_sources => true
        source => ["message","agent.hostname"]
        target => "[@metadata][fingerprint]"
        method => "SHA1"
        key => "deduplication-key"
    }

In Elasticsearch, this event exists:

{
  "_index": ".ds-agl-api-ds-2021.08.25-001447",
  "_type": "_doc",
  "_id": "c3b600da568b237e44e68f1a5bd718246e58a908",
  "_score": 1,
  "_source": {
    "input": {
      "type": "log"
    },
    "ecs": {
      "version": "1.6.0"
    },
    "log-message": "Configuration cache updated!",
    "tags": [
      "avs6",
      "api-log",
      "apigateway",
      "asd",
      "beats_input_codec_plain_applied"
    ],
    "log-level": "INFO",
    "message": "ts: 2021-08-25 10:36:40.769 | logLevel: INFO | appId: AGL | thread:  | SID: undefined | TN: undefined | clientIp: undefined | userId: ANONYMOUS | apiType: NANO | api:  | platform:  | eventType: NONE | message:  Configuration cache updated!",
    "log": {
      "file": {
        "path": "/product/AGL/agl-core/logs/agl.log"
      },
      "offset": 3737176171
    },
    "api-type": "NANO",
    "@version": "1",
    "fields": {
      "environment": "production"
    },
    "@timestamp": "2021-08-25T08:36:40.769Z",
    "app-id": "AGL",
    "user-id": "ANONYMOUS",
    "agent": {
      "name": "papps1443.prdl.itv.local",
      "ephemeral_id": "1daa9993-bf4e-4ce0-bc00-bb3762c88820",
      "version": "7.10.2",
      "hostname": "papps1443.prdl.itv.local",
      "id": "139e78fb-a5d6-47f9-813d-7d63e08b5d32",
      "type": "filebeat"
    },
    "event-type": "NONE"
  },
  "fields": {
    "log-message": [
      "Configuration cache updated!"
    ],
    "app-id": [
      "AGL"
    ],
    "api-type": [
      "NANO"
    ],
    "event-type": [
      "NONE"
    ],
    "user-id": [
      "ANONYMOUS"
    ],
    "input.type": [
      "log"
    ],
    "log.offset": [
      3737176171
    ],
    "fields.environment": [
      "production"
    ],
    "agent.hostname": [
      "papps1443.prdl.itv.local"
    ],
    "message": [
      "ts: 2021-08-25 10:36:40.769 | logLevel: INFO | appId: AGL | thread:  | SID: undefined | TN: undefined | clientIp: undefined | userId: ANONYMOUS | apiType: NANO | api:  | platform:  | eventType: NONE | message:  Configuration cache updated!"
    ],
    "tags": [
      "avs6",
      "api-log",
      "apigateway",
      "asd",
      "beats_input_codec_plain_applied"
    ],
    "agent.type": [
      "filebeat"
    ],
    "@timestamp": [
      "2021-08-25T08:36:40.769Z"
    ],
    "agent.id": [
      "139e78fb-a5d6-47f9-813d-7d63e08b5d32"
    ],
    "ecs.version": [
      "1.6.0"
    ],
    "log-level": [
      "INFO"
    ],
    "log.file.path": [
      "/product/AGL/agl-core/logs/agl.log"
    ],
    "@version": [
      "1"
    ],
    "agent.ephemeral_id": [
      "1daa9993-bf4e-4ce0-bc00-bb3762c88820"
    ],
    "agent.name": [
      "papps1443.prdl.itv.local"
    ],
    "agent.version": [
      "7.10.2"
    ]
  }
}

The Logstash log contains this error:

[2021-08-25T10:36:42,590][WARN ][logstash.outputs.elasticsearch] 
Failed action {
    :status=>409,
    :action=>[
        "create",
        {
            :_id=>"c3b600da568b237e44e68f1a5bd718246e58a908",
            :_index=>"agl-api-ds",
            :routing=>nil
        },
        {
            "input"=>{
                "type"=>"log"
            },
            "ecs"=>{
                "version"=>"1.6.0"
            },
            "log-message"=>"Configuration cache updated!",
            "tags"=>[
                "avs6",
                "api-log",
                "apigateway",
                "asd",
                "beats_input_codec_plain_applied"
            ],
            "log-level"=>"INFO",
            "message"=>"ts: 2021-08-25 10:36:40.769 | logLevel: INFO | appId: AGL | thread:  | SID: undefined | TN: undefined | clientIp: undefined | userId: ANONYMOUS | apiType: NANO | api:  | platform:  | eventType: NONE | message:  Configuration cache updated!",
            "log"=>{
                "file"=>{
                    "path"=>"/product/AGL/agl-core/logs/agl.log"
                },
                "offset"=>2101471568
            },
            "api-type"=>"NANO",
            "@version"=>"1",
            "fields"=>{
                "environment"=>"production"
            },
            "@timestamp"=>2021-08-25T08:36:40.769Z,
            "app-id"=>"AGL",
            "user-id"=>"ANONYMOUS",
            "agent"=>{
                "name"=>"papps1632.prdl.itv.local",
                "ephemeral_id"=>"740640a2-daed-4807-9ff6-55bfdabeb066",
                "version"=>"7.10.2",
                "hostname"=>"papps1632.prdl.itv.local",
                "id"=>"b0d97044-dc4d-4778-8a63-a64368a9b26c",
                "type"=>"filebeat"
            },
            "event-type"=>"NONE"
        }
    ],
    :response=>{
        "create"=>{
            "_index"=>".ds-agl-api-ds-2021.08.25-001447",
            "_type"=>"_doc",
            "_id"=>"c3b600da568b237e44e68f1a5bd718246e58a908",
            "status"=>409,
            "error"=>{
                "type"=>"version_conflict_engine_exception",
                "reason"=>"[c3b600da568b237e44e68f1a5bd718246e58a908]: version conflict, document already exists (current version [1])",
                "index_uuid"=>"1U0Gff1CQVab4aaDbTYqLQ",
                "shard"=>"0",
                "index"=>".ds-agl-api-ds-2021.08.25-001447"
            }
        }
    }
}
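A likely reason the collision only surfaced after the switch: data streams accept only create operations, and create fails with a 409 version conflict when the _id already exists, whereas a regular index operation silently overwrites the existing document. The toy Python model below illustrates the two behaviors with the ID and hostnames from the log above. ToyIndex is a made-up class, not the Elasticsearch client API.

```python
# Toy in-memory model of Elasticsearch's "index" and "create" operations.
# This is NOT the Elasticsearch client API; it only mimics the semantics.


class ToyIndex:
    def __init__(self):
        self.docs = {}

    def index(self, doc_id, source):
        # "index" writes unconditionally: 201 for a new _id, 200 when an
        # existing document is replaced.
        status = 200 if doc_id in self.docs else 201
        self.docs[doc_id] = source
        return status

    def create(self, doc_id, source):
        # "create" never overwrites: a duplicate _id is a 409 version conflict.
        if doc_id in self.docs:
            return 409
        self.docs[doc_id] = source
        return 201


DOC_ID = "c3b600da568b237e44e68f1a5bd718246e58a908"
first = {"agent": {"hostname": "papps1443.prdl.itv.local"}}
second = {"agent": {"hostname": "papps1632.prdl.itv.local"}}

regular = ToyIndex()
print(regular.index(DOC_ID, first))   # 201
print(regular.index(DOC_ID, second))  # 200: the first event is silently replaced

stream = ToyIndex()
print(stream.create(DOC_ID, first))   # 201
print(stream.create(DOC_ID, second))  # 409: the second event is rejected
```

With index, the fingerprint collision most likely went unnoticed because one event simply replaced the other; with create, the same collision becomes a visible 409.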

Note the agent hostname in each case: the two messages came from different servers (papps1443 and papps1632). How can they produce the same ID?

After some testing, it seems that agent.hostname (or agent.name, if we use that instead) is simply not added to the string being hashed. Does the concatenate_sources parameter even work?
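The suspicion can be checked with a simplified Python model of the filter. Assumptions: each source name and value is appended as "|name|value" and the result is hashed with HMAC-SHA1 using the key; the plugin's exact string format may differ, so this does not reproduce the real ID. The helpers lookup and fingerprint are hypothetical. The point is that if "agent.hostname" is looked up as a single top-level key, it is never found on an event whose hostname sits under agent, so both events hash the same input.

```python
import hashlib
import hmac

# Simplified, hypothetical model of the fingerprint filter with
# concatenate_sources => true, method => "SHA1" and a key set.
# Assumed string format: "|name|value" per source, hashed with HMAC-SHA1.

MESSAGE = ("ts: 2021-08-25 10:36:40.769 | logLevel: INFO | appId: AGL | thread:  "
           "| SID: undefined | TN: undefined | clientIp: undefined | userId: ANONYMOUS "
           "| apiType: NANO | api:  | platform:  | eventType: NONE "
           "| message:  Configuration cache updated!")


def lookup(event, name):
    # Plain top-level key lookup: "agent.hostname" is treated as one key,
    # so it is not found when the hostname is nested under "agent".
    return event.get(name)


def fingerprint(event, sources, key):
    data = ""
    for name in sorted(sources):
        value = lookup(event, name)
        data += f"|{name}|{'' if value is None else value}"  # missing -> empty
    data += "|"
    return hmac.new(key.encode(), data.encode(), hashlib.sha1).hexdigest()


event_a = {"message": MESSAGE, "agent": {"hostname": "papps1443.prdl.itv.local"}}
event_b = {"message": MESSAGE, "agent": {"hostname": "papps1632.prdl.itv.local"}}
sources = ["message", "agent.hostname"]

fp_a = fingerprint(event_a, sources, "deduplication-key")
fp_b = fingerprint(event_b, sources, "deduplication-key")
print(fp_a == fp_b)  # True: the hostname never reaches the hashed string
```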

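The event has no agent.hostname field; it has an [agent][hostname] field. In a Logstash field reference, agent.hostname names a top-level field whose name literally contains a dot, and a field that does not exist is not included in the fingerprint. Use source => ["message","[agent][hostname]"] instead.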
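To illustrate the difference between the two reference styles, here is a hypothetical Python sketch that resolves Logstash-style field references against a nested dict: "[agent][hostname]" walks into agent and then hostname, while a bare name such as "agent.hostname" is a single top-level key. The helpers get_field and fingerprint are made up, and the hashing step is an assumed simplified model ("|name|value" pieces, HMAC-SHA1), not the plugin's exact algorithm.

```python
import hashlib
import hmac
import re

# Hypothetical resolver for Logstash-style field references on a nested dict.
# "[agent][hostname]" -> event["agent"]["hostname"]
# "agent.hostname"    -> event["agent.hostname"] (the dot is NOT a separator)


def get_field(event, ref):
    path = re.findall(r"\[([^\[\]]+)\]", ref) if ref.startswith("[") else [ref]
    value = event
    for part in path:
        if not isinstance(value, dict) or part not in value:
            return None  # field does not exist
        value = value[part]
    return value


def fingerprint(event, sources, key):
    # Assumed simplified model: "|name|value" pieces hashed with HMAC-SHA1;
    # a missing field contributes an empty value.
    data = ""
    for name in sorted(sources):
        value = get_field(event, name)
        data += f"|{name}|{'' if value is None else value}"
    data += "|"
    return hmac.new(key.encode(), data.encode(), hashlib.sha1).hexdigest()


# Sample events with a shortened message; only the hostname differs.
event_a = {"message": "Configuration cache updated!",
           "agent": {"hostname": "papps1443.prdl.itv.local"}}
event_b = {"message": "Configuration cache updated!",
           "agent": {"hostname": "papps1632.prdl.itv.local"}}

print(get_field(event_a, "agent.hostname"))     # None
print(get_field(event_a, "[agent][hostname]"))  # papps1443.prdl.itv.local

for sources in (["message", "agent.hostname"], ["message", "[agent][hostname]"]):
    fp_a = fingerprint(event_a, sources, "deduplication-key")
    fp_b = fingerprint(event_b, sources, "deduplication-key")
    print(sources, "same ID" if fp_a == fp_b else "different IDs")
```

With the dotted name both events get the same ID; with the bracketed reference the hostname is included and the IDs differ.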

Thanks.
That was indeed the solution. It is very confusing, though, that the relevant documentation pages use dot notation.
