Elastic Agent 9.1.5 Fingerprint Processor Inconsistency - Breaks Log Integrity

Hey everyone,

I've run into a pretty serious issue with the fingerprint processor in Elastic Agent 9.1.5 that I wanted to share with the community. I'm working on log integrity verification and discovered that the fingerprint processor is producing different SHA-256 hashes for identical data, which obviously breaks everything.

What I Found

I was testing a simple setup where I copy a field and then fingerprint both the original and the copy to verify they produce the same hash. They don't.

Here's my agent configuration:

processors:
  - copy_fields:
      fields:
        - from: message
          to: message_copy
  - fingerprint:
      fields: ["message"]
      method: sha256
      target_field: fingerprint_original
  - fingerprint:
      fields: ["message_copy"]
      method: sha256
      target_field: fingerprint_copy

The results I'm getting:

  • fingerprint_original: 17afa9f77de7a61765a653540297eb83a334e4c04080c28270e9021c77ce94b
  • fingerprint_copy: f4538632d482ca41596d4457bf99c8eca1ac800313bf1d3c0fc1803506452afaf

I've verified in Elasticsearch that both fields contain exactly the same JSON content, so the copy operation is working fine. The fingerprint processor is just being inconsistent.

Second Issue - Agent vs Ingest Pipeline

I also tested fingerprinting the same field in both the agent and an ingest pipeline to see if they match. They don't.

Agent config:

- fingerprint:
    fields: ["data.stream.dataset"]
    method: sha256
    encoding: base64
    target_field: fingerprint_original

Ingest pipeline:

{
  "fingerprint": {
    "fields": ["data.stream.dataset"],
    "method": "SHA-256",
    "target_field": "event.fingerprint_check"
  }
}

Results:

  • Agent: BqsSBwNv+r2YVwm1LZJAYRZWnCvvG7YSUMGqV
  • Ingest: FVmXuXKucS93wks1gTlbrIQb2nCdCBbzfzfnr1nqck

Interestingly, when I run the ingest pipeline fingerprint processor multiple times on the same field, those results are consistent with each other. So the ingest pipeline processor seems to work correctly, but there's some inconsistency between the agent and ingest pipeline processing.

What This Means

This basically makes it impossible to implement any kind of reliable log integrity checking or tamper detection. SHA-256 hashes should be deterministic - identical input should always produce identical output. That's not happening here.

Has Anyone Else Seen This?

I'm running Agent 9.1.5 (the latest version). Has anyone else noticed fingerprint inconsistencies? I'm planning to report this as a bug, but wanted to check if others have encountered similar issues.

For now, I don't have a reliable workaround since copying the hash value instead of recomputing it only fixes the first issue, not the agent/ingest pipeline mismatch.

Any thoughts or similar experiences would be really helpful!

@blakester205 this behaviour is consistent with the documentation for the Beats fingerprint processor:

The value that is hashed is constructed as a concatenation of the field name and field value separated by |. For example |field1|value1|field2|value2|.

So if you’re hashing two different fields, even if they have exactly the same content, the final hash will be different because the data hashed by Filebeat is different.
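To see why the two fields hash differently, here is a minimal Python sketch of the documented "|field|value" concatenation (the field names and value are made up for illustration, and this is not guaranteed to reproduce Beats' exact bytes):

```python
import hashlib

def beats_style_input(fields: dict) -> bytes:
    # Beats concatenates "|field|value" for each field before hashing,
    # so the field *name* becomes part of the hashed bytes.
    return "".join(f"|{k}|{v}" for k, v in fields.items()).encode()

same_value = "hello world"  # illustrative value
h1 = hashlib.sha256(beats_style_input({"message": same_value})).hexdigest()
h2 = hashlib.sha256(beats_style_input({"message_copy": same_value})).hexdigest()

print(h1 == h2)  # False: the field name changes the hashed input
```

Even though same_value is identical in both calls, the field name alone is enough to change the digest.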

The Elasticsearch fingerprint processor behaves differently:

Computes a hash of the document’s content. You can use this hash for content fingerprinting.

Looking at the description for field:

Array of fields to include in the fingerprint. For objects, the processor hashes both the field key and value. For other fields, the processor hashes only the field value.

For your example case, you’re using a single field, so only the value is hashed; the data hashed by the ingest processor is therefore different from what Filebeat’s fingerprint processor hashes.
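The mismatch can be sketched in Python (the field name and value below are made up, and this illustrates only that the hashed inputs differ; it does not claim to reproduce either processor's exact output byte-for-byte):

```python
import base64
import hashlib

# Illustrative field name and value, not the poster's real data.
field, value = "data_stream.dataset", "system.syslog"

# Beats/Agent style: the field name and value are concatenated,
# pipe-separated, before hashing.
beats_input = f"|{field}|{value}".encode()
beats_digest = base64.b64encode(hashlib.sha256(beats_input).digest()).decode()

# Ingest pipeline style: for a single non-object field, only the
# value is hashed.
ingest_input = value.encode()
ingest_digest = base64.b64encode(hashlib.sha256(ingest_input).digest()).decode()

print(beats_digest == ingest_digest)  # False: the hashed bytes differ
```

Both sides compute a correct SHA-256; they just feed it different bytes, which is why each side is internally consistent yet they never agree.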


It does not seem that the fingerprint created by the Agent/Beats will ever match the fingerprint created by the ingest pipeline processor, as they work differently.

This is the same inconsistency you get if you try to match the fingerprint generated by the fingerprint filter in Logstash with the one from an ingest pipeline processor; the Logstash filter works the same way as the Beats processor.

To generate a fingerprint with Beats (or Logstash) that matches the fingerprint generated in an ingest pipeline, the fingerprint processor in Elasticsearch would need to be changed to support working the same way as its counterpart in Beats and Logstash.

One alternative, however, if you want to create a fingerprint of a previously known field, like message or event.original, would be to create a field like this in the ingest pipeline:

|message|value-of-the-message-field

This way, the fingerprint processor should match the one from beats.

Something like this:

{
  "processors": [
    {
      "set": {
        "description": "create temporary fingerprint string",
        "field": "_tmp_fingerprint",
        "value": "|message|{{{message}}}"
      }
    },
    {
      "fingerprint": {
        "description": "hash the Beats-style string",
        "fields": ["_tmp_fingerprint"],
        "method": "SHA-256",
        "target_field": "event.fingerprint_check"
      }
    }
  ]
}