Unclear how the Logstash fingerprint filter treats empty/non-existent fields

Hello everyone,

I have a hard time understanding the behaviour of the Logstash fingerprint filter when a field is empty or non-existent.
My scenario is the following:

I'm reading a JSON file with Logstash and creating a document in an Elasticsearch index for each JSON object.
I'm calculating an MD5 fingerprint from some of the key-value pairs (including nested objects) that make up the JSON object.

{
    "key_a": "value_a",
    "key_b": "value_b",
    "key_c": "value_c",
    "key_d": {
        "key_e": "value_e",
        "key_f": "value_f"
    }
}

Fingerprint filter:

filter {
    fingerprint {
        source => ["key_a", "key_b", "key_d"]
        concatenate_sources => true
        concatenate_all_fields => true
        target => "doc_fingerprint"
        method => "MD5"
        key => "integrity"
    }
}

As you can see from the example above, I am not using all key-value pairs of the JSON object.
Moreover, not all keys are present in every JSON object. For example, "key_b" is missing in some instances, and in others "key_a" has no value ("" or nil).

Now to the actual issue:
I have a second Logstash config file that reads all documents back from the previously filled Elasticsearch index and calculates the fingerprint again. The filter is unchanged (except for the target), since the field names in the documents are identical to the keys in the JSON object.

filter {
    fingerprint {
        source => ["key_a", "key_b", "key_d"]
        concatenate_sources => true
        concatenate_all_fields => true
        target => "second_doc_fingerprint"
        method => "MD5"
        key => "integrity"
    }

However, the two fingerprints never match: the following condition after the fingerprint filter is always true, so "unequal" is printed for every document:

    if [second_doc_fingerprint] != [doc_fingerprint] {
        ruby { code => 'puts "unequal"' }
    }
}

I'm looking for a way to debug what's happening here.
Is it possible that the empty or non-existent fields are related to the issue?
I also tried, unsuccessfully, to recalculate the fingerprint with a Python script.
I found this thread
https://discuss.elastic.co/t/replicating-logstash-fingerprint-in-elasticsearch-dsl-py/270905
but I am not sure how the string that gets hashed is composed when the following settings are used:

        concatenate_sources => true
        concatenate_all_fields => true

It would be great if you could help me out :slight_smile: Thank you in advance!
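For reference, here is how the hashed string appears to be built, based on my reading of the logstash-filter-fingerprint source (this may differ between plugin versions, so treat it as an assumption). With concatenate_all_fields => true, "source" and concatenate_sources are ignored: every top-level field of the event except the one named in "target" is appended as |name|value, in sorted key order. With only concatenate_sources => true, the "source" names are sorted and appended the same way. A final | closes the string. Values are interpolated the Ruby way, so nil and "" both become an empty string, and a nested object such as key_d is rendered in Ruby's Hash#inspect format. A missing field also ends up empty in source mode, but in all-fields mode it simply does not appear. Because "key" is set, the digest is HMAC-MD5 rather than plain MD5. The Python sketch below mirrors this; build_string and the other helpers are hypothetical names, and the Ruby rendering only covers simple JSON-like values.

```python
import hashlib
import hmac


def ruby_inspect(value):
    # Rough equivalent of Ruby's #inspect for JSON-like values.
    # Assumption: plain strings; Ruby escapes more characters than handled here.
    if value is None:
        return "nil"
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, str):
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    if isinstance(value, dict):
        items = (ruby_inspect(k) + "=>" + ruby_inspect(v) for k, v in value.items())
        return "{" + ", ".join(items) + "}"
    if isinstance(value, list):
        return "[" + ", ".join(ruby_inspect(v) for v in value) + "]"
    return str(value)


def ruby_to_s(value):
    # What "#{value}" yields in Ruby: nil -> "", Hash/Array/bool -> #inspect, else #to_s.
    if value is None:
        return ""
    if isinstance(value, (dict, list, bool)):
        return ruby_inspect(value)
    return str(value)


def deep_sort(value):
    # Recursively sort hash keys (assumed to mirror the plugin's nested-hash sorting).
    if isinstance(value, dict):
        return {k: deep_sort(value[k]) for k in sorted(value)}
    if isinstance(value, list):
        return [deep_sort(v) for v in value]
    return value


def build_string(event, source, target, concatenate_all_fields):
    # Hypothetical helper: rebuilds the string the filter hashes (top-level field names only).
    parts = []
    if concatenate_all_fields:
        # "source" is ignored; every top-level field except the current target is used.
        for name, value in deep_sort(event).items():
            if name != target:
                parts.append(f"|{name}|{ruby_to_s(value)}")
    else:
        # concatenate_sources only: sorted source names; a missing field reads as nil -> "".
        for name in sorted(source):
            parts.append(f"|{name}|{ruby_to_s(deep_sort(event.get(name)))}")
    return "".join(parts) + "|"


def fingerprint_md5(data, key=None):
    # With "key" set, the digest is HMAC-MD5; without it, plain MD5 (hex output).
    payload = data.encode("utf-8")
    if key is None:
        return hashlib.md5(payload).hexdigest()
    return hmac.new(key.encode("utf-8"), payload, hashlib.md5).hexdigest()


event = {"key_a": "", "key_d": {"key_f": "value_f", "key_e": "value_e"}}  # key_b missing
s = build_string(event, ["key_a", "key_b", "key_d"], "doc_fingerprint", False)
print(s)  # |key_a||key_b||key_d|{"key_e"=>"value_e", "key_f"=>"value_f"}|
# With concatenate_all_fields the source list makes no difference and key_b is absent:
print(build_string(event, [], "doc_fingerprint", True))  # |key_a||key_d|{"key_e"=>"value_e", "key_f"=>"value_f"}|
print(fingerprint_md5(s, key="integrity"))
```

In all-fields mode the real event also carries fields such as @timestamp and @version, so the Python side has to see exactly the same set of fields for the hashes to agree.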

I found my error:
I misunderstood the

concatenate_all_fields => true

flag. Of course, this parameter takes into account not only the fields given in "source" but all other fields of the event as well, including the hash that was calculated in the first run.
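A minimal Python sketch of why the comparison failed. As far as I can tell from the plugin source, concatenate_all_fields skips only the field named in the current "target", so in the second run doc_fingerprint is just another field and becomes part of the hashed string. The sketch assumes a flat event with string values only, reuses the key "integrity" from the config, and all_fields_string and hmac_md5 are hypothetical helper names.

```python
import hashlib
import hmac


def all_fields_string(event, target):
    # concatenate_all_fields: every top-level field except the *current* target, sorted.
    # Assumption for brevity: flat events with string (or None) values only.
    parts = [f"|{k}|{'' if v is None else v}" for k, v in sorted(event.items()) if k != target]
    return "".join(parts) + "|"


def hmac_md5(data, key="integrity"):
    # HMAC-MD5 with hex output, as used when "key" is set.
    return hmac.new(key.encode("utf-8"), data.encode("utf-8"), hashlib.md5).hexdigest()


event = {"key_a": "value_a", "key_b": "value_b", "key_c": "value_c"}

# First pipeline: target => "doc_fingerprint" (the field does not exist yet).
event["doc_fingerprint"] = hmac_md5(all_fields_string(event, "doc_fingerprint"))

# Second pipeline reads the stored document back: doc_fingerprint is now an ordinary
# field and is not skipped, because the target is "second_doc_fingerprint".
second = hmac_md5(all_fields_string(event, "second_doc_fingerprint"))
print(second == event["doc_fingerprint"])  # False

# Leaving the stored fingerprint out of the input reproduces it in this toy example.
without = {k: v for k, v in event.items() if k != "doc_fingerprint"}
print(hmac_md5(all_fields_string(without, "second_doc_fingerprint")) == event["doc_fingerprint"])  # True
```

In a real pipeline, any other field that differs between the first and the second run would break the comparison in the same way, so listing the fields in "source" without concatenate_all_fields is probably the more robust choice.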

What I haven't figured out yet is how to compute the hash with a different tool, e.g. Python. Nested fields in particular give me a hard time.

But this is a different question and this one here can be marked as solved.
