Hello everyone,
I'm having a hard time understanding the behaviour of the Logstash fingerprint filter when a field is empty or doesn't exist.
My scenario is the following:
I'm reading a JSON file with Logstash and creating a document in an Elasticsearch index for each JSON object.
I'm calculating an MD5 fingerprint from some of the key-value pairs (including nested objects) that make up each JSON object, for example:
{
  "key_a": "value_a",
  "key_b": "value_b",
  "key_c": "value_c",
  "key_d": {
    "key_e": "value_e",
    "key_f": "value_f"
  }
}
Fingerprint filter:
filter {
  fingerprint {
    source => ["key_a", "key_b", "key_d"]
    concatenate_sources => true
    concatenate_all_fields => true
    target => "doc_fingerprint"
    method => "MD5"
    key => "integrity"
  }
}
As you can see from the example above, I am not using all key-value pairs of the JSON object.
Moreover, not every key is present in every JSON object: in some objects "key_b" is missing, and in others "key_a" has no value ("" or nil).
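To make the string composition concrete, here is a minimal Python sketch of the `concatenate_sources` path as I read the plugin source: the source field names are sorted, each contributes `|name|value`, a final `|` is appended, and values go through Ruby string interpolation, so a missing field (nil) and an empty string both end up as an empty segment. The helper names are hypothetical, and the rendering of nested objects (JRuby `Hash#inspect` style, key order as given) is an assumption that may vary by plugin and Logstash version. Also note that if the plugin checks `concatenate_all_fields` first, as its source appears to, that option takes precedence over this path when both are `true`.

```python
def ruby_inspect(value):
    """Rough stand-in for Ruby's #inspect on JSON-like values (assumption)."""
    if value is None:
        return "nil"
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, str):
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    if isinstance(value, dict):
        # JRuby 9.x style {"a"=>"b"}; key order is taken as given (assumption)
        pairs = (f"{ruby_inspect(k)}=>{ruby_inspect(v)}" for k, v in value.items())
        return "{" + ", ".join(pairs) + "}"
    if isinstance(value, list):
        return "[" + ", ".join(ruby_inspect(v) for v in value) + "]"
    return str(value)


def ruby_interpolate(value):
    """Approximates Ruby "#{value}": nil -> "", strings as-is, others via inspect."""
    if value is None:
        return ""
    if isinstance(value, str):
        return value
    return ruby_inspect(value)


def concatenate_sources_string(event, sources):
    """Hypothetical helper: '|name|value' per sorted source field, plus a trailing '|'."""
    parts = []
    for name in sorted(sources):
        # A missing field reads as nil in Logstash, modelled here as None
        parts.append(f"|{name}|{ruby_interpolate(event.get(name))}")
    return "".join(parts) + "|"


sample = {
    "key_a": "value_a",
    "key_b": "value_b",
    "key_c": "value_c",
    "key_d": {"key_e": "value_e", "key_f": "value_f"},
}
sources = ["key_a", "key_b", "key_d"]
print(concatenate_sources_string(sample, sources))
# |key_a|value_a|key_b|value_b|key_d|{"key_e"=>"value_e", "key_f"=>"value_f"}|

missing = {"key_a": "value_a", "key_d": {}}
empty = {"key_a": "value_a", "key_b": "", "key_d": {}}
null = {"key_a": "value_a", "key_b": None, "key_d": {}}
# All three print |key_a|value_a|key_b||key_d|{}|
for event in (missing, empty, null):
    print(concatenate_sources_string(event, sources))
```

If this model is right, a missing "key_b" and an empty one hash identically, so the empty or missing fields alone would not explain a mismatch on this path.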
Now to the actual issue:
I have a second Logstash pipeline file that reads all the documents back from the previously populated Elasticsearch index and calculates the fingerprint again. The filter is unchanged (except for the target), since the field names in the documents are identical to the keys in the JSON objects.
filter {
  fingerprint {
    source => ["key_a", "key_b", "key_d"]
    concatenate_sources => true
    concatenate_all_fields => true
    target => "second_doc_fingerprint"
    method => "MD5"
    key => "integrity"
  }
However, the following condition, placed after the fingerprint filter, always evaluates to false:
  if [second_doc_fingerprint] != [doc_fingerprint] {
    ruby { code => 'puts "unequal"' }
  }
}
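One thing that may be worth checking: as far as I can tell from the plugin source, `concatenate_all_fields => true` builds the string from every top-level field of the event (sorted by name), not just the `source` fields. If that is right, any field that differs between the two pipelines changes the hash, for example `doc_fingerprint`, which only exists in the second pipeline. The sketch below illustrates this with made-up events; `concatenate_all_fields_string` and all field values are hypothetical, and only flat values are handled.

```python
def concatenate_all_fields_string(event):
    """Hypothetical helper: '|name|value' for every top-level field, sorted by name."""
    # Only flat values are handled; None renders as "" like Ruby's nil.to_s
    parts = (f"|{name}|{'' if value is None else value}" for name, value in sorted(event.items()))
    return "".join(parts) + "|"


# Hypothetical event as seen by the first pipeline, before doc_fingerprint is set
first_pass = {"@version": "1", "key_a": "value_a", "key_b": "value_b"}
# Hypothetical event read back from Elasticsearch: it now also carries doc_fingerprint
second_pass = dict(first_pass, doc_fingerprint="hypothetical-hash")

print(concatenate_all_fields_string(first_pass))
print(concatenate_all_fields_string(second_pass))
```

The two strings differ only by the extra `|doc_fingerprint|...` segment, which is enough to change the MD5 even though the `source` fields are identical.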
I'm looking for a way to debug this and see what's actually happening here.
Could the empty or non-existent fields be related to the issue?
I also tried, unsuccessfully, to recalculate the fingerprint with a Python script.
I found this thread:
https://discuss.elastic.co/t/replicating-logstash-fingerprint-in-elasticsearch-dsl-py/270905
but I'm not sure how the string that gets hashed is composed when both of the following settings are enabled:
concatenate_sources => true
concatenate_all_fields => true
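Once the string is known, the hashing step itself should be straightforward to reproduce in Python. A minimal sketch, assuming that setting `key` makes the plugin compute an HMAC with the chosen digest (here HMAC-MD5 keyed with "integrity") rather than a plain MD5, and that the result is hex-encoded because `base64encode` is left at its default of false. `fingerprint_hex` is a hypothetical name, and the input string is the one the `concatenate_sources` path would presumably build for the sample object.

```python
import hashlib
import hmac
from typing import Optional


def fingerprint_hex(to_checksum: str, key: Optional[str] = None) -> str:
    """Hex MD5 of the string, or hex HMAC-MD5 when a key is given (assumed plugin behaviour)."""
    data = to_checksum.encode("utf-8")
    if key:
        return hmac.new(key.encode("utf-8"), data, hashlib.md5).hexdigest()
    return hashlib.md5(data).hexdigest()


# String the concatenate_sources path would build for the sample object (assumption)
to_checksum = '|key_a|value_a|key_b|value_b|key_d|{"key_e"=>"value_e", "key_f"=>"value_f"}|'
print(fingerprint_hex(to_checksum, key="integrity"))  # keyed, as in the filter config
print(fingerprint_hex(to_checksum))                   # unkeyed, for comparison
```

If a Python recalculation uses `hashlib.md5` alone while the filter sets `key`, the two values would not be expected to match even for an identical input string.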
It would be great if you could help me out. Thank you in advance!