The SHA1 fingerprints generated by Logstash differ from those generated using the API

In our API, we are concatenating the order id and fingerprint to get the hashed value for orderKey.

Given the id/fingerprint of the items:
id: '7542b27c-c255-4b18-8321-06c9d65aca7f'
fingerprint: '28d68c4f04f487f514db3df32d36f70e31d27592'

The values concatenated together are '7542b27c-c255-4b18-8321-06c9d65aca7f28d68c4f04f487f514db3df32d36f70e31d27592'. Our API outputs '0c948678d0e2b308e178ea3a9e1e01c0f6ba79ef' for orderKey, but the Logstash output is 'bb5931eaed2186c3b4347401f0ae07e5fa98ed36'.

So I am just wondering why it's different.

Which library is being used for SHA1 in Logstash?

We are using this tool in our app to generate the orderKey hashed value, and it can be verified here:
https://emn178.github.io/online-tools/sha1.html?input_type=utf-8&input=7542b27c-c255-[…]d68c4f04f487f514db3df32d36f70e31d27592&hmac_input_type=utf-8
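For reference, the API-side computation described above can be reproduced outside any tooling. This is a minimal Python sketch (not the API's actual code) of a plain SHA1 over the raw concatenation, matching the orderKey value quoted above:

```python
import hashlib

# Plain concatenation of the two values, as the API does it:
# no field names, no separators
order_id = "7542b27c-c255-4b18-8321-06c9d65aca7f"
fingerprint = "28d68c4f04f487f514db3df32d36f70e31d27592"

order_key = hashlib.sha1((order_id + fingerprint).encode("utf-8")).hexdigest()
print(order_key)  # 0c948678d0e2b308e178ea3a9e1e01c0f6ba79ef
```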


I guess you have concatenated by setting: concatenate_sources => true. My value in that case is: "77f2c0192c145d75d5c37839b9ff69289ee8ec2f"

I have done a little bit of testing. The config:

input {
  generator {
    message => '{"id": "7542b27c-c255-4b18-8321-06c9d65aca7f", "fingerprint":"28d68c4f04f487f514db3df32d36f70e31d27592"}'
    count => 1
    codec => json_lines
  }
}
filter {
  fingerprint {
    source => ["id", "fingerprint"]
    # source => ["fingerprint", "id"]
    concatenate_sources => true
    # concatenate_all_fields => true
    target => "hashconcat"
    method => "SHA1"
  }

  mutate { add_field => { "fieldsmerged" => "%{id}%{fingerprint}" } }

  fingerprint {
    source => ["fieldsmerged"]
    target => "hashmerged"
    method => "SHA1"
  }

  mutate { add_field => { "fieldhc" => "7542b27c-c255-4b18-8321-06c9d65aca7f28d68c4f04f487f514db3df32d36f70e31d27592" } }
  fingerprint {
    source => ["fieldhc"]
    target => "hashhc"
    method => "SHA1"
  }

  mutate { remove_field => ["host", "event", "@timestamp", "@version"] }
}

output {
  stdout { codec => rubydebug {} }
}

produces:

{
              "id" => "7542b27c-c255-4b18-8321-06c9d65aca7f",
     "fingerprint" => "28d68c4f04f487f514db3df32d36f70e31d27592",
      "hashconcat" => "77f2c0192c145d75d5c37839b9ff69289ee8ec2f",
    "fieldsmerged" => "7542b27c-c255-4b18-8321-06c9d65aca7f28d68c4f04f487f514db3df32d36f70e31d27592",
      "hashmerged" => "0c948678d0e2b308e178ea3a9e1e01c0f6ba79ef",
         "fieldhc" => "7542b27c-c255-4b18-8321-06c9d65aca7f28d68c4f04f487f514db3df32d36f70e31d27592",
          "hashhc" => "0c948678d0e2b308e178ea3a9e1e01c0f6ba79ef"
}

Edit: So, hashmerged and hashhc are the same, as you would expect. Not sure why that is. I have seen you have already opened a bug.

In Logstash we are using:

fingerprint {
  source => ["[id]","[fingerprint]"]
  target => "orderKey"
  method => "SHA1"
  concatenate_sources => true
}

Check this part of the documentation:

When set to true and method isn't UUID or PUNCTUATION, the plugin concatenates the names and values of all fields given in the source option into one string (like the old checksum filter) before doing the fingerprint computation.

The names of the fields are also present in the string. You can check the code to see how it is done; this is a test example from the code:

describe "with concatenate_sources" do
  let(:config) { super().merge("concatenate_sources" => true) }
  it "fingerprints the value of concatenated key/pairs" do
    # SHA1 of "|field1|test1|field2|test2|"
    expect(fingerprint).to eq("e3b6b71eedc656f1d29408264e8a75535db985cb")
  end
end
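The spec's expected value can be checked outside Logstash. A minimal Python sketch, assuming the plugin hashes exactly the pipe-delimited string shown in the spec comment:

```python
import hashlib

# The string the plugin builds for fields field1/field2 with
# concatenate_sources => true: |name|value| pairs wrapped in pipes
to_checksum = "|field1|test1|field2|test2|"
digest = hashlib.sha1(to_checksum.encode("utf-8")).hexdigest()
print(digest)  # e3b6b71eedc656f1d29408264e8a75535db985cb, per the spec above
```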

So, the string that Logstash will fingerprint is not what you expect:

7542b27c-c255-4b18-8321-06c9d65aca7f28d68c4f04f487f514db3df32d36f70e31d27592

It will concatenate the field names and the values into something like this:

|fingerprint|28d68c4f04f487f514db3df32d36f70e31d27592|id|7542b27c-c255-4b18-8321-06c9d65aca7f|

You can check this if you enable debug:

[2024-04-17T09:18:35,093][DEBUG][logstash.filters.fingerprint][main][b6379a23a6015de83c4bd38df4abd75eac5bb0dfaa3a23504ddf58dd7a082fce] String built {:to_checksum=>"|fingerprint|28d68c4f04f487f514db3df32d36f70e31d27592|id|7542b27c-c255-4b18-8321-06c9d65aca7f|"}

To get the same result as your library, you need to concatenate the string before using the fingerprint filter.
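To illustrate why the two sides disagree, here is a Python sketch comparing the two input strings: the raw concatenation the API hashes, and the pipe-delimited string from the debug log above. The digests necessarily differ because the inputs differ:

```python
import hashlib

def sha1_hex(s: str) -> str:
    return hashlib.sha1(s.encode("utf-8")).hexdigest()

order_id = "7542b27c-c255-4b18-8321-06c9d65aca7f"
fp = "28d68c4f04f487f514db3df32d36f70e31d27592"

# What the API hashes: the raw values back to back
plain = order_id + fp
# What the fingerprint filter hashes with concatenate_sources => true:
# |name|value| pairs, fields sorted by name (per the debug log above)
piped = f"|fingerprint|{fp}|id|{order_id}|"

print(sha1_hex(plain))  # 0c948678d0e2b308e178ea3a9e1e01c0f6ba79ef
print(sha1_hex(piped))  # a different digest
```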


Not sure what you mean here; the fingerprint field is created by this logic in your pipeline:

    if [items][selectedModifiersCombined] {
        fingerprint {
          source => ["[items][upc]","[items][selectedModifiersCombined]"]
          target => "fingerprint"
          method => "SHA1"
          concatenate_sources => true
        }
    } else {
        fingerprint {
          source => "[items][upc]"
          target => "fingerprint"
          method => "SHA1"
        }
    }

None of the events you shared have the field [items][selectedModifiersCombined], so they will match the else condition in your pipeline and the fingerprint will be the same.

You need to share both documents so this can be replicated.

Thanks Leandro, it's clearer now.

Just to add, source => ["[id]","[fingerprint]"] and source => ["[fingerprint]","[id]"] are the same: the fields will first be sorted, then hashed.

Yes, it will be the same:

input {
  generator {
    message => '{"1":"d14fe46a0b775b470bb0479eba41345568f844dd","a": "31d63dcb-1af4-4736-9da1-fb8e321f2729", "c":"20240301-233715042-1830D3"}'
    count => 1
    codec => json_lines
  }
}

filter {
  mutate { remove_field => ["host", "event", "@timestamp", "@version"] }

  mutate {
    add_field => { "abcmerged" => "%{a}%{1}%{c}" }
    add_field => { "bcamerged" => "%{1}%{c}%{a}" }
    add_field => { "bacmerged" => "%{1}%{a}%{c}" }
  }

  fingerprint {
    source => ["a","1","c"]
    target => "a1c"
    method => "SHA1"
    concatenate_sources => true
  }

  fingerprint {
    source => ["1","c","a"]
    target => "1ca"
    method => "SHA1"
    concatenate_sources => true
  }

  fingerprint {
    source => ["1","a","c"]
    target => "1ac"
    method => "SHA1"
    concatenate_sources => true
  }

  mutate { remove_field => ["a", "1", "c"] }
}

output {
  stdout { codec => rubydebug {} }
}
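The same behavior can be emulated in Python: build the |name|value| string from the source fields sorted by name, and all three orderings collapse to a single digest. This is a sketch of the plugin's concatenation behavior as described in this thread, not its actual code:

```python
import hashlib

event = {
    "1": "d14fe46a0b775b470bb0479eba41345568f844dd",
    "a": "31d63dcb-1af4-4736-9da1-fb8e321f2729",
    "c": "20240301-233715042-1830D3",
}

def fingerprint(event, source):
    # The plugin sorts the source field names before concatenating,
    # so the order given in `source` does not matter
    to_checksum = "|" + "|".join(f"{k}|{event[k]}" for k in sorted(source)) + "|"
    return hashlib.sha1(to_checksum.encode("utf-8")).hexdigest()

a1c = fingerprint(event, ["a", "1", "c"])
one_ca = fingerprint(event, ["1", "c", "a"])
one_ac = fingerprint(event, ["1", "a", "c"])
print(a1c == one_ca == one_ac)  # True
```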

It's using sorting, but on what basis does it sort? In my opinion, on the basis of the array keys.

Another question I have is:

 fingerprint {
     source => ["1","a","c"]
     target => "1ac"
     method => "SHA1"
     concatenate_sources => true
  }


or 

 fingerprint {
     source => ["[1]","[a]","[c]"]
     target => "1ac"
     method => "SHA1"
     concatenate_sources => true
  }

In this case, what will the SHA1 input string be for both?