Replicating Logstash Fingerprint in elasticsearch_dsl_py

I am trying to replicate this piece of code in Python.

ruby {
    code => '
        physical = [
        event.get("address1").to_s,
        event.get("address2").to_s,
        event.get("city").to_s,
        event.get("zip_code").to_s,
        ].join(" ")
        event.set("whole_address", physical.gsub(/[^0-9a-z ]/i, "").squeeze(" "))
        event.set("fingerprint", physical.gsub(/[^0-9a-z ]/i, "").squeeze(" ").upcase)
    '
}

fingerprint {
    key => "1234ABCD"
    method => "SHA256"
    source => ["fingerprint"]
    target => "[@metadata][generated_id]"
}

I think it can be done with the hashlib module:

Python Docs

import hashlib
def add_physical(address1, address2, city, state, zip_code, county):
    p = Place()
    p.address1 = address1
    p.address2 = address2
    p.city = city
    p.state = state
    p.zip_code = zip_code
    p.county = county
    p.whole_address = whole_address(address1, address2, city, zip_code)
    p.type = 'Physical'
    p.geostatus = 'Need'
    p.location = {
        "lon": 0,
        "lat": 0
    }
    key = '1234ABCD'
    fingerprint = whole_address(address1, address2, city, zip_code).upper()
    p.meta.id = hashlib.sha256(fingerprint.encode() + key.encode()).hexdigest()
    p.save()
    Place._index.refresh()
    return p.meta.id

But it does not return a hash that matches the one generated by Logstash.

Logstash Fingerprint Filter

Any ideas how to make the two match? It helps to prevent duplication.

You need to use the hmac library in Python to create an HMAC, because the fingerprint filter in Logstash also creates an HMAC when a key is set.

See this example:

>>> import hashlib
>>> import hmac
>>> salt = '1234ABCD'
>>> message = 'create fingerprint in python'
>>> fingerprint = hmac.new(bytes(salt, 'utf-8'), msg=bytes(message, 'utf-8'), digestmod=hashlib.sha256).hexdigest()
>>> fingerprint
'1ded96f67ddbaa8c586994557d05f1765bb02bb53edaab976cd60402bead8d0d'
>>> 

Using the same message and key in Logstash:

fingerprint {
    key => "1234ABCD"
    method => "SHA256"
    source => ["message"]
}

The output will be something like this:

{
     "@timestamp" => 2021-04-21T22:07:06.991Z,
           "host" => "elk",
        "message" => "create fingerprint in python",
    "fingerprint" => "1ded96f67ddbaa8c586994557d05f1765bb02bb53edaab976cd60402bead8d0d",
       "@version" => "1"
}

As you can see, the fingerprints from Logstash and Python are the same.

logstash: 1ded96f67ddbaa8c586994557d05f1765bb02bb53edaab976cd60402bead8d0d
python: 1ded96f67ddbaa8c586994557d05f1765bb02bb53edaab976cd60402bead8d0d
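
To show why the code in the question cannot match, here is a small sketch comparing the two approaches with the same key and message as above. It only uses the standard hashlib and hmac modules; the variable names are just for illustration.

```python
import hashlib
import hmac

key = "1234ABCD"
message = "create fingerprint in python"

# What the question's code does: a plain SHA-256 over message + key.
plain = hashlib.sha256(message.encode() + key.encode()).hexdigest()

# What the answer does: a keyed HMAC-SHA256 over the message alone.
keyed = hmac.new(key.encode(), message.encode(), hashlib.sha256).hexdigest()

print(plain == keyed)  # False
```

Appending the key to the message and hashing the result is not the same construction as an HMAC, so the two digests differ even though both are 64 hex characters long.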

Why use a key?

Now I see the line in the filter

    OpenSSL::HMAC.hexdigest(digest, @key, data.to_s).force_encoding(Encoding::UTF_8)

Still not matching. Probably due to how I am formatting the fingerprint. But this is a great start.
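
One thing to check is that the Python side cleans the address exactly the way the Ruby filter does. The whole_address helper is not shown in this thread, so the following is only a sketch of what it might look like. normalize_address is a hypothetical name, and it assumes that missing fields should behave like Ruby's nil.to_s (an empty string) and that the input is plain ASCII.

```python
import re


def normalize_address(address1, address2, city, zip_code):
    """Hypothetical helper mirroring the Ruby filter's clean-up steps."""
    # Ruby's nil.to_s is "", so treat None the same way before joining.
    parts = ["" if value is None else str(value)
             for value in (address1, address2, city, zip_code)]
    physical = " ".join(parts)
    # gsub(/[^0-9a-z ]/i, ""): keep only ASCII letters, digits and spaces.
    cleaned = re.sub(r"[^0-9a-zA-Z ]", "", physical)
    # squeeze(" "): collapse runs of spaces into one space (no strip).
    return re.sub(r" {2,}", " ", cleaned)


whole = normalize_address("12 Main St.", None, "Spring-field", 12345)
fingerprint_source = whole.upper()  # Ruby's upcase
print(repr(whole))               # '12 Main St Springfield 12345'
print(repr(fingerprint_source))  # '12 MAIN ST SPRINGFIELD 12345'
```

Note that neither the Ruby code nor this sketch strips leading or trailing spaces, so an empty zip_code leaves a trailing space that becomes part of the hashed value. Any difference like that between the two sides will produce a different fingerprint.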

I was just copying the code given in the blog.

Remove Duplication

That would explain it. That blog post is a follow-up to an earlier one, which was published back when using a key (and therefore a MAC rather than a hash) was mandatory. The option to use a simple hash was added in 2018.
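
If you decide to drop the key, the Python side gets simpler too. A rough sketch, assuming that the filter without a key computes a plain SHA-256 hex digest of the source value (worth verifying against your Logstash version); sha256_fingerprint is a hypothetical helper name:

```python
import hashlib
import hmac


def sha256_fingerprint(value, key=None):
    """Hypothetical helper: HMAC-SHA256 if a key is given, plain SHA-256 otherwise."""
    data = value.encode("utf-8")
    if key is None:
        # No key: a simple hash of the value.
        return hashlib.sha256(data).hexdigest()
    # Key set: a MAC over the value, keyed with the given string.
    return hmac.new(key.encode("utf-8"), data, hashlib.sha256).hexdigest()


print(sha256_fingerprint("create fingerprint in python"))
print(sha256_fingerprint("create fingerprint in python", key="1234ABCD"))
```

Whichever variant you pick, both Logstash and Python have to use the same one, otherwise the document IDs will not line up and duplicates will get through.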
