Fingerprint plugin not working properly

I receive logs daily in JSON form, and most of them contain identifiers such as IP addresses and ports. A lot of entries repeat from day to day, so I want to deduplicate them, but their timestamp changes and Elasticsearch creates a new document ID for every single one. I have installed and configured the fingerprint plugin in Logstash to create a unique fingerprint for every event based on that event's IP address and port. For some reason Logstash creates the same hash for every single event instead of one based on that event's IP address and port.
I expect results such as:
192.168.0.1:80  = n3j1tjo31ifj0inv023n10f
192.168.0.2:443 = 31tboginv4go1ng4igno4

But I receive:
192.168.0.1:80  = n3j1tjo31ifj0inv023n10f
192.168.0.2:443 = n3j1tjo31ifj0inv023n10f

I have included my logstash.conf below. Thank you in advance for any assistance.

input {
  file {
    codec => "json"
    path => "/usr/share/shadow/one.json"
    start_position => beginning
    sincedb_path => "/dev/null"
    #sincedb_path => "/home/bitnami/sincedb/sincedb-access"
  }
}

filter {
  fingerprint {
    source => ["ip", "port"]
    concatenate_sources => true
    target => "jedinstveni_id"
    method => "MD5"
    key => "randomkey"
  }
}

output {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    data_stream => false
    #document_id => "%{logstash_checksum}"
    index => "shadow-server"
  }
  stdout {
    codec => rubydebug
  }
  file {
    path => "/usr/share/shadow/log/output.log"
  }
}

Welcome to the community!

I'm not sure how you are getting the same hash. The fingerprint should be used as the document ID (document_id) in the elasticsearch output, so that repeated events overwrite the existing document instead of creating a new one. There is no need for a random key, since you only want deduplication.

This sample is working:

input {
  generator {
    message => '{"ip": "192.168.0.1", "port": "80"}
{"ip": "192.168.0.1", "port": "43"}'
    count => 1
    codec => json_lines
  }
}

filter {
  fingerprint {
    source => ["ip", "port"]
    concatenate_sources => true
    target => "[@metadata][jedinstveni_id]"
    method => "MD5"
    #key => "randomkey"
  }

  mutate { remove_field => ["host", "event", "@timestamp", "@version"] }
}

output {
  stdout { codec => rubydebug { metadata => true } }
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    data_stream => false
    document_id => "%{[@metadata][jedinstveni_id]}"
    index => "shadow-server"
  }
}

Result:

{
         "port" => "80",
    "@metadata" => {
        "jedinstveni_id" => "4f1bead294d01181b6c40ac78deb5381"
    },
           "ip" => "192.168.0.1"
}
{
         "port" => "43",
    "@metadata" => {
        "jedinstveni_id" => "ef7a9233096432e9a67f72931aac7155"
    },
           "ip" => "192.168.0.1"
}

Thank you for your assistance, but the config is still giving me the same hash values across the events. Do you think the JSON file I'm reading from could be at fault?

It might be. You should test with your own data: just paste a couple of real lines from your file into the generator message and check the fingerprints it produces.
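
For example, here is a minimal sketch of that test, assuming your events carry the ip and port fields (the two sample lines are placeholders — replace them with lines copied from your actual file):

input {
  generator {
    # Paste one or two real lines from your JSON file here; these values are made up.
    message => '{"ip": "192.168.0.1", "port": "80"}
{"ip": "192.168.0.2", "port": "443"}'
    count => 1
    codec => json_lines
  }
}

filter {
  fingerprint {
    source => ["ip", "port"]
    concatenate_sources => true
    target => "[@metadata][jedinstveni_id]"
    method => "MD5"
  }
}

output {
  # If both fingerprints printed here are identical, the ip/port fields
  # are missing or empty in the parsed events.
  stdout { codec => rubydebug { metadata => true } }
}

If the fingerprints differ here but not with the file input, the problem is in how the file is being read, not in the fingerprint filter.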


It works with the generator, thank you! So my issue is probably with the JSON file or the way Logstash interprets it.

I don't know whether it is a single-line or a multi-line JSON file. If it is the multi-line, human-readable form with spaces and new lines, it cannot be parsed by the regular json codec, because the file input reads the file line by line.
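
To illustrate (these lines are my own example, not taken from your file), the json codec expects one complete JSON object per line, like this:

{"ip": "192.168.0.1", "port": "80", "timestamp": "2023-01-01T00:00:00Z"}
{"ip": "192.168.0.2", "port": "443", "timestamp": "2023-01-01T00:00:01Z"}

A pretty-printed file like the following is read line by line, so no single line is valid JSON on its own, the ip and port fields never get populated, and the fingerprint filter ends up hashing the same input for every event — which would match the behaviour you are seeing:

{
  "ip": "192.168.0.1",
  "port": "80",
  "timestamp": "2023-01-01T00:00:00Z"
}

If that is what is happening, you should see a _jsonparsefailure tag on the events in the rubydebug output.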