Logstash Fingerprint Plugin: How to Exclude @timestamp


(Shreyas Karnik) #1

I am using the fingerprint plugin with the concatenate_all_fields option to generate a UUID for each document. I ran into a situation where the content of my message does not change, but because @timestamp is an integral part of the event, every event gets a different UUID, which kind of defeats the purpose of the UUID.

I was wondering what would be a good way to work around this and exclude @timestamp from the fingerprint step. Any help or tips would be greatly appreciated.


#2

What are you setting method to? If it is UUID, then it generates a new UUID every time, by design.

Have you looked at using a uuid filter instead?


(Shreyas Karnik) #3

Thanks for the response. My fingerprint settings are:

```
fingerprint {
  key => "xxxxxxxxxxxx"
  method => "SHA256"
  concatenate_all_fields => true
}
```

I am using the fingerprint field as the _id in Elasticsearch to eliminate duplicates. I did think about using the uuid filter, but it generates a UUID regardless of the content, and I wanted something content-driven. The fingerprint plugin would solve everything if only there were a way to drop @timestamp.
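To illustrate the difference between a random UUID and a content-driven ID, here is a small standalone Ruby sketch (outside Logstash). It assumes that a keyed fingerprint with method => "SHA256" behaves like an HMAC-SHA256 over its input string; the `content_id` helper and the sample message are hypothetical.

```ruby
require 'openssl'
require 'securerandom'

# Hypothetical helper: keyed SHA-256 (HMAC) over the content string,
# roughly what a keyed fingerprint computes from its input.
def content_id(content, key)
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA256'), key, content)
end

key = 'xxxxxxxxxxxx'
msg = 'bar:1234,foo:0f4c2678'

# Same content and key give the same ID, so using it as the Elasticsearch _id
# makes a repeated document overwrite the earlier copy instead of duplicating it.
id_a = content_id(msg, key)
id_b = content_id(msg, key)

# A random UUID ignores the content, so every copy gets a fresh _id.
uuid_a = SecureRandom.uuid
uuid_b = SecureRandom.uuid

puts id_a == id_b
puts uuid_a == uuid_b
```

The catch discussed in this thread is that the "content" must not include anything that changes per event, such as @timestamp.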


#4

I don't know what version you are running; in 6.0 it would look like this:

```
fingerprint {
    key => "abc123"
    concatenate_sources => true
    method => "SHA256"
    source => [ "foo", "bar" ]
}
```
Then you get the same fingerprint for two records whenever those two fields match. I wonder whether concatenate_all_fields was replaced to resolve exactly this issue.
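The effect of hashing only the named source fields can be sketched in plain Ruby. This is not the plugin's internal code: the `|field|value|` joining format and the `fingerprint_sources` helper are assumptions for illustration. The point is only that fields not listed in source, such as @timestamp, never reach the hash.

```ruby
require 'openssl'

# Hypothetical stand-in for source => [...] with concatenate_sources => true:
# only the listed fields are joined and hashed; all other fields are ignored.
def fingerprint_sources(event, sources, key)
  joined = sources.map { |f| "|#{f}|#{event[f]}" }.join + '|'
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new('SHA256'), key, joined)
end

e1 = { 'foo' => '0f4c2678', 'bar' => 1234, '@timestamp' => '2017-12-20T13:29:37.110Z' }
e2 = { 'foo' => '0f4c2678', 'bar' => 1234, '@timestamp' => '2017-12-20T13:29:37.150Z' }
e3 = e1.merge('bar' => 9999)

fp1 = fingerprint_sources(e1, %w[foo bar], 'abc123')
fp2 = fingerprint_sources(e2, %w[foo bar], 'abc123')
fp3 = fingerprint_sources(e3, %w[foo bar], 'abc123')

puts fp1 == fp2 # only @timestamp differs, and it is not a source field
puts fp1 == fp3 # bar differs, so the fingerprint changes
```

The hex values from this sketch will not match the plugin's output, since the exact input string differs; only the equal/not-equal behaviour is meant to carry over.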

#5

@shreyask OK, so there is an error in the 6.0 documentation: the concatenate_all_fields option is documented under the name concatenate_sources, duplicating that entry.

Yes, if you use concatenate_all_fields, it concatenates all fields, including @timestamp. A search of the forum suggests the way to fix this is to use a ruby filter to concatenate the fields (excluding @timestamp) into a new field, and then fingerprint with that new field as the source.
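Both the problem and the suggested fix can be seen in a standalone Ruby sketch (not Logstash code). The `concat_fields` helper and its `,key:value` format are hypothetical, and a plain unkeyed SHA-256 is used for brevity.

```ruby
require 'digest'

# Hypothetical helper: join every field except the excluded ones as ",key:value".
def concat_fields(event, exclude = [])
  event.reject { |k, _| exclude.include?(k) }
       .map { |k, v| ",#{k}:#{v}" }
       .join
end

e1 = { '@timestamp' => '2017-12-20T13:29:37.110Z', 'foo' => '0f4c2678', 'bar' => 1234 }
e2 = { '@timestamp' => '2017-12-20T13:29:37.150Z', 'foo' => '0f4c2678', 'bar' => 1234 }

# Like concatenate_all_fields: @timestamp is part of the input, so the hashes differ.
all1 = Digest::SHA256.hexdigest(concat_fields(e1))
all2 = Digest::SHA256.hexdigest(concat_fields(e2))

# The workaround: drop @timestamp before hashing, so the hashes match.
ex1 = Digest::SHA256.hexdigest(concat_fields(e1, ['@timestamp']))
ex2 = Digest::SHA256.hexdigest(concat_fields(e2, ['@timestamp']))

puts all1 == all2
puts ex1 == ex2
```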


(Shreyas Karnik) #6

I had submitted a PR to fix the documentation issue, but it seems the documentation on the website was not updated. I will take the filter route, as it seems a good way to work around this issue.

Thanks @Badger for the help. Can you point me to the specific post about the ruby filter?


#7

The suggestion to use a ruby filter was here. This post solves a different problem, but shows you how to get the event data and add a field. Now you just need to add code to iterate over the fields.


(Shreyas Karnik) #8

Thanks a lot! @Badger


#9

@shreyask I wanted to know the answer, so I experimented until I got this working. My apologies if my ruby coding style makes your eyeballs bleed. Not sure whether you will also want to filter out host and @version.

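```
filter {
  ruby {
    # Concatenate every field except @timestamp into a single string field
    code => "
      s = ''
      h = event.to_hash
      h.each { |k, v|
        # skip the per-event timestamp so it cannot change the fingerprint
        if k != '@timestamp'
          s += ',' + k.to_s + ':' + v.to_s
        end
      }
      event.set('some-field-name', s)
    "
  }
  # Hash only the concatenated string instead of all fields
  fingerprint {
    key => "abc123"
    source => "some-field-name"
    method => "SHA256"
  }
}
```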

Two test events that differ only in @timestamp now get the same fingerprint:

```
{
                "bar" => 1234,
         "@timestamp" => 2017-12-20T13:29:37.110Z,
                "foo" => "0f4c2678",
           "@version" => "1",
               "host" => "[...]",
        "fingerprint" => "732e31008aa3d3fdfda1d1160994c561ef5022e6e5183598c90b4e69f76c2db9",
    "some-field-name" => ",@version:1,host:[...],bar:1234,foo:0f4c2678"
}
{
                "bar" => 1234,
         "@timestamp" => 2017-12-20T13:29:37.150Z,
                "foo" => "0f4c2678",
           "@version" => "1",
               "host" => "[...]",
        "fingerprint" => "732e31008aa3d3fdfda1d1160994c561ef5022e6e5183598c90b4e69f76c2db9",
    "some-field-name" => ",@version:1,host:[...],bar:1234,foo:0f4c2678"
}
```
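One possible refinement of the ruby filter code, sketched as plain Ruby so it can be tested outside Logstash: sort the field names so the string does not depend on the order in which fields happen to be added to the event, and skip a list of fields (here @timestamp plus the host and @version fields mentioned above). The `EXCLUDED` list and the `build_fingerprint_source` name are hypothetical; inside a ruby filter the same logic would be applied to `event.to_hash` and the result stored with `event.set`.

```ruby
# Hypothetical exclusion list: per-event metadata that should not affect the ID,
# plus the helper fields in case an event arrives already carrying them.
EXCLUDED = ['@timestamp', '@version', 'host', 'some-field-name', 'fingerprint'].freeze

# Build a deterministic ",key:value" string from an event hash:
# excluded fields are dropped and keys are sorted, so field order does not matter.
def build_fingerprint_source(hash)
  hash.reject { |k, _| EXCLUDED.include?(k) }
      .sort_by { |k, _| k.to_s }
      .map { |k, v| ",#{k}:#{v}" }
      .join
end

a = { '@timestamp' => '2017-12-20T13:29:37.110Z', 'host' => 'h1', 'foo' => '0f4c2678', 'bar' => 1234 }
b = { 'bar' => 1234, 'foo' => '0f4c2678', 'host' => 'h2', '@timestamp' => '2017-12-20T13:29:37.150Z' }

puts build_fingerprint_source(a) # prints ,bar:1234,foo:0f4c2678
puts build_fingerprint_source(a) == build_fingerprint_source(b)
```

Note that only top-level keys are sorted; nested hash values are still stringified with `to_s` in whatever order their own keys were inserted.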

(Shreyas Karnik) #10

Thanks a lot @Badger this works perfectly!

