Need help using Logstash with grok and fingerprint

Hi,

I'm new to Logstash and I have the following requirement:

My logs look like the below. I don't want to send all of these lines to Kibana; instead I want to send only one line for a given time range.

[2019-09-25 17:17:33.153] [logger] [info] normal_task: Output queue occupancy: 0.00.
[2019-09-25 17:17:34.154] [logger] [info] normal_task: Output queue occupancy: 0.00.
[2019-09-25 17:17:35.154] [logger] [info] normal_task: Output queue occupancy: 0.00.
[2019-09-25 17:17:36.155] [logger] [info] normal_task: Output queue occupancy: 0.00.
[2019-09-25 17:17:37.155] [logger] [info] normal_task: Output queue occupancy: 0.00.

I tried a combination of grok and fingerprint, but it didn't work as expected. How can I accomplish the above?

filter {
  grok{
    match => { "message" => "\[%{TIMESTAMP_ISO8601:time}\] %{DATA:log_message}"}
  }
  fingerprint {
    source => "message"
    target => "[@metadata][fingerprint]"
    method => "MURMUR3"
  }
}

The message field contains a varying timestamp, so fingerprinting the whole message will not work. You will need to calculate the fingerprint based on a set of specific fields that define what is unique. You should also not use a MURMUR3 hash: it is only 32 bits wide, so you risk losing data to hash collisions. I would recommend reading these blog posts:

What if I use log_message as the fingerprint source to generate the hash?

fingerprint {
    source => "log_message"
    target => "[@metadata][fingerprint]"
    method => "MURMUR3"
  }

That might work, but I would still recommend not using MURMUR3 hash.

Sure, I'll use a different hash like SHA256.

I'm using this now:

input {
  stdin {
  }
}

filter {
  grok{
    match => { "message" => "\[%{TIMESTAMP_ISO8601:time}\] %{DATA:log_message}"}
  }
  fingerprint {
    source => ['log_message']
    target => "[@metadata][fingerprint]"
    method => "SHA256"
  }
}

output {
  stdout {
    codec  => rubydebug {
      metadata => true
    }
  }
}

But when I test it I'm not getting the expected results. I'm trying to compare the fingerprint values generated for two identical messages with different timestamps, but here the debug output doesn't show the fingerprint at all:

[2019-10-02 20:53:17.749] [logger] [info] harvester: Output queue occupancy: 0.00%.
{
    "@timestamp" => 2019-10-02T21:45:00.706Z,
          "host" => "host",
      "@version" => "1",
          "time" => "2019-10-02 20:53:17.749",
       "message" => "[2019-10-02 20:53:17.749] [logger] [info] harvester: Output queue occupancy: 0.00%."
}
[2019-10-02 20:53:07.749] [logger] [info] harvester: Output queue occupancy: 0.00%.
{
    "@timestamp" => 2019-10-02T21:45:16.942Z,
          "host" => "host",
      "@version" => "1",
          "time" => "2019-10-02 20:53:07.749",
       "message" => "[2019-10-02 20:53:07.749] [logger] [info] harvester: Output queue occupancy: 0.00%."
}

Whereas when I set source to message instead of log_message, the debug output includes the metadata and I can see the fingerprint hash:

[2019-10-02 20:53:17.749] [logger] [info] harvester: Output queue occupancy: 0.00%.
{
    "@timestamp" => 2019-10-02T21:40:12.891Z,
     "@metadata" => {
        "fingerprint" => "7a289d620e1128f8d19e1976744efa090921b1ad7edf42a004e5323ec40c5ce3"
    },
      "@version" => "1",
          "host" => "host",
          "time" => "2019-10-02 20:53:17.749",
       "message" => "[2019-10-02 20:53:17.749] [logger] [info] harvester: Output queue occupancy: 0.00%."
}
[2019-10-02 20:53:07.749] [logger] [info] harvester: Output queue occupancy: 0.00%.
{
    "@timestamp" => 2019-10-02T21:40:30.346Z,
     "@metadata" => {
        "fingerprint" => "440b70520e778dfda160f3258fd4ba5b6efbcd221478e850ff8dbc70b41aa5c7"
    },
      "@version" => "1",
          "host" => "host",
          "time" => "2019-10-02 20:53:07.749",
       "message" => "[2019-10-02 20:53:07.749] [logger] [info] harvester: Output queue occupancy: 0.00%."
}

Any help with this would be appreciated. Please also point me to any examples that I can refer to.
My logs look like the below:

[2019-10-02 21:17:00.742] [logger] [info] harvester: Output queue occupancy: 0.00%.
[2019-10-02 21:17:10.742] [logger] [info] harvester: Output queue occupancy: 0.00%.
[2019-10-02 21:17:20.742] [logger] [info] harvester: Output queue occupancy: 0.00%.
[2019-10-02 21:17:30.743] [logger] [info] harvester: Output queue occupancy: 0.00%.
[2019-10-02 21:17:40.743] [logger] [info] harvester: Output queue occupancy: 0.00%.
[2019-10-02 21:17:50.743] [logger] [info] harvester: Output queue occupancy: 0.00%.
[2019-10-02 21:18:00.743] [logger] [info] harvester: Output queue occupancy: 0.00%.
[2019-10-02 21:18:10.743] [logger] [info] harvester: Output queue occupancy: 0.00%.

My requirement is to send one log message per minute to Kibana rather than sending all 6 messages.

I would expect to see a log_message field in the output, which seems to be missing. Is this the full config? Can you change the DATA to a GREEDYDATA?
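
For context, a likely reason log_message goes missing: DATA expands to the lazy regex .*? and the grok pattern is not anchored at the end of the line, so a trailing DATA can match an empty string. Grok drops empty captures by default, so the field is never created and the fingerprint filter has nothing to hash. A minimal sketch of two possible fixes, reusing the field names from the config above (either one should be enough):

```
filter {
  grok {
    # Option 1: GREEDYDATA (.*) consumes the rest of the line.
    match => { "message" => "\[%{TIMESTAMP_ISO8601:time}\] %{GREEDYDATA:log_message}" }
    # Option 2 (alternative): keep DATA but anchor the pattern at the end of the line.
    # match => { "message" => "\[%{TIMESTAMP_ISO8601:time}\] %{DATA:log_message}$" }
  }
}
```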

Yes, that's the full config. It works when I try it with GREEDYDATA:

[2019-10-02 20:53:17.749] [logger] [info] harvester: Output queue occupancy: 0.00%.
{
     "@timestamp" => 2019-10-03T14:53:15.782Z,
      "@metadata" => {
        "fingerprint" => "6ba223861a09243d169c74c046ce239b9455687468a1dd8bb6ae76f2b4bcbaf6"
    },
       "@version" => "1",
           "host" => "host",
    "log_message" => "[logger] [info] harvester: Output queue occupancy: 0.00%.",
           "time" => "2019-10-02 20:53:17.749",
        "message" => "[2019-10-02 20:53:17.749] [logger] [info] harvester: Output queue occupancy: 0.00%."
}
[2019-10-02 20:53:17.749] [logger] [info] harvester: Output queue occupancy: 0.01%.    
{
     "@timestamp" => 2019-10-03T14:53:29.444Z,
      "@metadata" => {
        "fingerprint" => "c551cb6c4d5871768fc0ecb6d4e2e8461c5b9f9d6fc0de5a0c397c0f24454123"
    },
       "@version" => "1",
           "host" => "host",
    "log_message" => "[logger] [info] harvester: Output queue occupancy: 0.01%.",
           "time" => "2019-10-02 20:53:17.749",
        "message" => "[2019-10-02 20:53:17.749] [logger] [info] harvester: Output queue occupancy: 0.01%."
}
[2019-10-02 20:53:17.749] [logger] [info] harvester: Output queue occupancy: 0.00%.
{
     "@timestamp" => 2019-10-03T14:53:41.426Z,
      "@metadata" => {
        "fingerprint" => "6ba223861a09243d169c74c046ce239b9455687468a1dd8bb6ae76f2b4bcbaf6"
    },
       "@version" => "1",
           "host" => "host",
    "log_message" => "[logger] [info] harvester: Output queue occupancy: 0.00%.",
           "time" => "2019-10-02 20:53:17.749",
        "message" => "[2019-10-02 20:53:17.749] [logger] [info] harvester: Output queue occupancy: 0.00%."
}

Now the pending part is the time constraint: I need to generate only one message for a given timestamp. Can I provide a timestamp as a source for the fingerprint, giving just the date with hours and minutes?

You could extract the relevant timestamp components and add them to the fields used by the fingerprint plugin.
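
As a rough sketch of that idea (not necessarily the only way to do it), grok can capture the timestamp up to the minute into its own field, which is then hashed together with the message body. The field name minute_bucket is made up for illustration; the other names follow the earlier config:

```
filter {
  grok {
    # minute_bucket (hypothetical field name) holds the timestamp truncated to the minute,
    # e.g. "2019-10-02 21:17"; the seconds and milliseconds are matched but not stored.
    match => { "message" => "\[(?<minute_bucket>%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{HOUR}:%{MINUTE}):%{SECOND}\] %{GREEDYDATA:log_message}" }
  }
  fingerprint {
    source => ["minute_bucket", "log_message"]
    # Hash both fields together rather than separately.
    concatenate_sources => true
    target => "[@metadata][fingerprint]"
    method => "SHA256"
  }
}
```

With this, lines that carry the same text and fall within the same minute should get the same fingerprint, while a change in either the minute or the text produces a different one.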

I'm able to get what I want using the below config:

filter {
  dissect {
    mapping => { "message" => "[%{ts}-%{+ts}-%{+ts} %{+ts}:%{+ts}:%{seconds}] %{remaining}" }
  }
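  fingerprint {
    source => ['remaining','ts']
    # Combine both fields into one hash; multiple sources are not concatenated by default
    concatenate_sources => true
    target => "[@metadata][fingerprint]"
    method => "SHA256"
  }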
}
output {
  kafka {
     bootstrap_servers => "kafka broker host"
     topic_id => "kafka topic"
     # "id" only names the plugin instance; message_key sets the per-event Kafka record key
     message_key => "%{[@metadata][fingerprint]}"
     codec => plain {
       format => "%{message}"
     }
  }
}

But unfortunately the team that supports Elasticsearch has blocked users from setting the _id, so I cannot use the replace functionality. Is there any other way I can do this?

If not, then the issue I had is resolved. Thanks for the help, Christian!
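
One possible alternative that does not depend on the document _id is to drop the repeats inside Logstash before they are sent, for example with the throttle filter. This was not suggested in the thread, so treat it as an untested sketch. It assumes the ts and remaining fields produced by the dissect filter above, and relies on the throttle period defaulting to 60 seconds:

```
filter {
  throttle {
    # Same key as the fingerprint: minute-level timestamp plus the message body.
    key => "%{ts} %{remaining}"
    # Let the first event for a key through and tag every later one in the period.
    after_count => 1
    # Rule of thumb from the plugin docs: at least twice the period.
    max_age => 120
    add_tag => "throttled"
  }
  if "throttled" in [tags] {
    drop { }
  }
}
```

The counters live in memory on a single Logstash instance, so duplicates can still slip through after a restart or when several instances process the same logs.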
