Logstash/grok to match only first occurrence and stop parsing repeatedly for same values

Hi @Badger

Many thanks for your reply, really appreciate it. Your proposed solution works, but it doesn't pick up the same value again when it recurs at a different timestamp. Please see the sample below for reference:

Wed Mar 17 10:01:35 temperature 20.1 <----value1 at this timestamp
Wed Mar 17 10:02:36 temperature 20.1
Wed Mar 17 10:03:37 temperature 20.1
Wed Mar 17 10:04:38 temperature 20.3 <----value2 at this timestamp
Wed Mar 17 10:05:39 temperature 20.3
Wed Mar 17 10:06:40 temperature 20.1 <----value1 repeats again but at different timestamp
Wed Mar 17 10:07:41 temperature 20.1
Wed Mar 17 10:08:42 temperature 20.1
Wed Mar 17 10:09:43 temperature 20.1
Wed Mar 17 10:10:44 temperature 20.3 <----value2 repeats again but at different timestamp

Based on this,

  1. If the timestamp changes and the value stays the same, only the FIRST OCCURRENCE of that value should be parsed.
  2. If the timestamp changes and the value also changes, only the FIRST OCCURRENCE of the change should be parsed.

This code of yours

ruby {
    init => '@lastValue = ""'
    code => '
        value = event.get("someField")
        if value == @lastValue
            event.cancel
        end
        @lastValue = value
    '
}

works, but it doesn't pick up the changed timestamp and add the message to the Logstash output.

Please advise.
Regards.

Hi @Badger

I tried your code with pipeline.workers set to 1 and it worked like a CHARM!!

ruby {
    init => '@lastValue = ""'
    code => '
        value = event.get("someField")
        if value == @lastValue
            event.cancel
        end
        @lastValue = value
    '
}

Tested it with all the logs. No issues, and I can get the desired output. Will ping you if anything out of the ordinary pops up.

One more question to end this issue: can I set pipeline.workers = 1 inside the Logstash config (or next to grok)? Is this doable? If not, I assume I will have to define it in the pipelines.yml file. If so, what will the processing speed for logs be? FYI, I have 8 cores in the server for this.

Please let me know.
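For reference, pipeline.workers cannot be set inside the pipeline configuration next to grok; it is a pipeline-level setting. It can be set globally in logstash.yml, passed as -w on the command line, or set per pipeline in pipelines.yml, roughly like this (the pipeline id and path here are made up for illustration):

```yaml
# pipelines.yml: pin this one pipeline to a single worker;
# other pipelines can still use all 8 cores
- pipeline.id: temperature
  path.config: "/etc/logstash/conf.d/temperature.conf"
  pipeline.workers: 1
```

With one worker the pipeline processes events sequentially, so throughput is lower than with 8 workers, but that ordering is exactly what the ruby filter above depends on.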

I am assuming that you parse this so that [someField] contains "temperature 20.1". For example

dissect { mapping => { "message" => "%{} %{} %{} %{} %{someField}" } }

No, I am using it as below:

grok {
    match => { "message" => "%{DAY} %{MONTH} %{MONTHDAY} %{HOUR}:%{MINUTE}:%{SECOND} %{DATA:temp} %{BASE10NUM:temp}" }
}

OK, so [temp] will be an array with two entries. If you just want to test the number then use

value = event.get("[temp][1]")

If you want to test everything after the timestamp then use

value = event.get("[temp]").join(" ")
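As a quick illustration of why [temp] ends up as an array and what the two options return (plain Ruby, with a hash standing in for the Logstash event):

```ruby
# When two grok captures share the field name "temp", the field becomes
# an array. A plain hash stands in for the event here; real filter code
# would use event.get.
event = { "temp" => ["temperature", "20.1"] }

# Test only the number (what event.get("[temp][1]") returns):
number = event["temp"][1]          # "20.1"

# Test everything after the timestamp:
value = event["temp"].join(" ")    # "temperature 20.1"
```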

That's helpful.

I have to calculate the time difference between these two events as well. Should I continue here or open a new topic?

I was looking in the forum for the time difference between two timestamps and found a post in which you posted a solution, but that isn't working for me.

For reference, using the same sample log (as posted above), I need to get the total time in minutes and hours between two different readings. For example, how much time did the device stay at 20.1, at 20.3, and so on.

Please advise.

What I would do is

    grok {
        pattern_definitions => { "CUSTOMTIME" => "%{DAY} %{MONTH} %{MONTHDAY} %{TIME}" }
        match => { "message" => "%{CUSTOMTIME:[@metadata][timestamp]} %{GREEDYDATA:[@metadata][restOfLine]}" }
    }
    date { match => [ "[@metadata][timestamp]", "EEE MMM dd HH:mm:ss" ] }
    ruby {
        init => '@lastValue = nil'
        code => '
            now = event.get("@timestamp").to_f
            if @lastValue == nil
                @lastTime = now
            end

            value = event.get("[@metadata][restOfLine]")
            if value == @lastValue
                event.cancel
            else
                delta = now - @lastTime
                event.set("delta", delta)
                @lastTime = now
            end
            @lastValue = value
        '
    }
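To see what that filter does to the sample log, here is the same logic as a plain-Ruby simulation (not a Logstash filter; the timestamps are simplified to parseable strings):

```ruby
require "time"

# Simplified sample: timestamp, reading (data modeled on the log above)
lines = [
  ["2021-03-17 10:01:35", "temperature 20.1"],
  ["2021-03-17 10:02:36", "temperature 20.1"],
  ["2021-03-17 10:04:38", "temperature 20.3"],
  ["2021-03-17 10:06:40", "temperature 20.1"],
]

last_value = nil
last_time  = nil
kept = []

lines.each do |stamp, value|
  now = Time.parse(stamp).to_f
  last_time = now if last_value.nil?    # very first event
  if value == last_value
    last_value = value
    next                                # event.cancel: drop the repeat
  end
  delta = now - last_time               # seconds spent at the previous value
  kept << [value, delta]
  last_time  = now
  last_value = value
end
# kept == [["temperature 20.1", 0.0],
#          ["temperature 20.3", 183.0],
#          ["temperature 20.1", 122.0]]
```

Only the first occurrence of each run survives, and each surviving event carries the number of seconds the previous value was held.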

Thank you @Badger

Hi @Badger

I am following the above post code for calculating the duration of temp for each repeating value.

There are different devices outputting temp values and I am sorting these devices as dev1 ... dev24. The ruby script is joining the temp value durations from the last file of dev1 to the first file of dev2.

I want the durations to be device-specific, not one overall duration across all devices.
That is, the durations of dev1's temp files should be kept separate from the durations of dev2's temp files.

Please advise.

Then you might be better off using an aggregate filter, with the device as the key, and use the map instead of the @instanceVariables.
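A sketch of that idea, assuming the device name is in [host] and the reading in [temp1], as in the sample output below. The aggregate filter keeps one map per task_id, so each device tracks its own last value and last time; note that aggregate also requires pipeline.workers set to 1:

```
aggregate {
    task_id => "%{host}"
    code => '
        now = event.get("@timestamp").to_f
        if map["last_value"] == event.get("temp1")
            event.cancel
        else
            # seconds the device spent at its previous reading
            event.set("delta", now - map["last_time"]) if map["last_time"]
            map["last_time"] = now
        end
        map["last_value"] = event.get("temp1")
    '
}
```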

@Badger

ruby {
        init => '@lastValue = nil'
        code => '
            now = event.get("@timestamp").to_f
            if @lastValue == nil
                @lastTime = now
            end

            value = event.get("[@metadata][restOfLine]")
            if value == @lastValue
                event.cancel
            else
                delta = now - @lastTime
                event.set("delta", delta)
                @lastTime = now
            end
            @lastValue = value
        '
}

This code has a small issue: VALUE2 carries the delta of VALUE1 in the output. See below:

{
    "@timestamp" => 2021-06-02T19:25:48.783Z,
          "path" => "/home/temp.log.1",
          "time" => "0.7349998950958252",
         "delta" => 0.7349998950958252,
         "temp1" => 29.15,
          "host" => "dev2"
}
{
    "@timestamp" => 2021-06-02T19:34:05.844Z,
          "path" => "/home/temp.log.1",
          "time" => "497.0610001087189",
         "delta" => 497.0610001087189,  <--- this should be in above "delta" 
         "host" => "dev2",
         "temp1" => 19.05,

}

I was going thoroughly through the log and found that the temp1: 29.15 value was persistent until temp1: 19.05 was logged.

No, you cannot expect event processing software to predict what future events will arrive.

If you need to do that using logstash then you can write a ruby filter that basically stores each event, and when the next event arrives calculates the delta.

I doubt Elastic are aiming to help with that use case. You can make logstash do it. You can make logstash ingest a C++ source file and output an executable binary. But that does not make it a good idea.

2021-06-02 19:25:48.783 dev2 temp1 29.15 --line 1
2021-06-02 19:34:05.844 dev2 temp1 19.05 --line 2

I would try to write a ruby filter for it.

I don't mean to foresee any future events. My aim is that when line 1 starts with temp1: 29.15, I want to know its duration until the temp1 value changes to 19.05, so that I can understand that dev2 stayed at a temp value of 29.15 for 497.06 seconds.

And for a newbie like me, can you point me to where I can learn ruby filter coding, please? I checked the docs and they are not as elaborate as your code. Ruby as it relates to Logstash, or some sort of it.

Thanks for your patience!

@Badger, there's a similar issue in this post, which is solved. I guess it can be tweaked to get the time difference between each line. I can later use metrics in ES/Kibana to get the sum of all durations.

Let me know your thoughts on it.

That is very similar to the code I posted above. There is no need to use a class variable (@@). An instance variable (@) will do.

If you want to attach 497.06 to line2 then that is trivial. If you want to attach it to line1 then you need to save line1 somewhere until line2 arrives. I would probably use an aggregate filter so that if line2 does not arrive I can timeout line1. Otherwise do it in a ruby filter like above.
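Outside Logstash, the "save line1 until line2 arrives" idea looks like this in plain Ruby (hypothetical readings modeled on the sample above; each reading is buffered until a different value shows up, and only then emitted with its duration attached):

```ruby
require "time"

# Hypothetical readings: timestamp, temp1 value
readings = [
  ["2021-06-02 19:25:48", 29.15],
  ["2021-06-02 19:34:05", 19.05],
  ["2021-06-02 19:38:00", 19.05],  # repeat: stays buffered
  ["2021-06-02 19:42:20", 21.00],
]

pending = nil   # the buffered "line1" waiting for its duration
emitted = []

readings.each do |stamp, temp|
  now = Time.parse(stamp).to_f
  if pending && pending[:temp] != temp
    # the value changed, so line1's duration is now known: emit it
    emitted << pending.merge(delta: now - pending[:time])
  end
  # start a new buffered reading only when the value actually changed
  pending = { time: now, temp: temp } if pending.nil? || pending[:temp] != temp
end
# The last pending reading never gets a successor; that is the case an
# aggregate filter timeout would handle.
```

Here the 29.15 reading is emitted with delta 497.0 (19:25:48 to 19:34:05), attached to the first line of the run rather than the line that follows it.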

if [temp1] {
  aggregate {
    task_id => "%{temp1}"
    code => "map['duration'] ||= 0; map['duration'] += 1;"
    push_map_as_event_on_timeout => true
    timeout_task_id_field => "temp1"
    timeout => 60 # 60 seconds timeout
    timeout_tags => ['_aggregatetimeout']
    timeout_code => "event.set('durationinvale', event.get('duration') > 0)"
  }
}

I am following the example and it doesn't work. Can you advise what I am doing wrong?

stdout is

{
    "@timestamp" => 2021-06-02T19:25:48.048Z,
          "path" => "/home/temp.log.1",
          "time" => "12.713000059127808",
         "delta" => 12.713000059127808,
         "temp1" => 29.15,
          "host" => "dev2"
}
{
    "@timestamp" => 2021-06-02T19:34:05.844Z,
          "path" => "/home/temp.log.1",
          "time" => "497.7960000038147",
         "delta" => 497.7960000038147,  
         "host" => "dev2",
         "temp1" => 19.05,

}

Which version of logstash are you running?

7.8.0

Upgrade at least to 7.9.1, or disable the java execution engine.
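On 7.x the execution engine can be selected in logstash.yml (a sketch; in 8.x the Java engine is the only option, so this setting only applies to 7.x):

```yaml
# logstash.yml: fall back to the older ruby execution engine
pipeline.java_execution: false
```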

Tried with the latest 7.13.1. Still same.