Started working with ELK a few days ago. I am a newbie, please help.
I have a log (sample below)
Timestamp temperature 20
Timestamp temperature 20
Timestamp temperature 21
Note that the timestamps are all different, down to the millisecond, so each line is parsed by grok as a new message.
The problem I have is that I want logstash/grok to match only the first occurrence of "temperature 20" and "temperature 21" and drop/ignore the messages that repeat the same value.
Since each of these events is indeed a separate event it may not be possible to drop what is essentially a repeat event. There does not seem to be any way to store a "memory" of an event, in this case the temperature, to compare to future events. I would be happy to be wrong about this, of course!
If that is really what you want (one document for each value of temperature) then use the temperature as the document_id (assuming you are sending data to elasticsearch). If you are not using elasticsearch you could do it in a ruby filter using something like
ruby {
    init => '@seenValues = {}'
    code => '
        value = event.get("someField")
        # Drop the event if this value has been seen before.
        if @seenValues.include?(value)
            event.cancel
        end
        @seenValues[value] = 1
    '
}
If you only want events where that field changes then it can also be done using ruby. Something like
ruby {
    init => '@lastValue = ""'
    code => '
        value = event.get("someField")
        if value == @lastValue
            event.cancel
        end
        @lastValue = value
    '
}
In either case you will need pipeline.workers set to 1 and pipeline.ordered set to auto (the default in v7.x).
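To see how the two filters differ, here is a minimal plain-Ruby sketch that applies the same logic to a made-up list of readings, without Logstash. The method names `first_per_value` and `on_change` and the sample readings are illustrative assumptions, not part of any plugin.

```ruby
# Plain-Ruby simulation of the two filters above; no Logstash needed.
# The readings array is made-up sample data.

# Keep only the first event for each distinct value (the @seenValues filter).
def first_per_value(values)
  seen = {}
  values.reject do |value|
    duplicate = seen.include?(value)
    seen[value] = 1
    duplicate
  end
end

# Keep an event only when its value differs from the previous one
# (the @lastValue filter).
def on_change(values)
  last = ""
  values.reject do |value|
    unchanged = (value == last)
    last = value
    unchanged
  end
end

readings = [20, 20, 21, 20, 20, 21]
p first_per_value(readings)  # => [20, 21]
p on_change(readings)        # => [20, 21, 20, 21]
```

The first variant never lets a value through twice, while the second lets a value through again whenever it returns after a different reading.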
Many thanks for your reply. Really appreciate it. Your proposed solution works, but it doesn't pick up the same value when it recurs at a later timestamp. Please see the sample below for reference.
Wed Mar 17 10:01:35 temperature 20.1 <----value1 at this timestamp
Wed Mar 17 10:02:36 temperature 20.1
Wed Mar 17 10:03:37 temperature 20.1
Wed Mar 17 10:04:38 temperature 20.3 <----value2 at this timestamp
Wed Mar 17 10:05:39 temperature 20.3
Wed Mar 17 10:06:40 temperature 20.1 <----value1 repeats again but at different timestamp
Wed Mar 17 10:07:41 temperature 20.1
Wed Mar 17 10:08:42 temperature 20.1
Wed Mar 17 10:09:43 temperature 20.1
Wed Mar 17 10:10:44 temperature 20.3 <----value2 repeats again but at different timestamp
Based on this,
If the timestamp changes and the value remains the same, then only the FIRST OCCURRENCE of that value should be parsed.
If the timestamp changes and the value also changes, then only the FIRST OCCURRENCE of this change should be parsed.
This code by you
ruby {
    init => '@lastValue = ""'
    code => '
        value = event.get("someField")
        if value == @lastValue
            event.cancel
        end
        @lastValue = value
    '
}
works but doesn't pick up the changed timestamp and add the message to the logstash output.
I tried your code with pipeline.workers set to 1 and it worked like a CHARM!!
Tested it with all the logs. No issues, and I get the desired output. Will ping you if anything out of the ordinary pops up.
One more question to close this issue: can I set pipeline.workers = 1 inside the logstash config (or next to grok)? Is this doable? If not, I assume I will have to define it in the pipelines.yml file. If so, what will the processing speed for logs be? FYI, I have 8 cores in the server for this.
I also have to calculate the time difference between these two events. Should I continue here or open a new topic?
I was looking through the forum for the time difference between two timestamps and found a post in which you posted a solution, but that isn't working for me.
For reference, with the same sample log (as posted above), I need to get the total time in minutes and hours between two different readings. For example, how much time did the device stay at 20.1, at 20.3, and so on?
I am following the code from the above post to calculate the duration of each repeating temp value.
There are different devices outputting temp values and I am sorting these devices as dev1, dev ...dev24. The ruby script is joining the temp value durations from the last file of dev1 to the first file of dev2.
I want the durations to be device-specific rather than one overall duration of values.
This means that all the durations from dev1 temp files should be separate from the durations from dev2 temp files.
ruby {
    init => '@lastValue = nil'
    code => '
        now = event.get("@timestamp").to_f
        # First event: there is no earlier reading to measure from.
        if @lastValue == nil
            @lastTime = now
        end
        value = event.get("[@metadata][restOfLine]")
        if value == @lastValue
            event.cancel
        else
            # Seconds since the previous change, i.e. how long the previous value lasted.
            delta = now - @lastTime
            event.set("delta", delta)
            @lastTime = now
        end
        @lastValue = value
    '
}
This code has a small issue: in the output, VALUE2 carries the delta of VALUE1.
No, you cannot expect event processing software to predict what future events will arrive.
If you need to do that using logstash then you can write a ruby filter that basically stores each event, and when the next event arrives calculates the delta.
I doubt Elastic are aiming to help with that use case. You can make logstash do it. You can make logstash ingest a C++ source file and output an executable binary. But that does not make it a good idea.
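The store-and-compare idea mentioned above can be sketched in plain Ruby, outside Logstash. This is only a minimal illustration under stated assumptions: the class name `RunTracker` and the field names "time", "value" and "duration" are hypothetical, and the times are offsets in seconds chosen to mirror the first lines of the sample log posted earlier. Wiring this into a ruby or aggregate filter is not shown.

```ruby
# Plain-Ruby sketch: hold back the first event of each run of identical
# values and release it, with its duration, once the value changes.
# The field names "time", "value" and "duration" are hypothetical.
class RunTracker
  def initialize
    @held = nil # first event of the current run, not yet released
  end

  # Returns the finished run's first event (with "duration" set) when the
  # value changes, or nil while the current run is still going.
  def push(event)
    if @held.nil?
      @held = event
      return nil
    end
    return nil if event["value"] == @held["value"] # repeat: drop it

    finished = @held.merge("duration" => event["time"] - @held["time"])
    @held = event
    finished
  end

  # Releases the run still in progress, e.g. at shutdown or on a timeout.
  # Its duration is unknown, so none is set.
  def flush
    finished, @held = @held, nil
    finished
  end
end

tracker = RunTracker.new
events = [
  { "time" => 0.0,   "value" => "20.1" },
  { "time" => 61.0,  "value" => "20.1" },
  { "time" => 183.0, "value" => "20.3" },
  { "time" => 305.0, "value" => "20.1" },
]
released = events.map { |e| tracker.push(e) }.compact
released.each { |e| puts "#{e["value"]} lasted #{e["duration"]} s" }
# prints:
#   20.1 lasted 183.0 s
#   20.3 lasted 122.0 s
```

The last run is never released by `push` alone, which is why some form of flush or timeout is needed for the final reading.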
I don't mean to foresee any future events. My aim is this: when line 1 starts with temp1 : 29.15, I want to know how long it lasts until the temp1 value changes to 19.05, so that I can see that dev2 stayed at a temp value of 29.15 for 497.06 seconds.
And for a newbie like me, can you point me to where I can learn ruby filter coding, please? I checked the docs and they are not as elaborate as your code. Anything on Ruby as it relates to Logstash, or something along those lines.
@Badger, there’s a similar issue in this post, which is solved. I guess it can be tweaked to get the time difference between each line. I can later use metrics in ES/Kibana to get the sum of all durations.
That is very similar to the code I posted above. There is no need to use a class variable (@@). An instance variable (@) will do.
If you want to attach 497.06 to line2 then that is trivial. If you want to attach it to line1 then you need to save line1 somewhere until line2 arrives. I would probably use an aggregate filter so that if line2 does not arrive I can timeout line1. Otherwise do it in a ruby filter like above.
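For the per-device requirement raised earlier in the thread, one option is to key the remembered state by device, so that a change on dev1 is never measured against a reading from dev2. The following plain-Ruby sketch attaches the duration to the event where the value changes (the "line2" case). The class name `PerDeviceDelta` and the field names "device", "time", "value" and "delta" are hypothetical; in a ruby filter the hash would presumably live in an instance variable created in `init`, with `event.get` and `event.set` in place of hash access, and pipeline.workers would still need to be 1.

```ruby
# Plain-Ruby sketch of per-device change detection (not a Logstash plugin).
# The field names "device", "time", "value" and "delta" are hypothetical.
class PerDeviceDelta
  def initialize
    @state = {} # device name => { value: last value, since: time it started }
  end

  # Returns the event to keep, or nil when it repeats the device's last value.
  # A kept event carries "delta": how long that device held the previous value.
  # The first event of a device has no "delta" because nothing precedes it.
  def filter(event)
    device = event["device"]
    previous = @state[device]
    return nil if previous && previous[:value] == event["value"]

    kept = event.dup
    kept["delta"] = event["time"] - previous[:since] if previous
    @state[device] = { value: event["value"], since: event["time"] }
    kept
  end
end

tracker = PerDeviceDelta.new
events = [
  { "device" => "dev1", "time" => 0.0,   "value" => "20.1" },
  { "device" => "dev1", "time" => 183.0, "value" => "20.3" },
  { "device" => "dev2", "time" => 200.0, "value" => "20.3" },
  { "device" => "dev2", "time" => 260.0, "value" => "20.3" },
  { "device" => "dev2", "time" => 320.0, "value" => "19.0" },
]
kept = events.map { |e| tracker.filter(e) }.compact
kept.each { |e| p e }
```

In this made-up data the first dev2 event is kept even though its value equals the last dev1 value, and the dev2 delta (120.0) is measured only from dev2's own previous reading.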