Hello,
I have the ELK stack installed and would like to store CSV data there. The only problem I am facing is that my data doesn't contain date information (just the time). An example message looks something like:
aaa;bbb;ccc;23:58:5;;;ddd;;;;;;
I would like to take the time field, merge it with a generated date, and then convert the result to @timestamp. The problem is that the data arrives at Logstash with some delay (a few minutes), so it could happen that the log:
aaa;bbb;ccc;23:58:5;;;ddd;;;;;;
will be processed the next day. In that case I need to add the right date, of course.
My idea is as below (written in pseudocode):
if current_hour < event_hour {   // i.e. 0 < 23
    event_date = yesterday_date
} else {
    event_date = today_date
}
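To make that concrete, here is a minimal standalone Ruby sketch of the same check, using made-up sample values (the dates, times, and variable names are placeholders for illustration only):

require 'date'

# hypothetical sample values for illustration
ingest_time = Time.utc(2024, 5, 2, 0, 2, 10)   # when Logstash received the event (00:02:10 the next day)
event_time_string = "23:58:5"                  # time-only value from the CSV line

event_hour = event_time_string.split(':').first.to_i

# if the event's hour is later than the current (ingest) hour, the event is from yesterday
if ingest_time.hour < event_hour               # i.e. 0 < 23
  event_date = ingest_time.to_date - 1
else
  event_date = ingest_time.to_date
end

puts event_date   # => 2024-05-01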
So before you do the calculation, you already have the @timestamp at which LS received the data on the input.
You should use that event timestamp rather than the current wall-clock time, because if you ever use a persistent queue (the LS PQ or Kafka), the time of filter processing may be a long while after the event was received.
As you noted in the pseudocode, you will never receive an event from the future.
I am assuming you plan to use the ruby filter to do the date maths.
Here are some Ruby time maths for the ruby filter:
ingest_time = event.timestamp.time # get the event timestamp as a Ruby Time instance
midnight = ingest_time.to_date.to_time
# pluck the first 3 values [sec, min, hour] from the Time as an array, zip with an array of seconds multipliers and reduce to seconds since midnight
ingest_seconds = ingest_time.to_a[0,3].zip([1,60,3600]).reduce(0){|sum, (l,r)| sum + (l * r)}
# get the event time string, e.g. '23:58:5'; use the name of the field that contains this value
event_time_string = event.get("[event_time]")
# split the string on ':', map the to_i method over the elements, reverse and zip reduce as above
event_seconds = event_time_string.split(':').map(&:to_i).reverse.zip([1,60,3600]).reduce(0){|sum, (l,r)| sum + (l * r)}
# set the delta from midnight
delta = event_seconds
# does the event look like it's from the future?
# if so it is really from the previous day, so make the delta a negative number of seconds to add to midnight
if event_seconds > ingest_seconds
  delta = (-24 * 3600) + event_seconds
end
event_time = midnight + delta
# use the event.timestamp API directly.
event.timestamp = LogStash::Timestamp.new(event_time)
Make sure that all events are recorded in the UTC timezone and that Logstash runs in an environment where TZ=UTC. The code above is not safe across daylight-saving switchovers.
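For completeness, below is a sketch of how this could be wired into a pipeline, assuming the message is first split with the csv filter. The column names (col1, col2, col3, event_time, ...) and the pipeline filename are assumptions based on the sample line, so adjust them to match your actual data:

filter {
  csv {
    separator => ";"
    # assumed column names; the 4th column holds the time-only value
    columns => ["col1", "col2", "col3", "event_time", "col5", "col6", "col7"]
  }
  ruby {
    code => "
      ingest_time = event.timestamp.time
      midnight = ingest_time.to_date.to_time
      ingest_seconds = ingest_time.to_a[0,3].zip([1,60,3600]).reduce(0){|sum, (l,r)| sum + (l * r)}
      event_time_string = event.get('[event_time]')
      event_seconds = event_time_string.split(':').map(&:to_i).reverse.zip([1,60,3600]).reduce(0){|sum, (l,r)| sum + (l * r)}
      delta = event_seconds
      delta = (-24 * 3600) + event_seconds if event_seconds > ingest_seconds
      event.timestamp = LogStash::Timestamp.new(midnight + delta)
    "
  }
}

Then start Logstash with the timezone forced to UTC, for example: TZ=UTC bin/logstash -f pipeline.conf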