Using part of a parsed date as the output index

I need to select the output index based on the parsed date instead of the current date (%{+YYYY.MM.dd}), because the log file does not roll over at midnight.

I have tried to research this, but the only solution I have found is using Ruby code:

input {   
    #...   
}
filter {
    #...
    ruby {
        code => '
            # re-format the parsed logTime string as YYYY.MM.dd for the index name
            parsedLogTimeString = event.get("logTime")
            parsedLogTimeDate = DateTime.strptime(parsedLogTimeString, "%Y-%m-%d %H:%M:%S")
            event.set("[@metadata][indexDate]", parsedLogTimeDate.strftime("%Y.%m.%d"))
        '
    }
    #...
}
output {
    elasticsearch {
        #...
        index => "my-index-%{[@metadata][indexDate]}"
        #...
    }
}

Is this a proper solution? (I think it works, it just feels hacky.)

Maybe there is a way I can get part (YYYY.MM.dd) of my already converted date and use it when selecting the output index, instead of using Ruby code?

date {
    match => ["logTime", "YYYY-MMM-dd;HH:mm:ss.SSS", "ISO8601"]
    timezone => "Europe/Oslo"
    target => ["logTime"]
}

Can you provide a logTime sample?

Not sure what you mean. An example of a logTime could be: 2024-10-28 09:49:38,007.
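
For reference, a date filter pattern that matches that exact format would presumably look like this (a sketch, assuming every line uses the same format):

date {
    # Joda-style pattern for "2024-10-28 09:49:38,007"
    match => ["logTime", "yyyy-MM-dd HH:mm:ss,SSS"]
    timezone => "Europe/Oslo"
    target => "logTime"
}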

I think my code/.conf works, it just feels very hacky.

I was hoping there was a solution like this (pseudocode):

index => "my-index-%{logTime.sformat('"%Y.%m.%d')}"

That way I would be reusing the parsed date and not need any Ruby code.


Good, because that's exactly what Logstash does by default. String interpolation for dates (e.g. %{+YYYY.MM.dd}) is based on the event's [@timestamp] field, not the current date.
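
In other words, a minimal sketch, assuming the date filter is allowed to write the parsed value into @timestamp instead of back into logTime:

filter {
    date {
        # hypothetical pattern for "2024-10-28 09:49:38,007"; adjust to your real formats
        match => ["logTime", "yyyy-MM-dd HH:mm:ss,SSS"]
        timezone => "Europe/Oslo"
        # no target set, so the parsed date replaces @timestamp
    }
}
output {
    elasticsearch {
        #...
        # %{+YYYY.MM.dd} is rendered from [@timestamp], i.e. from the parsed logTime
        index => "my-index-%{+YYYY.MM.dd}"
        #...
    }
}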


I think the alternative would be even more hacky.

The string interpolation of the time in the index option always uses the value of @timestamp. If @timestamp is the same as logTime, then you may use it, but it will be the time converted back to UTC, which cannot be changed.

If I understand correctly, you want to create the indices based on the local time, which is ahead of UTC.

The ruby filter you wrote will work, as it creates a field with your YYYY.MM.dd value, but if you don't want to use a ruby filter you would need to manipulate the date string of the logTime field before using the date filter on it.

The following filters would have the same effect as your ruby filter.

  mutate {
    # copy the original logTime string into a metadata field
    add_field => {
      "[@metadata][logTime]" => "%{logTime}"
    }
  }
  mutate {
    # "2024-10-28 09:49:38,007" -> "2024.10.28 09:49:38,007",
    # then split on the space into ["2024.10.28", "09:49:38,007"]
    gsub => ["[@metadata][logTime]","-","."]
    split => {"[@metadata][logTime]" => " "}
  }
  mutate {
    # keep only the date part for use in the index name
    add_field => {
      "[@metadata][index]" => "%{[@metadata][logTime][0]}"
    }
  }

The [@metadata][index] field will then hold the YYYY.MM.dd value present in the logTime string.
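
It can then be referenced in the output the same way as in your Ruby version (sketch, with the other elasticsearch options omitted):

output {
    elasticsearch {
        #...
        index => "my-index-%{[@metadata][index]}"
        #...
    }
}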


I didn't know that %{+YYYY.MM.dd} was based on @timestamp; I thought it just used today's date. I see a lot of other people change @timestamp to the parsed date, but I read somewhere when I started learning Logstash that this was bad practice. So we keep both, and set the timestamp field to logTime when creating Data Views in Kibana.

If I understand correctly, you want to create the indices based on the local time, which is ahead of UTC.

That's not the main reason. We have a Logstash instance listening to a backup log server; the log files are updated every hour, and they don't roll daily at midnight. Also, the backup job does not append new lines to the log files, it deletes the old file and writes a whole new one.

So let's say the file is updated at 00:35. Since it is past midnight, all the lines from 23:35-00:00 that are supposed to go into yesterday's index are now pushed into today's index. And because our backup job overwrites the whole file, Logstash thinks it is a new file and reads it all over again, meaning all of yesterday's data is pushed into today's new index.

We are trying to fix that with:

  • Using the fingerprint filter plugin to SHA256 hash the whole line and using that as the document_id (see the sketch below).
  • Using the parsed date to decide the output index.
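
A minimal sketch of that fingerprint/document_id combination, assuming the whole line is in the default message field and reusing the [@metadata][indexDate] field from the Ruby filter above:

filter {
    fingerprint {
        source => "message"                   # hash the whole original line
        method => "SHA256"
        target => "[@metadata][fingerprint]"
    }
}
output {
    elasticsearch {
        #...
        index => "my-index-%{[@metadata][indexDate]}"
        # same line => same _id, so re-read lines overwrite instead of duplicating
        document_id => "%{[@metadata][fingerprint]}"
        #...
    }
}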

Thank you, I will stick to my solution with the Ruby code.