Getting date stamp from file properties

Hi,

I'm wondering if I can get at log data in files that only have a time stamp (no date) on each line. The file name contains only the day number of the date the file was created, and the file is overwritten every month on that day.

So my question is - is there a way I can combine the time stamp on each line with the date stamp of the file I'm reading? I don't want to use the name of the file because it only contains the day number, not the month or the year.

Is there a way of getting at the properties (over and above the path and name) of the file being read so that I can use a "date created" or "date written" value to build my @timestamp field?

Any tips gratefully received.

Cheers,
Andy

You can use the ruby filter for this. Obtain the path to the logfile via the path field (i.e. event['path'] in Ruby) and stat the file. The date filter requires all timestamp information to be read from a single field so you'll probably have to do some string concatenation chores before you have something that's palatable to the date filter.
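
For what it's worth, here's a rough sketch of that approach, using the event['path'] style mentioned above. The "time" and "file_date" field names (and the choice of mtime rather than ctime) are just illustrative assumptions, so adapt them to your files:

filter {
        grok {
                # pull the time off the front of the line (the "time" field name is just for illustration)
                match => [ "message", "^(?<time>\d{2}:\d{2}:\d{2})" ]
        }
        ruby {
                # stat the source file and keep its modification date as yyyy-MM-dd
                code => "
                        path = event['path']
                        if path && File.exist?(path)
                                event['file_date'] = File.mtime(path).strftime('%Y-%m-%d')
                        end
                "
        }
        mutate {
                # glue the file's date onto the time taken from the line
                add_field => { "timestamp" => "%{file_date} %{time}" }
        }
        date {
                match => [ "timestamp", "yyyy-MM-dd HH:mm:ss" ]
        }
}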


Magnus - many thanks for the steer on this - I'm new to ELK and on a steep learning curve! Can I do all that from within the conf file without any additional scripting?

Yes. The ruby filter enables custom message processing with Ruby snippets inside the configuration file.

Andy, I think you can easily do this without coding in Ruby - see below.

e.g. file path (has the month)
/var/log/sample_logs/11/app.log

sample event in file (has the day and hours/min/sec)
17 11:14:22 MESSAGE_HERE

filter {
        grok {
                # take the day and time from the message, e.g. "17 11:14:22", and save it in the "timestamp_tmp" field
                match => [ "message", "^(?<timestamp_tmp>\d{2} \d{2}\:\d{2}\:\d{2})" ]
        }
        grok {
                # the month is taken from the file path itself, e.g. /var/log/sample_logs/11/app.log, and saved in the "month" field
                match => [ "path", "^\/var\/log\/sample_logs\/(?<month>\d{2})" ]
        }
        mutate {
                # combine them
                add_field => { "timestamp" => "%{month} %{timestamp_tmp}" }
        }
        date {
                # do the matching
                match => [ "timestamp", "MM dd HH:mm:ss" ]
        }
        mutate {
                # throw the temporary fields away
                remove_field => [ "timestamp_tmp", "month" ]
        }
}
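
One thing to watch: the pattern "MM dd HH:mm:ss" carries no year, so the date filter will have to assume one (normally the current year), which can bite around the new-year rollover.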

Great - I shall have a look. Thanks again.

@nellicus wrote:

Andy I think you can easily do without coding in ruby, see below

e.g. file path (has the month)
/var/log/sample_logs/11/app.log

Okay, but in the original question @Andy said:

The file name contains the date that the file was created but only the day number. The file is overwritten every month on the day that it is written.

So unless we're going to guess at what month we're dealing with I don't see how the log message and the file path can give us a complete date.

Yes, the only place I have the full date is in the file properties. The file name only has the day number in it and the line in the file only has the time.

To give you some context - I have a directory of sar files which I was hoping I could use as a source for logstash. Each file has all the sar data sampled at 10 minute intervals - sar01, sar02, sar03, etc. for each day of the month. Each file is overwritten each month on its respective day. The first part of each line is the time (but no date).

All the data I want is already in those files but it's proving tricky to get into logstash. Originally, I was thinking that it would be lighter on resources to get hold of that data from the pre-existing files, but it's looking as though I might end up running my own sar processes from logstash (using pipe or exec?) or using separate scripts to generate my own logs...

Oh right - I must have misread that. I thought all the info was spread between the file path and the events inside the file.

After looking into this further (and learning a little more about logstash and sysstat) I have found that I can run the "sadf" command against the current sa file and have it format its output in a number of ways - one of which gives me the data I want with a date/time stamp on each line.

So I think I have a number of options:
1. Use the "pipe" input to run the sadf command, tail the output, and feed it into ES.
2. Use the "pipe" input to run a command that redirects the sadf output to a file, and have a second "file" input that watches that file.
3. Use the "exec" input instead of the "pipe" input, which I can run at 10 minute intervals (still using the "file" input to watch the file) - there's a rough sketch of this below.

I think the last option will have the least impact on resources. I was just wondering if anyone had any advice on which might be the best option?

Thanks,
Andy