I'm wondering if I can get at log data in files that only have a time stamp (no date) for each line in the file. The file name contains the date that the file was created but only the day number. The file is overwritten every month on the day that it is written.
So my question is - is there a way I can combine the time stamp of the line in the file with the date stamp of the file I'm reading? I don't want to use the name of the file because it only contains the day number, not the month and not the year.
Is there a way of getting at the properties (over and above the path and name) of the file being read so that I can use a "date created" or "date written" value to build my @timestamp field?
You can use the ruby filter for this. Obtain the path to the logfile via the path field (i.e. event['path'] in Ruby) and stat the file. The date filter requires all timestamp information to be read from a single field so you'll probably have to do some string concatenation chores before you have something that's palatable to the date filter.
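In case it helps, here is a minimal sketch of the stat idea in plain Ruby. Note the caveats: the event field access syntax depends on your Logstash version (`event['path']` on older releases, `event.get('path')` on newer ones), and `build_timestamp` is a hypothetical helper for illustration, not a Logstash API:

```ruby
require 'time'

# Hypothetical helper: take the file's mtime (which carries the year,
# month, and day) and the "HH:MM:SS" stamp parsed from a log line,
# and combine them into a single Time object.
def build_timestamp(file_mtime, time_of_day)
  Time.parse("#{file_mtime.strftime('%Y-%m-%d')} #{time_of_day}")
end

# Inside a Logstash ruby filter you would obtain the mtime with
# File.stat(event['path']).mtime; here a fixed Time stands in for it.
mtime = Time.new(2015, 11, 17)
puts build_timestamp(mtime, "11:14:22")
```

Watch out for the month rollover: if a line is read just after the file has been rotated, the mtime may not match the month the line was written in.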
Magnus - many thanks for the steer on this - I'm new to ELK and on a steep learning curve! Can I do all that from within the conf file without any additional scripting?
Andy, I think you can easily do this without coding in Ruby; see below.
e.g. file path (has the month)
/var/log/sample_logs/11/app.log
sample event in file (has the day and hours/min/sec)
17 11:14:22 MESSAGE_HERE
filter {
  grok {
    # take the day and time from the start of the message, e.g. "28 14:12:32",
    # and save it in the "timestamp_tmp" field
    match => [ "message", "^(?<timestamp_tmp>\d{2} \d{2}:\d{2}:\d{2})" ]
  }
  grok {
    # the month comes from the file path itself, e.g. /var/log/sample_logs/11/app.log;
    # save it in the "month" field
    match => [ "path", "^\/var\/log\/sample_logs\/(?<month>\d{2})" ]
  }
  mutate {
    # combine them into a single field the date filter can parse
    add_field => { "timestamp" => "%{month} %{timestamp_tmp}" }
  }
  date {
    # do the matching; since there is no year in the pattern,
    # the date filter assumes the current year
    match => [ "timestamp", "MM dd HH:mm:ss" ]
  }
  mutate {
    # throw the temporary fields away
    remove_field => [ "timestamp_tmp", "month" ]
  }
}
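If it's useful, the two grok captures above can be sanity-checked outside Logstash with plain Ruby regexes (the sample line and path are the ones from this thread):

```ruby
line = "17 11:14:22 MESSAGE_HERE"
path = "/var/log/sample_logs/11/app.log"

# same captures as the two grok patterns above
ts  = line[/\A(\d{2} \d{2}:\d{2}:\d{2})/, 1]       # day + time of day
mon = path[%r{\A/var/log/sample_logs/(\d{2})}, 1]  # month from the path

# the field the date filter would parse with "MM dd HH:mm:ss"
timestamp = "#{mon} #{ts}"
puts timestamp  # => "11 17 11:14:22"
```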
Yes, the only place I have the full date is in the file properties. The file name only has the day number in it and the line in the file only has the time.
To give you some context - I have a directory of sar files which I was hoping to use as a source for Logstash. Each file holds a day's sar data sampled at 10-minute intervals - sar01, sar02, sar03, etc., one for each day of the month. Each file is overwritten each month on its respective day. The first part of each line is the time (but no date).
All the data I want is already in those files, but it's proving tricky to get into Logstash. Originally I thought it would be lighter on resources to read the pre-existing files, but it's looking as though I might end up running my own sar processes from Logstash (using pipe or exec?) or using separate scripts to generate my own logs...
After looking into this further (and learning a little more logstash and sysstat) I have found that I can run a "sadf" command against the current sa file and it will format its output in a number of ways - one of which will give me the data I want with a date/time stamp on each line.
So I think I have a number of options:
Use the "pipe" input to run the sadf command, tail the output, and feed it into ES.
Use the "pipe" input to run a command that redirects the output of the sadf to a file and have a second "file" input that watches the file.
Use the "exec" input instead of the "pipe" input which I can run at 10 minute intervals (still using the "file" input to watch the file.
I think the latter will have the least impact on resources. I was just wondering if anyone had any advice on what might be the best option?