Input file log rotation

Hi,

I am new to this group and have been playing around with Logstash for about a couple of weeks. I was surprised to see that Logstash does not provide a feature where it reads a log file once and then forgets about it; instead it seems to have a very strong affinity for the files it has read. I am pretty sure there is a common need where you get thousands of log files and just want to read each one once and move on.

Reading through the archives, I see many folks have asked the same question, but I have not found a satisfactory answer. I have decided to do some log rotation myself outside of Logstash, but Logstash still remembers the offsets in memory, and even if I rotate through to an older file, it reads from the offset where it left off last time. I could work around this by truncating the file before it gets written to and hoping that Logstash notices the truncation before any new content arrives. Obviously, this is very racy: if there is not enough time between the truncation and the new content, it reads from where it left off.

Can someone tell me a reliable way to do this log rotation? I have sincedb_path set to /dev/null. Any clues would be appreciated.

thanks
mohan

I have decided to do some log rotation myself outside of Logstash, but Logstash still remembers the offsets in memory, and even if I rotate through to an older file, it reads from the offset where it left off last time.

What, exactly, did you do? What does "rotate through to an older file" mean?

This is what I mean by log rotation:

- Maintain logfile.0, logfile.1, ..., logfile.9.
- When data comes in for a source, write it to logfile.0, then logfile.1, and so on.

If I now reach the end of the chain at logfile.9, I want to go back to logfile.0. This does not work, because Logstash remembers the offsets. Hence, whenever one file is written to, I truncate the next file in the chain, i.e. when logfile.9 is written to I truncate logfile.0, hoping that by the time I get to logfile.0 Logstash will have noticed the truncation and reset its offset (the debug logs indicate that the new offset is smaller than the previous one and it starts over). This seems to work, except that it is not reliable: I have to put an artificial sleep between the truncation and writing the new content.
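Roughly, this is the writer, as a minimal Python sketch (the directory matches my file input below; the one-second sleep is just whatever I happened to pick):

import os
import time

RING_SIZE = 10
LOG_DIR = "/tmp/logstash"              # same directory my file input watches
slot = 0                               # index of the file the next chunk goes to

def ring_path(i):
    return os.path.join(LOG_DIR, "log_xxx.%d" % i)

def write_chunk(data):
    global slot
    with open(ring_path(slot), "ab") as f:
        f.write(data)                  # append the chunk to the current slot
    nxt = (slot + 1) % RING_SIZE
    # Truncate the next file in the ring, hoping Logstash notices the
    # shrink and resets its offset before the next chunk lands there.
    open(ring_path(nxt), "w").close()
    time.sleep(1)                      # the artificial sleep; without it, racy
    slot = nxt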

Does that make sense?

thanks
mohan

You're still not being sufficiently exact in your expression, but it seems like you've implemented a kind of ring buffer where you write to logfile.0 until that file reaches a certain size, then you switch to writing to logfile.1, and so on. Or is it something else?

Logstash uses inodes to track file offsets. Prefer creating new files, since that (inode reuse aside) guarantees that Logstash will pick each one up as a new file with no old state. But yes, there's still a race if you have start_position => "beginning", and with start_position => "end" you might risk reprocessing old files, depending on the details of how you manage the files.
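To make "uses inodes" concrete, here's a quick stand-alone check you can run (plain Python, nothing Logstash-specific; the file names are made up):

import os

with open("/tmp/demo.log", "w") as f:
    f.write("hello\n")
ino = os.stat("/tmp/demo.log").st_ino

# A rename keeps the inode, so Logstash still sees the same file.
os.rename("/tmp/demo.log", "/tmp/demo.log.0")
assert os.stat("/tmp/demo.log.0").st_ino == ino

# Truncation also keeps the inode; only the size drops to zero.
open("/tmp/demo.log.0", "w").close()
assert os.stat("/tmp/demo.log.0").st_ino == ino

# A freshly created file usually gets a fresh inode and hence no old state
# (unless the filesystem happens to reuse a recently freed inode).
with open("/tmp/demo.log", "w") as f:
    f.write("new file\n")
print(os.stat("/tmp/demo.log").st_ino, ino)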

Sorry for not being clear. Yes, it is like a ring buffer; in my case, I write to each file just once (for now). When I reach the end, I go back to the first file in the chain, and when I do, all I expect is for Logstash to read it from the beginning. I have start_position => "beginning", but AFAICT that only applies to a new file.

I tried deleting the old files (this is what I tried first), but that did not work well. It remembers the name of the file too (besides the inode) and hence remembers the offsets. And there is always inode reuse, which makes it less reliable.

How do other people solve this problem? I can't have my input file growing forever, hence the need for rotation, but rotation has this problem. Is it just me, or how are others solving it?

-thanks
mohan

Log files are usually rotated via a rename operation, e.g. logfile.log is renamed to logfile.log.0 and logfile.log is recreated (and, correspondingly, logfile.log.0 is first renamed to logfile.log.1, and so on). I've never heard of Logstash having any problems forgetting about the old logfile.log once a new file is created, and the behavior you describe isn't consistent with the source code of the filewatch library, which I read just a few days ago. Can you post a reproducible example and/or debug log snippets that exhibit the problem?

Both shrinking files and files whose path suddenly points to another inode result in the same thing, namely that the original file and its state are forgotten and the file is picked up as a new one.
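For concreteness, a minimal sketch of that rename-style rotation (the path, the generation count, and the size threshold are all illustrative):

import os

LOG = "/var/log/app/logfile.log"   # illustrative path
KEEP = 9                           # keep generations logfile.log.0 .. .9
MAX_BYTES = 1000000                # illustrative rotation threshold

def rotate():
    # Shift old generations up: .8 -> .9, .7 -> .8, ..., .0 -> .1.
    for i in range(KEEP - 1, -1, -1):
        src = "%s.%d" % (LOG, i)
        if os.path.exists(src):
            os.rename(src, "%s.%d" % (LOG, i + 1))
    # Rename the live file out of the way and recreate it: the old data
    # keeps its inode under a new name, and the recreated logfile.log
    # gets a fresh inode, so it's picked up as a brand-new file.
    os.rename(LOG, LOG + ".0")
    open(LOG, "a").close()

def append(line):
    if os.path.exists(LOG) and os.path.getsize(LOG) >= MAX_BYTES:
        rotate()
    with open(LOG, "a") as f:
        f.write(line + "\n")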

I will work on getting some logs this afternoon. I have a question about the rename operation: how do you know when to rename the logfile? Logstash could be reading the file while you are renaming it.

thanks
mohan

Renaming a file that's open is perfectly fine; Logstash could keep on reading that file for as long as it liked. Unfortunately it won't, since it'll discover that the file has been rotated and close it.
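You can convince yourself of the first part with a few lines of Python (the path is made up; any POSIX system):

import os

with open("/tmp/renamed.log", "w") as f:
    f.write("line 1\n")

reader = open("/tmp/renamed.log")   # hold the file open, like a tailer would
os.rename("/tmp/renamed.log", "/tmp/renamed.log.0")

# The rename changed the path, not the inode, so the open handle is intact
# and even sees data appended after the rename.
with open("/tmp/renamed.log.0", "a") as f:
    f.write("line 2\n")
print(reader.read())                # prints both lines
reader.close()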

I hope someone from the Logstash team can step in here. I'm not sure I have the full picture, and I hope I don't, because there seem to be races and other bugs everywhere.

While collecting the logs, I found the problem. This is my input:

input {
  file {
    path => "/tmp/logstash/log_xxx*"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

In my test case, the file is always deleted before writing. Initially it works: you can see the files being deleted, Logstash notices, and since sincedb_path is /dev/null it resets the offset to zero. When it does not work, I see this log:

/tmp/logtrove/log_xxx.7: old inode was [1073973, 0, 64768], new is [1073879, 0, 64768] {:level=>:debug, :file=>"filewatch/watch.rb", :line=>"73"}
:delete for /tmp/logtrove/log_xxx.7, deleted from @files {:level=>:debug, :file=>"filewatch/tail.rb", :line=>"85"}
_open_file: /tmp/logtrove/log_xxx.7: opening {:level=>:debug, :file=>"filewatch/tail.rb", :line=>"98"}
/tmp/logtrove/log_xxx.7: sincedb last value 54236, cur size 54236 {:level=>:debug, :file=>"filewatch/tail.rb", :line=>"122"}
/tmp/logtrove/log_xxx.7: sincedb: seeking to 54236 {:level=>:debug, :file=>"filewatch/tail.rb", :line=>"124"}

The reason is that a .sincedb file gets created which has the offsets in it. I thought specifying sincedb_path => "/dev/null" meant no sincedb entries would be kept for any of the files matched by the file input's path. What am I missing?

Okay, I give up. It shouldn't be reusing state like that when it's a new inode.
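If I had to guess from your log: the in-memory state looks like it's keyed by that inode tuple, and since your test keeps deleting and recreating files, the filesystem hands inode numbers back out, so a stale entry can end up matching a brand-new file. A toy model of that failure mode (my speculation, not the actual filewatch code):

# Toy model: offsets remembered per inode, so a reused inode inherits
# a stale offset even though the file at that path is new.
sincedb = {}                        # inode -> last known offset

def observe(path, inode, size):
    last = sincedb.get(inode, 0)
    if size < last:
        last = 0                    # shrink detected: start over
    print("%s: seeking to %d" % (path, last))
    sincedb[inode] = size

observe("log_xxx.3", 1073879, 54236)   # some earlier file records its offset
# log_xxx.3 is deleted; the filesystem hands inode 1073879 back out...
observe("log_xxx.7", 1073879, 54236)   # ...and the new file inherits 54236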

I am wondering whether this is intentional or a bug. When a file is deleted, it should ignore the offsets and start from zero; I guess that is what you were saying.

-mohan

OK, here goes another issue, lol. I am also rotating files, but not in a ring fashion; my rotation is linear, so each of my files definitely gets a new inode. However, I am noticing that when a new file comes in, Logstash does not pick it up, even though the new file arrives with a new file name (hence a new inode) and minor content changes. So I tried editing the file (on my system, every time I edit a file it receives a new inode), and Logstash still picks up nothing. Then I added a newline character at the end of the file, and Logstash picked that up with a "_grokparsefailure". I then deleted the newline character (again creating a new inode), and nothing was picked up. What is the issue here?
Using Vim (Vi IMproved) 7.4 to edit files
Using logstash 2.3.2

Here is my config:

input {
  file {
    path => "/dir/dir2/file_*"
    type => "businesslog"
    add_field => { "field1" => "value" "field2" => "value" "appname" => "APP1" }
    #codec => multiline {
    #  pattern => "%{TIMESTAMP_ISO8601}"
    #  negate => true
    #  what => previous
    #  auto_flush_interval => 1
    #}
  }
}

filter {
  if [appname] == "APP1" {
    mutate {
      strip => [ "message" ]
    }
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:input}" }
    }
    kv {
      source => "input"
      value_split => "="
      field_split => "|"
    }
    #mutate {
    #  strip => [ "*" ]
    #}
  }
}

A sample of my log is:
2016-09-10 10:02:03,534-0700 field4=value|field5=value|field6=value|field7=value|field8=value|field9=value|field10=value|field11-ID_COUNT=value|field12-ID_COUNT=value

This file contains a "\r" character at the end, so the final token is "field12-ID_COUNT=value\r".
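Given the config above, I would expect that sample line to come out roughly as the following event fields (abbreviated; this is my expectation, not captured output):

timestamp        => "2016-09-10 10:02:03,534-0700"
input            => "field4=value|field5=value|...|field12-ID_COUNT=value"
field4           => "value"
field5           => "value"
(one field per key, down to)
field12-ID_COUNT => "value"

The mutate strip on message should already remove the trailing \r before grok and kv run.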

I have no idea why Logstash won't pick up these files. o.0
Thanks in advance,
Chrome

I really don't mean to be a pain in the ***, but I just edited that file again (hence creating a new inode), and Logstash magically picked it up. ??? This is exasperating . . .

lol xD