I am new to this group and have been playing around with Logstash for a couple of weeks. I was surprised to see that Logstash does not provide a mode where it reads a log file once and then forgets about it; instead it seems to have a very strong affinity for the files it has read. I am pretty sure others have had this need: you get thousands of log files, you read each one once, and you move on.
Reading through the archives, I see many folks have asked the same question, but I have not found a satisfactory answer. I decided to do some log rotation myself outside of Logstash, but Logstash still remembers the offsets in memory, and even if I rotate through to an older file it reads from the offset where it left off last time. I could work around this by truncating the file before it gets written to, hoping that Logstash notices the truncation before any new content arrives. Obviously, this is very racy: if there is not enough time between the truncation and the new content, it reads from where it left off.
Can someone tell me a reliable way to do this log rotation? I have sincedb_path set to NULL. Any clues would be appreciated.
I have decided to do some log rotation myself outside of logstash but then logstash still remembers the offsets in memory and even if i rotate through to an older file, it reads from the offsets from where it left off last time.
What, exactly, did you do? What does "rotate through to an older file" mean?
- Maintain logfile.0, logfile.1, ..., logfile.9.
- When data comes in for a source, write it to logfile.0, then logfile.1, and so on.
When I reach the end of the chain at logfile.9, I want to go back to logfile.0. That does not work, because Logstash remembers the offsets. So when one file is written I truncate the next file in the ring, i.e. when logfile.9 is written I truncate logfile.0, hoping that by the time I get back to logfile.0, Logstash will have noticed the truncation and reset its offset (the debug logs show that when the new size is smaller than the previous offset, it starts over). This seems to work, except that it is not reliable: I have to put an artificial sleep between the truncation and writing the new content.
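To make the scheme concrete, here is a minimal Python sketch of the writer side as I understand it (the directory, file names, and settle delay are made up for illustration; this is my reconstruction, not production code):

```python
import os
import time

def write_entry(log_dir, index, data, num_files=10, settle_delay=0.5):
    """Append data to logfile.<index % num_files> in the ring, after
    pre-truncating the *next* file so the tailer can hopefully notice
    the shrink before new content lands there (the racy part)."""
    path = os.path.join(log_dir, "logfile.%d" % (index % num_files))
    next_path = os.path.join(log_dir, "logfile.%d" % ((index + 1) % num_files))
    open(next_path, "w").close()   # truncate the next slot ahead of time
    time.sleep(settle_delay)       # artificial gap; shrinking it reintroduces the race
    with open(path, "a") as f:
        f.write(data)
```

The sleep is exactly the unreliable part: there is no way to know when the tailer has actually observed the truncation.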
You're still not being sufficiently exact in your expression, but it seems like you've implemented a kind of ring buffer where you write to logfile.0 until that file reaches a certain size, then you switch to writing to logfile.1, and so on. Or?
Logstash uses inodes to track file offsets. Prefer creating new files, since that (barring inode reuse) guarantees that Logstash will pick each one up as a new file with no old state. But yes, there's still a race if you have start_position => beginning, and with start_position => end you might risk reprocessing old files, depending on the details of how you manage them.
Sorry for not being clear. Yes, it is like a ring buffer. In my case, I write to each file just once (for now). When I reach the end, I go back to the first file in the chain, and all I expect is for Logstash to read it from the beginning. I have start_position => beginning, but that only applies to new files AFAICT.
I tried deleting the old files (this is what I tried first), but that did not work well. It remembers the name of the file too (besides the inode) and hence remembers the offsets. And there is always the possibility of inode reuse, which makes it even less reliable.
How do other people solve this problem? I can't have my input file growing forever, so I need rotation, but rotation has this problem. Is it just me, or how are other people solving this?
Log files are usually rotated via a rename operation, e.g. logfile.log is renamed to logfile.log.0 and logfile.log is recreated (and obviously, logfile.log.0 is renamed to logfile.log.1 and so on). I've never heard about Logstash having any problems forgetting about the old logfile.log once a new file is created, and the behavior you describe isn't consistent with my reading of the source code of the filewatch library that I read just a few days ago. Can you post a reproducible example and/or debug log snippets that exhibit the problem?
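For reference, that rename chain can be sketched in Python (a hedged illustration of the conventional scheme, not Logstash code; the path and keep count are arbitrary):

```python
import os

def rotate(path, keep=9):
    """Classic rename-based rotation: logfile.log.8 -> logfile.log.9, ...,
    logfile.log -> logfile.log.0, then recreate an empty logfile.log.
    Renaming preserves each file's inode, so a tailer keyed on inodes
    sees the recreated logfile.log as a brand-new file."""
    for i in range(keep - 1, -1, -1):
        src = "%s.%d" % (path, i)
        if os.path.exists(src):
            os.replace(src, "%s.%d" % (path, i + 1))
    if os.path.exists(path):
        os.replace(path, path + ".0")
    open(path, "w").close()  # new file, (almost always) a new inode
```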
Both shrinking files and files whose path suddenly points to another inode result in the same thing, namely that the original file and its state is forgotten and that the file is picked up as a new file.
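A quick way to convince yourself that the inode is the stable identity here (plain Python, illustrative paths):

```python
import os
import tempfile

d = tempfile.mkdtemp()
path = os.path.join(d, "logfile.log")
with open(path, "w") as f:
    f.write("some content\n")

ino_before = os.stat(path).st_ino
os.rename(path, path + ".0")      # rotate by renaming
ino_after = os.stat(path + ".0").st_ino
assert ino_before == ino_after    # renaming never changes the inode

with open(path, "w") as f:        # recreate the original path
    pass
# While logfile.log.0 still exists, the recreated logfile.log is
# guaranteed to have a different inode, so an inode-keyed tailer
# treats it as brand new.
```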
I will work on getting some logs this afternoon. I have a question about the rename operation: how do you know when it is safe to rename the logfile? Logstash could be reading the file while you are renaming it.
Renaming a file that's open is perfectly fine; Logstash could keep reading that file for as long as it likes. Unfortunately it won't, since it will discover that the file has been rotated and close it.
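This is easy to demonstrate on any POSIX system; once a file is open, the descriptor follows the inode, not the name (a small Python illustration with made-up paths):

```python
import os
import tempfile

d = tempfile.mkdtemp()
old = os.path.join(d, "logfile.log")

writer = open(old, "w")
writer.write("line 1\n")
writer.flush()

reader = open(old)              # a "tailer" holding the file open
os.rename(old, old + ".0")      # rotate while both handles are open

writer.write("line 2\n")        # the writer's descriptor still works
writer.flush()
writer.close()

data = reader.read()            # and so does the reader's
reader.close()
print(data)
```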
I hope someone from the Logstash team can step in here. I'm not sure I have the full picture. I hope I don't, because there seems to be races and other bugs everywhere.
In my test case, the file is always deleted before writing. It works initially: you can see the files being deleted, Logstash notices, and since there is no sincedb entry it resets the offset to zero. When it does not work, I see this in the log:
/tmp/logtrove/log_xxx.7: old inode was [1073973, 0, 64768], new is [1073879, 0, 64768] {:level=>:debug, :file=>"filewatch/watch.rb", :line=>"73"}
:delete for /tmp/logtrove/log_xxx.7, deleted from @files {:level=>:debug, :file=>"filewatch/tail.rb", :line=>"85"}
_open_file: /tmp/logtrove/log_xxx.7: opening {:level=>:debug, :file=>"filewatch/tail.rb", :line=>"98"}
/tmp/logtrove/log_xxx.7: sincedb last value 54236, cur size 54236 {:level=>:debug, :file=>"filewatch/tail.rb", :line=>"122"}
/tmp/logtrove/log_xxx.7: sincedb: seeking to 54236 {:level=>:debug, :file=>"filewatch/tail.rb", :line=>"124"}
The reason is that a .sincedb file gets created, which holds the offsets. I thought specifying sincedb_path would prevent a sincedb file from being created for any of the files matched by the file input's path. What am I missing?
I am wondering whether this is intentional or a bug: when a file is deleted, shouldn't it ignore the saved offsets and start from zero? I guess this is what you were saying.
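For what it's worth, the usual way to discard sincedb state on Unix is to point sincedb_path at /dev/null; a literal string like NULL just creates an ordinary sincedb file with that name, which keeps the offsets around. A minimal file input along those lines (the path glob is made up to match the example above):

```
input {
  file {
    path => "/tmp/logtrove/log_*"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
```

Note that this only prevents state from surviving restarts; while the process is running, the filewatch library still tracks offsets in memory.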
Ok, here goes another issue, lol. I am also rotating files, but linearly rather than in a ring, so each of my files definitely gets a new inode. However, I am noticing that when a new file comes in, Logstash does not pick it up, even though the new file arrives with a new file name (and hence a new inode) and minor content changes. So I tried editing the file (on my system, every time I edit a file it receives a new inode), and Logstash still picks up nothing. Then I added a newline character at the end of the file, and Logstash picked that up with a "_grokparsefailure". I then deleted the newline (again creating a new inode), and nothing was picked up. What is the issue here?
Using Vim (Vi IMproved) 7.4 to edit the files
Using logstash 2.3.2
A sample of my log is:
2016-09-10 10:02:03,534-0700 field4=value|field5=value|field6=value|field7=value|field8=value|field9=value|field10=value|field11-ID_COUNT=value|field12-ID_COUNT=value
This file contains a "\r" character at the end of the file, so "field12-ID_COUNT=value\r"
I have no idea why Logstash won't pick up these files. o.0
Thanks in advance, Chrome
I really don't mean to be a pain in the ***, but I just edited that file again (hence creating a new inode), and Logstash magically picked it up. This is very strange.