I mounted the Logstash sincedb file on a persistent volume, assuming Logstash would remember the position of the last indexed log. But every time I create or restart the container, it pushes all the logs to Elasticsearch again.
The file input tracks a file's identity using a combination of its name, inode, and major and minor device numbers. Is it possible that the device number changes when you create a new container?
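To illustrate why a device-number change matters, here is a minimal Python sketch (Unix-only, since it uses `os.major`/`os.minor`) of keying a read position on inode plus major and minor device numbers. The `tracker` dict and the simulated device change are hypothetical illustrations, not Logstash's sincedb implementation.

```python
import os
import tempfile

def file_identity(path):
    """Return (inode, major device, minor device) for path."""
    st = os.stat(path)
    return (st.st_ino, os.major(st.st_dev), os.minor(st.st_dev))

# Hypothetical in-memory tracker: identity -> number of bytes already read.
tracker = {}

with tempfile.NamedTemporaryFile("w", delete=False) as f:
    f.write("line 1\nline 2\n")
    path = f.name

key = file_identity(path)
tracker[key] = os.path.getsize(path)  # pretend the whole file was read

# Same inode on the same device: the entry is found, so reading resumes.
resumed = tracker.get(file_identity(path))

# Simulate the volume appearing under a different minor device number
# (e.g. in a new container): the lookup misses and the file looks new.
inode, major, minor = key
after_device_change = tracker.get((inode, major, minor + 1))

os.remove(path)
print(resumed, after_device_change)
```

If the identity key changes, the stored offset is simply never found, which would look exactly like "all the logs are pushed again".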
If you enable trace-level logging, you should see this message when a sincedb entry is read
Logstash is working fine now, even after restarts. I removed the start_position setting from logstash.conf.
Still, I have a question: what happens when the log file is cleared and then filled with new logs? The log file is cleared on every application restart.
Will Logstash be able to detect this and ingest all the new logs from the start? I believe that should be the behavior, but please let me know.
That is what everyone wants, but determining whether an updated file is an extension of a file that has already been mostly read, or a completely new file, is ridiculously hard. It is far harder than anyone would think until they have attempted to implement it.
If you re-read the entire file and verify that the parts already read are exactly the same, then you can assume (sometimes incorrectly) that it is the same file.
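The expensive approach can be sketched in Python as follows. This is a hypothetical illustration, not what Logstash does: instead of keeping a copy of the bytes already read, it stores a SHA-256 digest of that prefix and, on restart, re-reads and re-hashes the same number of bytes before deciding whether to resume.

```python
import hashlib
import os
import tempfile

def prefix_digest(path, length):
    """Hash the first `length` bytes; also report whether that many bytes exist."""
    h = hashlib.sha256()
    remaining = length
    with open(path, "rb") as f:
        while remaining > 0:
            chunk = f.read(min(65536, remaining))
            if not chunk:
                break  # file is now shorter than `length`
            h.update(chunk)
            remaining -= len(chunk)
    return h.hexdigest(), remaining == 0

def resume_offset(path, offset, saved_digest):
    """Resume at `offset` only if the already-read prefix is unchanged; else start over."""
    digest, complete = prefix_digest(path, offset)
    return offset if complete and digest == saved_digest else 0

# Usage: read 4 bytes, remember their digest, then change the file.
fd, path = tempfile.mkstemp()
with open(path, "wb") as f:
    f.write(b"a\nb\n")
saved, _ = prefix_digest(path, 4)

with open(path, "ab") as f:
    f.write(b"c\n")  # appended: prefix unchanged
appended = resume_offset(path, 4, saved)

with open(path, "wb") as f:
    f.write(b"x\ny\nz\n")  # rewritten: prefix differs
rewritten = resume_offset(path, 4, saved)

os.close(fd)
os.remove(path)
print(appended, rewritten)
```

The "sometimes incorrectly" case is a brand-new file that happens to start with the same bytes as the old one: the digests match and the sketch resumes mid-file. The cost is that every check re-reads everything already processed.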
Alternatively, you can make some assumptions that work almost all the time and make the process very cheap, but that sometimes break down. That is what the file input does.
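A cheap heuristic of this kind might look like the Python sketch below. It is a hypothetical simplification for illustration, not the file input's actual logic: it assumes identity is (inode, major, minor), never re-reads data, and treats a file that got smaller than the saved offset as truncated.

```python
from typing import NamedTuple, Optional, Tuple

Identity = Tuple[int, int, int]  # (inode, major device, minor device)

class Entry(NamedTuple):
    identity: Identity
    offset: int  # bytes already read

def start_offset(saved: Optional[Entry], identity: Identity, size: int) -> int:
    """Cheap decision using only identity and current size (no re-reading)."""
    if saved is None or saved.identity != identity:
        # Unknown file. Starting at 0 is a choice made for this sketch;
        # a real input may be configured to start at the end instead.
        return 0
    if size < saved.offset:
        # File shrank: assume it was truncated and rewritten, read from the start.
        return 0
    # Same identity and not smaller: assume it was only appended to.
    return saved.offset

seen = Entry(identity=(42, 8, 1), offset=100)
appended = start_offset(seen, (42, 8, 1), 150)
truncated = start_offset(seen, (42, 8, 1), 30)
device_changed = start_offset(seen, (42, 8, 2), 150)
# Breakdown case: truncated, then rewritten past byte 100 before the next
# check. The size never looked smaller, so the new first 100 bytes are skipped.
regrown = start_offset(seen, (42, 8, 1), 120)
print(appended, truncated, device_changed, regrown)
```

The last call shows how such assumptions can fail: if a cleared log file grows past the old offset before the tracker notices, the shrink is never seen and the beginning of the new content is silently skipped.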
Got it, very interesting. Any insights into how Logstash does this internally? I'm curious. Also, is there a blog post that explains it? That would be very helpful.
I believe that once Logstash starts, it should either send all the logs or send the logs from the last indexed position. So why is it missing logs? Any suggestions?
It depends on the configuration of the file input, but generally it should start reading from where it left off. So if you are using a date filter to set @timestamp, I would expect that gap to have been filled in.