Hi,
I have Logstash running on a Unix host, and sincedb files are being created in large numbers.
Can you please clarify the questions below?
1. How often are these sincedb files created?
2. Is a sincedb file created for each file only once, or every time the file is modified?
3. Are sincedb files deleted automatically after a certain period, or do we need to delete them manually?
About 1: the sincedb is created once per path pattern per file input instance. If you have multiple file inputs, do not use the same path pattern in more than one of them. There is a sincedb_path option for finer control (a sketch follows this answer).
About 2: the sincedb file is created once per file input, based on the contents of the path option. If you use a glob pattern in the path option, an entry is created inside the sincedb for each file that satisfies the glob pattern, excluding files that match the exclude option.
About 3: they need to be deleted manually. LS expects to tail log files that are long-lived but rotated. There is logic that senses when a file has shrunk in size and adjusts the sincedb contents to suit. However, if you delete the sincedb file, LS may reread files, so deleting an in-use sincedb file should only be done after LS has shut down. If you use glob patterns and drop files into a watched folder, the sincedb file can keep growing, because entries for older files are never removed; LS cannot tell that it will never see a given file again.
Be aware of the inode reuse problem. Background: the sincedb tracks the read position by inode, because during file rotation (different techniques have different effects) the name of a log file may change while its inode does not. Eventually, as files are created and deleted, an inode will inevitably be reused, and if by chance that inode was seen before under a name that once satisfied the glob pattern, LS will think it has already read the file. Two things can happen here: 1) if the new file is smaller than the last-read position, LS detects this and starts from the beginning (it assumes rotation); 2) if the new file is bigger than the last-read position, LS reads from that position onward. This random, unfortunate situation is difficult to foresee and to code for, and it leads to very confused ops people.
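For illustration, a minimal sketch of the sincedb_path option mentioned above (the location /var/lib/logstash/sincedb_audit is a hypothetical example, not a default):

input {
  file {
    path => "/mySample/Gway/*"
    # Pin this input's sincedb to a known file so it is easy to find
    # and to remove (after shutting Logstash down) when needed.
    sincedb_path => "/var/lib/logstash/sincedb_audit"
  }
}

Each file input should get its own sincedb_path; the entries inside it are keyed by inode, which is what makes the reuse problem above possible.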
Thanks for the very detailed explanation. It has provided good insight.
This is how my file plugin is defined in the input section of my Logstash config:
input {
  file {
    type => "Audit"
    path => "/mySample/Gway/*"
  }
}
1. I have used a glob pattern, and under /mySample/Gway/ I have 10 files with the .txt extension; these files are updated every minute.
So, as you said, 10 sincedb files will be created. Am I right? And once a log file changes and Logstash reads it, will the existing sincedb file for that particular log be overwritten by Logstash?
2. The files under the mentioned directory are rolled over daily.
For example: if I have the file /mySample/Gway/sampleData.txt, by tomorrow it will be rolled over to /mySample/Gway/sampleData.txt.08012017 and a new /mySample/Gway/sampleData.txt file will be created.
In the above scenario, since we are using a glob pattern in the path, will Logstash keep polling the previous day's file (/mySample/Gway/sampleData.txt.08012017) too? And will a new sincedb file be created for the newly generated .txt file?
3. Is there any way to overcome the inode reuse issue with respect to Logstash? For reference, would narrowing the glob as sketched below also help with the rollover question above?
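This is just a sketch; the exclude pattern is my assumption based on the rollover naming above.

input {
  file {
    type => "Audit"
    # Match only the live logs; sampleData.txt.08012017 would no
    # longer satisfy this glob.
    path => "/mySample/Gway/*.txt"
    # Or keep the wide glob and filter the rollovers out instead:
    # exclude => "*.txt.*"
  }
}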
Hi,
Can anyone explain why so many sincedb files were created even though I have a single path pattern in the file input plugin?
As so many sincedb files were created and Logstash checks them frequently, this has been causing performance issues for other processes running on the same host. Please provide your comments.