I've been struggling with an intermittent issue for months: sometimes a log file is read into Elasticsearch twice, and sincedb shows incorrect filenames while this is happening.
Apologies for the long post; I'm trying to include the relevant information while keeping it concise.
The duplication always starts shortly after a log file is rotated. I've watched the following example happen in real time:
- The current log file which is being written has inode 15204362, filename "flow.log"
- sincedb is being updated every 15 seconds as intended
- The oldest log file in the directory is inode 15204354, filename "flow.log-20240429-1714406426"
- The oldest entry in sincedb is for a file that no longer exists, with inode 15204363, filename "flow.log-20240429-1714370426"
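For reference, my understanding is that each sincedb line records the inode, the major and minor device numbers, the byte offset, a last-activity timestamp, and the last known path. Just before the rotation, the two relevant entries look roughly like this (the device numbers, offsets, timestamps, and the "host-a" directory are made up for illustration):

  15204363 0 64768 152883 1714370441.123 /var/log/remote/host-a/flow/flow.log-20240429-1714370426
  15204362 0 64768 98231744 1714579431.456 /var/log/remote/host-a/flow/flow.log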
The log is rotated:
- the old active log file is renamed to "flow.log-20240501-1714579438" and keeps inode 15204362
- the new active log file is created with inode 15204363, filename "flow.log" (note: same inode as the current oldest entry in sincedb)
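To put that another way, a directory listing trimmed to just inode and name (the other ls -li columns omitted) goes roughly from this before the rotation to this after it:

  Before:
    15204362  flow.log
    15204354  flow.log-20240429-1714406426
    ...

  After:
    15204363  flow.log
    15204362  flow.log-20240501-1714579438
    15204354  flow.log-20240429-1714406426
    ...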
About 7 seconds later, sincedb is updated:
- it correctly rewrites the line for inode 15204363, resetting the byte counter to a low number (indicating that some data has been processed) and changing the filename to "flow.log"
- it rewrites the name of the just-rotated log file from "flow.log" to "flow.log-20240428-1714341626" (!!!), keeping inode 15204362 and the same byte count it already had.
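So immediately after that sincedb write, the two entries sketched above look something like this (offsets and timestamps again illustrative):

  15204363 0 64768 1830 1714579445.789 /var/log/remote/host-a/flow/flow.log
  15204362 0 64768 98231744 1714579445.789 /var/log/remote/host-a/flow/flow.log-20240428-1714341626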
Logstash now stops processing the new active log file and starts re-processing inode 15204362, filename "flow.log-20240501-1714579438" - the file that was just rotated, which it had already finished reading.
- sincedb is not updated again until that file is finished processing
- Once that file has been processed for the second time, Logstash continues processing the active log file and catches up to real time.
There is clearly something strange going on with sincedb assigning the incorrect filename (flow.log-20240428-1714341626) to the just-rotated log file. I don't know where it pulled that filename from, because it didn't exist anywhere at the moment the rotation happened. A file with that name does appear to have existed in the past, but it would have been deleted several hours earlier, so Logstash must be retaining the old filename somewhere and incorrectly writing it here?
From the docs I know that the filename recorded in sincedb shouldn't make a difference, as it's the inode and device numbers that are tracked - but it seems like too much of a coincidence that this naming error occurs at the same time as the duplicate processing, so it may be relevant.
Despite the filename being wrong in sincedb, Logstash should still not be re-processing the old file, because sincedb still records its inode with a high byte offset. Yet it starts that file again from the beginning. Is there any explanation for this that can be fixed through configuration?
Note: looking at sincedb a day later, it has corrected the filename for inode 15204362; I'm not sure at what point that happened.
I have 5 pipelines, and this only occurs on the busiest of them. The well-behaved pipelines each process fewer than 800 events per second, while the pipeline with the duplication issue processes between 2000 and 5500 events per second during normal usage. On the surface, Logstash doesn't appear to be hitting a performance cap at 5500 e/s: when it re-processes old log files it shoots up to about 20000 e/s and sustains that for over an hour, so under normal operating conditions there seems to be plenty of headroom.
Config info:
Logstash v8.12.2
logstash.yml:
  pipeline.workers: 16
  pipeline.batch.size: 2500
jvm.options:
  -Xms30g
  -Xmx30g
Hardware:
  128GB RAM
  2 x 10 Core CPU @ 2.20GHz
  15K RPM SAS disks
The "file" configuration for the problem pipeline is:
file {
  id => "flow-input-file-1"
  path => "/var/log/remote/*/flow/flow.log*"
  start_position => "beginning"
  sincedb_path => "/etc/logstash/sincedb/flow"
  sincedb_write_interval => "15 seconds"
  ignore_older => "60 seconds"
}
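For context, the layout that path glob matches looks like the following (the subdirectory names are placeholders), so it picks up the active file plus all of the rotated files for each source:

  /var/log/remote/host-a/flow/flow.log
  /var/log/remote/host-a/flow/flow.log-20240429-1714406426
  /var/log/remote/host-a/flow/flow.log-20240501-1714579438
  /var/log/remote/host-b/flow/flow.log
  ...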
I had the following logging enabled on Logstash during the above log rotation:
"logger.filewatch.discoverer" : "DEBUG",
"logger.filewatch.observingtail" : "DEBUG",
"logger.filewatch.sincedbcollection" : "DEBUG",
"logger.filewatch.tailmode.handlers.create" : "DEBUG",
"logger.filewatch.tailmode.handlers.delete" : "DEBUG",
"logger.filewatch.tailmode.handlers.grow" : "DEBUG",
"logger.filewatch.tailmode.handlers.shrink" : "DEBUG",
"logger.filewatch.tailmode.handlers.unignore" : "DEBUG",
"logger.filewatch.tailmode.processor" : "DEBUG",
"logger.logstash.inputs.file" : "DEBUG",
"logger.logstash.outputs.elasticsearch" : "DEBUG"
I trawled through the resulting logs and couldn't see anything interesting around that period of time.
Has anyone encountered this type of issue before? I see plenty of reports of log files being processed multiple times, but I've not seen anyone describe this specific set of symptoms, and the fixes that worked for others haven't helped me so far.
Is there any other logging that would be useful for me to look at?
Thanks if you've read to the end of this!