Logstash re-read full log after restart

I am facing an issue with Logstash: it reads the log from the beginning even though it has already loaded the log file. It happens only when I restart Logstash. The contents of the .sincedb file are:

1149101767 0 38 2097169149 1605095445.375792 /sw/status.log.1604350820
1284180743 0 38 2097194470 1605095445.314869 /sw/status.log.1604757627
1095690174 0 38 2097201253 1605614422.353504
1323601291 0 38 2097182501 1605152503.652109
1397092304 0 38 2097164148 1606043418.0026932
1420192885 0 38 2097178184 1607142900.545617 /sw/status.log.1606415403
1517533812 0 38 2097201524 1607142900.59723 /sw/status.log.1606819810
1604720444 0 38 2097164720 1607271023.755793 /sw/status.log
1694632788 0 38 2097172141 1607678422.098412
1708665521 0 38 2097153541 1608111025.171073 /sw/status.log.1608111007
494839862 0 38 2097167370 1608586863.3369129 /sw/status.log.1608586845
1845236067 0 38 528051388 1608723619.61301 /sw/status.log
1708665521 0 39 2097153541 1609210173.281941 /sw/status.log.1608111007
494839862 0 39 2097167370 1609212554.5121732 /sw/status.log.1608586845
1845236067 0 39 566653898 1609242300.420074 /sw/status.log

  1. Kindly help me understand why the same log file appears twice in the .sincedb file.

  2. How do I interpret the columns of the sincedb file?

  3. Is it possible to force Logstash not to pass on duplicate lines, either by not re-reading the same log lines or by filtering them out?

Logstash is feeding Kafka, and on the Kafka side it is not possible to filter out duplicate lines.

My Logstash config is like this:

input {
  file {
    path => "/sw/status.log*"
    exclude => [ ".lock", ".tmp" ]
    file_completed_action => "log"
    file_completed_log_path => "/tmp/file_completed_log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

filter {
  ....
}

output {
  kafka {
    bootstrap_servers => "xxxxxxxx:9092"
    codec => json
    topic_id => "status_dlh"
  }
}

The format of the sincedb file is documented here. The file input does not track files by name; it uses the inode, which is only unique within a device. The problem is that your device number has changed: the 38 has become a 39. Is this a network mount? Either way, the way to control the minor device number is going to be OS- and filesystem-specific.
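For reference, the columns in each sincedb line are, left to right: inode, major device number, minor device number, byte offset read so far, last-activity timestamp, and (when known) the last path the file was seen at. A small sketch that parses one of the lines above (the field names are my own labels, not Logstash's):

```python
# Parse a Logstash sincedb line into its documented columns:
# inode, major device, minor device, byte offset read so far,
# last-activity timestamp, and an optional last known path.
def parse_sincedb_line(line):
    parts = line.split(maxsplit=5)
    return {
        "inode": int(parts[0]),
        "major_dev": int(parts[1]),
        "minor_dev": int(parts[2]),
        "offset": int(parts[3]),
        "last_active": float(parts[4]),
        "path": parts[5] if len(parts) > 5 else None,
    }

entry = parse_sincedb_line(
    "1845236067 0 38 528051388 1608723619.61301 /sw/status.log"
)
print(entry["minor_dev"], entry["offset"], entry["path"])
```

Reading your file this way makes the duplication visible: the `/sw/status.log` entries with minor device 38 and 39 have the same inode but a different device, so Logstash treats them as two different files.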

Thanks Badger for the quick answer. Yes, the log is stored on an NFS mount.

The logstash machine got rebooted but there was no change on NFS side.
Can you please suggest some workaround to get rid of this minor device number issue ?
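One way to see exactly what identity Logstash observes is to stat the file before and after a reboot and compare; a quick sketch (on your box you would point it at /sw/status.log):

```python
import os

# The file input identifies a file by (inode, major device, minor device),
# not by name. This prints that identity for a given path.
def file_identity(path):
    st = os.stat(path)
    return st.st_ino, os.major(st.st_dev), os.minor(st.st_dev)

# Example against the current directory; substitute your log path
# and compare the output before and after the reboot.
print(file_identity("."))
```

If the minor device number printed after the reboot differs from the one recorded in the sincedb, the entries will not match and the file will be re-read.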

The documentation suggests not using read mode for a remote filesystem. What is the difference between read and tail mode? It is not very clear from the documentation.

Will switching between read & tail mode help?

No, switching mode will not help. It thinks it is a different inode, and therefore a different file.

In read mode it is assumed that the file is complete (it has already been written), so the file input can read it once and not have to come back to it. In tail mode it is assumed that the file is going to be written to in the future, so the file input has to constantly monitor the length of the file to see if it has changed.
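In file input terms the difference is just the `mode` option; a minimal sketch of the two (the paths here are illustrative, not from your config):

```
input {
  file {
    path => "/sw/archive/*.log"    # files that are already complete
    mode => "read"                 # read once, then apply file_completed_action
    file_completed_action => "log"
    file_completed_log_path => "/tmp/file_completed_log"
  }
}

input {
  file {
    path => "/sw/status.log"       # a file that keeps growing
    mode => "tail"                 # the default: keep watching for new lines
  }
}
```

Note that `file_completed_action` only applies in read mode; in tail mode there is no notion of a file being "completed".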

One last query, or rather a workaround: every restart of the Logstash machine tends to change the minor or major device numbers. If I find these numbers differ from those before Logstash was stopped, can I edit the .sincedb to align the major and minor numbers, keeping everything else the same? All of this would be done while Logstash is stopped. Will that have any side effects? I am just trying to find a workaround for this blocking issue. Please suggest.

I have never tried it but I would expect that to work.