When Logstash is shut down, renaming a file can cause data to be lost or duplicated

I'm using Logstash to transfer a local file to Kafka.
While Logstash is shut down, I rename the file (data.log -> data.log.1) and echo one record like '111' to data.log.1. When I restart Logstash, the whole of data.log.1 is consumed again.
logstash version: 7.1.0
conf file:
input {
  file {
    path => '/root/hepeihui/data/logstash/*'
    sincedb_path => '/root/hepeihui/data/sincedb/test.db'
    ignore_older => 86400
    start_position => 'beginning'
  }
}

output {
  stdout {}
}
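The rename-while-down scenario can be sketched like this (a scratch directory stands in for the real log directory; paths are illustrative):

```sh
# Sketch of the reproduction. Logstash itself is not run here; the comments
# mark where it would be started and stopped.
dir=$(mktemp -d)
echo '000' > "$dir/data.log"
# ... start Logstash, let it ingest '000', then shut it down ...
mv "$dir/data.log" "$dir/data.log.1"
echo '111' >> "$dir/data.log.1"
# On restart, Logstash 7.1.0 re-reads data.log.1 from offset 0,
# so '000' is emitted a second time even though only '111' is new.
cat "$dir/data.log.1"
```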

When I use Logstash 6.3.2 it works OK.

:woozy_face:

I would have expected the file input to handle that. Basically, if you rename a file while logstash is down it thinks it is a new file even though it has a matching sincedb entry.

With this configuration

input {
  file {
    path => "/tmp/test/*"
    sincedb_path => "/tmp/since.db"
    start_position => "beginning"
  }
}
filter {}
output { stdout { codec => rubydebug { metadata => false } } }

then

mkdir /tmp/test
echo foo >> /tmp/test/test.log
# Run logstash and it ingests foo, kill logstash

The sincedb now contains one entry (fields: inode, device major, device minor, bytes read, last activity timestamp, path)

8420933 0 51713 4 1564492539.4538882 /tmp/test/test.log

Then

mv /tmp/test/test.log /tmp/test/test.log.1
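A plain rename changes only the directory entry; the inode and device numbers are untouched, which is exactly why the renamed file still matches the old sincedb key. A quick check (GNU stat assumed):

```sh
# A rename does not change the inode, so the sincedb key
# ("inode major minor") is identical before and after the mv.
f=$(mktemp)
echo foo > "$f"
before=$(stat -c '%i' "$f")        # GNU stat: %i prints the inode number
mv "$f" "$f.renamed"
after=$(stat -c '%i' "$f.renamed")
[ "$before" = "$after" ] && echo "inode unchanged"
```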

Running logstash again we get

[2019-07-30T13:18:08,423][TRACE][filewatch.sincedbcollection] open: reading from /tmp/since.db
[2019-07-30T13:18:08,453][DEBUG][logstash.agent           ] Trying to start WebServer {:port=>9600}
[2019-07-30T13:18:08,544][TRACE][filewatch.sincedbcollection] open: importing ... '8420933 0 51713' => '4 1564492539.4538882 /tmp/test/test.log'
[2019-07-30T13:18:08,571][TRACE][filewatch.sincedbcollection] open: setting #<struct FileWatch::InodeStruct inode="8420933", maj=0, min=51713> to #<FileWatch::SincedbValue:0x3050973 @last_changed_at=1564492539.4538882, @path_in_sincedb="/tmp/test/test.log", @watched_file=nil, @position=4>
[2019-07-30T13:18:08,575][TRACE][filewatch.sincedbcollection] open: count of keys read: 1
[2019-07-30T13:18:08,695][DEBUG][logstash.api.service     ] [api-service] start
[2019-07-30T13:18:08,756][TRACE][filewatch.discoverer     ] discover_files {"count"=>1}
[2019-07-30T13:18:08,984][TRACE][filewatch.discoverer     ] discover_files handling: {"new discovery"=>true, "watched_file details"=>"<FileWatch::WatchedFile: @filename='test.log.1', @state='watched', @recent_states='[:watched]', @bytes_read='0', @bytes_unread='0', current_size='4', last_stat_size='4', file_open?='false', @initial=true, @sincedb_key='8420933 0 51713'>"}
[2019-07-30T13:18:09,005][TRACE][filewatch.sincedbcollection] associate: finding {"inode"=>"8420933", "path"=>"/tmp/test/test.log.1"}
[2019-07-30T13:18:09,024][TRACE][filewatch.sincedbcollection] associate: found sincedb record {"filename"=>"test.log.1", "sincedb key"=>#<struct FileWatch::InodeStruct inode="8420933", maj=0, min=51713>, "sincedb_value"=>#<FileWatch::SincedbValue:0x3050973 @last_changed_at=1564492539.4538882, @path_in_sincedb="/tmp/test/test.log", @watched_file=nil, @position=4>}
[2019-07-30T13:18:09,058][TRACE][filewatch.sincedbcollection] associate: matched but allocated to another
[2019-07-30T13:18:09,252][TRACE][filewatch.tailmode.processor] Delayed Delete processing
[2019-07-30T13:18:09,267][TRACE][filewatch.tailmode.processor] Watched + Active restat processing
[2019-07-30T13:18:09,297][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}
[2019-07-30T13:18:09,358][TRACE][filewatch.tailmode.processor] Rotation In Progress processing
[2019-07-30T13:18:09,373][TRACE][filewatch.tailmode.processor] Watched processing
[2019-07-30T13:18:09,414][TRACE][filewatch.tailmode.handlers.createinitial] handling: test.log.1
[2019-07-30T13:18:09,447][TRACE][filewatch.tailmode.handlers.createinitial] opening test.log.1
[2019-07-30T13:18:07,424][DEBUG][org.logstash.config.ir.CompiledPipeline] [...]
[2019-07-30T13:18:09,564][TRACE][filewatch.tailmode.handlers.createinitial] handle_specifically opened file handle: 89, path: test.log.1
[2019-07-30T13:18:09,660][TRACE][filewatch.tailmode.handlers.createinitial] add_new_value_sincedb_collection {"position"=>0, "watched_file details"=>"<FileWatch::WatchedFile: @filename='test.log.1', @state='active', @recent_states='[:watched, :watched]', @bytes_read='0', @bytes_unread='4', current_size='4', last_stat_size='4', file_open?='true', @initial=true, @sincedb_key='8420933 0 51713'>"}
[2019-07-30T13:18:09,712][TRACE][filewatch.tailmode.processor] Active - file grew: test.log.1: new size is 4, bytes read 0
[2019-07-30T13:18:09,715][TRACE][filewatch.tailmode.handlers.grow] handling: test.log.1
[2019-07-30T13:18:09,764][TRACE][filewatch.tailmode.handlers.grow] reading... {"iterations"=>1, "amount"=>4, "filename"=>"test.log.1"}
[2019-07-30T13:18:09,779][DEBUG][filewatch.tailmode.handlers.grow] read_to_eof: get chunk
[2019-07-30T13:18:09,838][DEBUG][logstash.inputs.file     ] Received line {:path=>"/tmp/test/test.log.1", :text=>"foo"}
[2019-07-30T13:18:10,092][DEBUG][filewatch.sincedbcollection] writing sincedb (delta since last write = 1564492690)
[2019-07-30T13:18:10,096][TRACE][filewatch.sincedbcollection] sincedb_write: to: /tmp/since.db
{
       "message" => "foo",
    "@timestamp" => 2019-07-30T13:18:09.996Z,
      "@version" => "1",
          "path" => "/tmp/test/test.log.1"
}

@guyboertje would you expect it to detect that the file had been rotated while logstash was down?

Ahh, this, as you describe it, is exactly the problem of not having enough information to accurately discern between rotation and inode recycling.

Previously, the handling was to treat it as rotation (as you expect), but after the rotation code was added (which detects rotations while tailing, i.e. while the file is open) I changed it to handle this scenario as inode recycling.

Only content fingerprinting can determine with much more certainty whether content has been seen before regardless of file name or inode. This is coming to Beats filebeat soonish and then maybe to the file input.
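To illustrate the fingerprinting idea (this is only a sketch, not the actual Filebeat or file-input implementation, and the 255-byte window is an arbitrary choice for the example): identify a file by a hash of its leading bytes instead of its inode, so the identity follows the content across renames and is not fooled by inode reuse.

```sh
# Illustrative sketch only: fingerprint a file by hashing its first
# 255 bytes, so the identity survives a rotation-by-rename.
fingerprint() { head -c 255 "$1" | sha256sum | awk '{print $1}'; }
f=$(mktemp)
echo 'some log line' > "$f"
fp_before=$(fingerprint "$f")
mv "$f" "$f.1"                  # rotate by renaming
fp_after=$(fingerprint "$f.1")
[ "$fp_before" = "$fp_after" ] && echo "same fingerprint after rename"
```

A real implementation also has to deal with files shorter than the fingerprint window and with truncation, which is part of why this is harder than it looks.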