Logstash on docker: configure sincedb file to use the last indexed position whenever container restarts/new

Hi Team

I mounted the Logstash sincedb path to a persistent volume, assuming Logstash would remember the position of the last indexed log. But every time I create/restart the container, it pushes all the logs to Elasticsearch again.

What am I missing here?

Kubernetes file:

Logstash sincedb path:

sincedb_path => "/usr/share/logstash/data/sincedb/.sincedb"
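For context, the sincedb_path sits inside a file input roughly like this (the log path shown is an assumption, not my exact config):

```ruby
input {
  file {
    path => "/usr/share/logstash/logs/*.log"   # illustrative path, not the actual one
    sincedb_path => "/usr/share/logstash/data/sincedb/.sincedb"
    start_position => "beginning"
  }
}
```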

Any suggestions would be helpful.

Thank you


A file input tracks a file's identity using a combination of its name, inode, and major and minor device numbers. Is it possible the device number changes when you create a new container?
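To check those identity fields yourself, `stat` can print the inode and the device number of the filesystem containing the file (the file path here is illustrative):

```shell
# Create a sample file, then print the identity fields the file input keys on:
# the inode plus the device number of the containing filesystem.
echo "hello" > /tmp/test-identity.log
stat -c 'inode=%i dev=%d size=%s' /tmp/test-identity.log
```

Run it once inside the old container and once inside the new one; if `dev=` differs, the sincedb entries will not match.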

If you enable trace-level logging you should see a message like this when a sincedb entry is read:

[2019-07-30T13:18:08,571][TRACE][filewatch.sincedbcollection] open: setting #<struct FileWatch::InodeStruct inode="8420933", maj=0, min=51713> to #<FileWatch::SincedbValue:0x3050973 @last_changed_at=1564492539.4538882, @path_in_sincedb="/tmp/test/test.log", @watched_file=nil, @position=4>

and a message like this when it first sees a file:

[2019-07-30T13:18:08,984][TRACE][filewatch.discoverer     ] discover_files handling: {"new discovery"=>true, "watched_file details"=>"<FileWatch::WatchedFile: @filename='test.log.1', @state='watched', @recent_states='[:watched]', @bytes_read='0', @bytes_unread='0', current_size='4', last_stat_size='4', file_open?='false', @initial=true, @sincedb_key='8420933 0 51713'>"}

Check whether the sincedb_key in the second message matches the values in the InodeStruct in the first.
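The sincedb file itself is plain text, so you can also compare it directly across restarts. A sketch, using the values from the trace message above (the exact column layout can vary by plugin version; newer versions append a timestamp and the path):

```shell
# Hypothetical sincedb line built from the trace output above:
# inode, major device, minor device, byte position.
printf '8420933 0 51713 4\n' > /tmp/sincedb-example
awk '{print "inode="$1, "maj="$2, "min="$3, "pos="$4}' /tmp/sincedb-example
```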

Hi Badger

I'm not sure how to check the device number?

I will check the logs after setting the log level to trace.

Also, my logstash.conf file has start_position => "beginning". Will this force Logstash to start from the beginning irrespective of the sincedb file?

Maybe "sar -d", maybe "lsblk", maybe "ls -l /dev". It really depends on the flavour of your OS.

Hi @Badger

Logstash is working fine now, even when restarted. I removed start_position from logstash.conf.

Still, I have a question: what happens when the log file is cleared and then updated with new logs? The log file will be cleared on every application restart.

Will Logstash be able to detect this and ingest all the new logs from the start? I believe that should be the behavior; please let me know.

Thank you

That is what everyone wants, but determining whether an updated file is an extension of a file that has already been mostly read, or a completely new file, is ridiculously hard. It is far harder than anyone would think until they have attempted to implement it.

If you re-read the entire file and verify that the parts already read are exactly the same, then you can assume (sometimes incorrectly) that it is the same file.

Alternatively, you can make some assumptions that work almost all the time and make the process very cheap, but sometimes break down. That is what the file input does.
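As an illustration only (not Logstash's actual code, which is not documented anywhere): one cheap heuristic of this kind amounts to comparing the file's current size against the byte position recorded for the same inode, and treating a shrink as truncation:

```shell
# Illustrative heuristic, not the plugin's real implementation:
# if a file with a known inode is now smaller than the recorded
# read position, assume it was truncated and reread from zero.
recorded_position=100
file=$(mktemp)
printf 'short\n' > "$file"             # 6 bytes, smaller than position 100
current_size=$(stat -c '%s' "$file")
if [ "$current_size" -lt "$recorded_position" ]; then
  echo "truncated: rereading from position 0"
else
  echo "appended: resuming from position $recorded_position"
fi
```

The failure mode is the edge case: a file cleared and then refilled past the old position looks, by size alone, like an ordinary append.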



Got it, very interesting. Any insights into how Logstash does it internally? I'm excited to know. Also, is there a blog post explaining it? That would be very helpful.

Thank you.

I do not think it is documented anywhere.

Got it.

Also, I'm trying a few different scenarios to fully understand this.

So, I stopped Logstash and restarted it after some time. I see no logs in Kibana for the time Logstash was shut down.

I believe that once Logstash starts, it should either send all the logs or resume from the previously indexed position. Why is it missing the logs? Any suggestions?

It depends on the configuration of the file input, but generally it should start reading from where it left off. So if you are using a date filter to set @timestamp, I would expect that gap to have been filled in.
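For example, a date filter along these lines (the "timestamp" field name and ISO8601 pattern are assumptions about the log format) copies the event's own time into @timestamp, so backfilled events land at the right point on the Kibana timeline:

```ruby
filter {
  date {
    match => ["timestamp", "ISO8601"]   # source field and pattern are assumptions
    target => "@timestamp"              # the default target, shown for clarity
  }
}
```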

Got it.

This is my file input configuration

And yes, I'm using a date filter to set @timestamp.

Is there anything I'm missing ?

-- Rahul

Not that I can see.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.