Logstash on docker: configure sincedb file to use the last indexed position whenever container restarts/new

Hi Team

I mounted the Logstash sincedb path to a persistent volume, assuming Logstash would remember the position of the last indexed log. But every time I create/restart the container, it pushes all the logs to Elasticsearch again.

What am I missing here?

Kubernetes file:

Logstash sincedb path:

sincedb_path => "/usr/share/logstash/data/sincedb/.sincedb"
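For context, the sincedb_path sits inside a file input roughly like this (the log path shown is an assumption, not my exact config):

```ruby
input {
  file {
    path => "/usr/share/logstash/logs/*.log"   # illustrative path, not the actual one
    sincedb_path => "/usr/share/logstash/data/sincedb/.sincedb"
    start_position => "beginning"
  }
}
```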

Any suggestions would be helpful.

Thank you


A file input tracks a file's identity using a combination of its name, inode, and major and minor device numbers. Is it possible the device number changes when you create a new container?
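To check those identity fields yourself, `stat` can print the inode and the device number of the filesystem containing the file (the file path here is illustrative):

```shell
# Create a sample file, then print the identity fields the file input keys on:
# the inode plus the device number of the containing filesystem.
echo "hello" > /tmp/test-identity.log
stat -c 'inode=%i dev=%d size=%s' /tmp/test-identity.log
```

Run it once inside the old container and once inside the new one; if `dev=` differs, the sincedb entries will not match.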

If you enable trace-level logging you should see a message like this when a sincedb entry is read:

[2019-07-30T13:18:08,571][TRACE][filewatch.sincedbcollection] open: setting #<struct FileWatch::InodeStruct inode="8420933", maj=0, min=51713> to #<FileWatch::SincedbValue:0x3050973 @last_changed_at=1564492539.4538882, @path_in_sincedb="/tmp/test/test.log", @watched_file=nil, @position=4>

and a message like this when it first sees a file:

[2019-07-30T13:18:08,984][TRACE][filewatch.discoverer     ] discover_files handling: {"new discovery"=>true, "watched_file details"=>"<FileWatch::WatchedFile: @filename='test.log.1', @state='watched', @recent_states='[:watched]', @bytes_read='0', @bytes_unread='0', current_size='4', last_stat_size='4', file_open?='false', @initial=true, @sincedb_key='8420933 0 51713'>"}

Check whether the sincedb_key in the second message matches the values in the InodeStruct in the first.
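The sincedb file itself is plain text, so you can also compare it directly across restarts. A sketch, using the values from the trace message above (the exact column layout can vary by plugin version; newer versions append a timestamp and the path):

```shell
# Hypothetical sincedb line built from the trace output above:
# inode, major device, minor device, byte position.
printf '8420933 0 51713 4\n' > /tmp/sincedb-example
awk '{print "inode="$1, "maj="$2, "min="$3, "pos="$4}' /tmp/sincedb-example
```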

Hi Badger

I'm not sure how to check the device number?

I will check the logs after setting the log level to trace.

Also, my logstash.conf file has start_position => "beginning". Will this force Logstash to start from the beginning irrespective of the sincedb file?

Maybe "sar -d", maybe "lsblk", maybe "ls -l /dev". It really depends on the flavour of your OS.

Hi @Badger

Logstash is working fine now, even when restarted. I removed start_position from logstash.conf.

Still, I have a question: what happens when the log file is cleared and then updated with new logs? The log file will be cleared on every application restart.

Will Logstash be able to detect this and ingest all the new logs from the start? I believe that should be the behavior; please let me know.

Thank you

That is what everyone wants, but determining whether an updated file is an extension of a file that has already been mostly read, or a completely new file, is ridiculously hard. It is far harder than anyone would think until they have attempted to implement it.

If you re-read the entire file and verify that the parts already read are exactly the same, then you can assume (sometimes incorrectly) that it is the same file.

Alternatively, you can make some assumptions that work almost all the time and make the process very cheap, but sometimes break down. That is what the file input does.
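As an illustration only (not Logstash's actual code, which is not documented anywhere): one cheap heuristic of this kind amounts to comparing the file's current size against the byte position recorded for the same inode, and treating a shrink as truncation:

```shell
# Illustrative heuristic, not the plugin's real implementation:
# if a file with a known inode is now smaller than the recorded
# read position, assume it was truncated and reread from zero.
recorded_position=100
file=$(mktemp)
printf 'short\n' > "$file"             # 6 bytes, smaller than position 100
current_size=$(stat -c '%s' "$file")
if [ "$current_size" -lt "$recorded_position" ]; then
  echo "truncated: rereading from position 0"
else
  echo "appended: resuming from position $recorded_position"
fi
```

The failure mode is the edge case: a file cleared and then refilled past the old position looks, by size alone, like an ordinary append.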



Got it, very interesting. Any insights into how Logstash does it internally? I'm excited to know. Also, is there a blog post explaining it? That would be very helpful.

Thank you.

I do not think it is documented anywhere.

Got it.

Also, I'm trying a few different scenarios to fully understand this.

So, I stopped Logstash and restarted it after some time. I see no logs in Kibana for the time Logstash was shut down.

I believe that once Logstash starts, it should either send all the logs or resume from the previously indexed position. Why is it missing the logs? Any suggestions?

It depends on the configuration of the file input, but generally it should start reading from where it left off. So if you are using a date filter to set @timestamp, I would expect that gap to have been filled in.
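For example, a date filter along these lines (the "timestamp" field name and ISO8601 pattern are assumptions about the log format) copies the event's own time into @timestamp, so backfilled events land at the right point on the Kibana timeline:

```ruby
filter {
  date {
    match => ["timestamp", "ISO8601"]   # source field and pattern are assumptions
    target => "@timestamp"              # the default target, shown for clarity
  }
}
```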

Got it.

This is my file input configuration

And yes, I'm using a date filter to set @timestamp.

Is there anything I'm missing ?

-- Rahul

Not that I can see.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.