Logstash7.6 - data loss when input file logrotation happens

manu1 · July 3, 2020, 12:16am

Hello,

We have observed data loss when input file's logrotation happens every day.

    input {
      file {
        path => "/weblogs/biweb.log"
        sincedb_path => "/opt/logshipper/ls7_bi/sincedbs/biweb.db"
        type => "biweb.bi_event"
      }
    output {
      if [type] == "biweb.bi_event" {
        kafka {
          bootstrap_servers      => 'KAFKA HOSTS'
          topic_id               => "biweb.event"
          acks                   => "1"
          batch_size             => 100
          linger_ms              => 1000
          retry_backoff_ms       => 5000
          retries                => 10
          compression_type       => "snappy"
          codec                  => plain { format => "%{message}"  }

        }
      }
    }

This is the snippet of config.

Any suggestions on this issue will be helpful.

Thank you!

Badger · July 3, 2020, 12:41am

That could be an inode reuse issue. There are links to various issues in the META issue 211. Especially see 251.

Tracking which files have been read when those files can get rotated is an extremely hard problem. Way harder than most folks would initially think. A good option to get it right is to checksum the file contents (although this is not foolproof), and the file input does not do that, because it can get ridiculously expensive. Instead it implements a very cheap technique that almost always gets it right (but in a few cases it decides it has already read a file that it has not read).

There are other cases where it gets it wrong by duplicating data. As I said, it is a really hard problem.

manu1 · July 7, 2020, 9:00pm

Thank you @Badger for the update.

We are using Logstash1.5 in some legacy pipelines and interestingly we haven't seen this data loss during logrotation in that pipeline.
What is the main difference in Logstash1.5 vs Logstash7.6 to handle log rotation in input file plugin?

We have tried below options:

we have tried wildcard in path to handle the logrotation issue in LS 7.6, however it created huge number of duplicates(re-read all last 24 hour events) and we reverted back

        path => "/weblogs/biweb.log*"

We have tried tuning the pipeline parameters listed below, but no improvements on the data loss

pipeline.workers from 2 to 12 (default: 24) 
pipeline.batch.size from 125 to 250 (default: 125)

Currently sincedb_clean_after and sincedb_write_interval is not set, it is using default values(sincedb_clean_after: 2 weeks, sincedb_write_interval : 15sec). Do any of these property tuning will help?

Badger · July 7, 2020, 10:09pm

I do not think there are any good solutions.

The file input had a huge amount of work done on it between 1.5 and v5 (not so much recently). I could not summarize it.

system · August 4, 2020, 10:09pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Logstash File input for file rotation Logstash	2	1188	July 6, 2017
Logstash file input plugin: missing first lines after log rotation Logstash	3	808	November 22, 2021
File input plugin lost first part of log line Logstash	2	693	November 4, 2022
Logstash file input plugin duplicate events after log file rollover Logstash	1	599	December 11, 2019
Logstash lost data when using file input plugin Logstash	1	840	December 24, 2019

Logstash7.6 - data loss when input file logrotation happens

Related topics