Logstash 7.6 - data loss when input file log rotation happens

Hello,

We have observed data loss when the input file's log rotation happens every day.

    input {
      file {
        path => "/weblogs/biweb.log"
        sincedb_path => "/opt/logshipper/ls7_bi/sincedbs/biweb.db"
        type => "biweb.bi_event"
      }
    }
    output {
      if [type] == "biweb.bi_event" {
        kafka {
          bootstrap_servers      => 'KAFKA HOSTS'
          topic_id               => "biweb.event"
          acks                   => "1"
          batch_size             => 100
          linger_ms              => 1000
          retry_backoff_ms       => 5000
          retries                => 10
          compression_type       => "snappy"
          codec                  => plain { format => "%{message}"  }

        }
      }
    }

This is a snippet of the config.

Any suggestions on this issue would be helpful.

Thank you!

That could be an inode reuse issue. There are links to various related issues in META issue 211; see especially issue 251.

Tracking which files have been read when those files can be rotated is an extremely hard problem, far harder than most folks would initially think. One way to get it right is to checksum the file contents (although even that is not foolproof), but the file input does not do that because it can get ridiculously expensive. Instead it implements a very cheap technique that almost always gets it right, but in a few cases it decides it has already read a file that it has not read.

There are other cases where it gets it wrong by duplicating data. As I said, it is a really hard problem.
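To make the inode reuse failure mode concrete, here is a rough Python sketch (an illustration only, not the actual filewatch code) of a tail-style reader that, like a sincedb, identifies a file by its device and inode numbers and remembers how many bytes it has already read:

    import os

    # Hypothetical sincedb-style state: file identity -> bytes already read.
    # The real file input keys entries on inode plus device numbers; this
    # sketch only illustrates the idea and is not the plugin's implementation.
    sincedb = {}

    def read_new_lines(path):
        st = os.stat(path)
        identity = (st.st_dev, st.st_ino)   # identity is inode-based, not content-based
        offset = sincedb.get(identity, 0)

        if st.st_size < offset:
            # The file is smaller than what we think we already read: assume
            # it was truncated in place and start over from the beginning.
            offset = 0

        with open(path, "rb") as f:
            f.seek(offset)
            data = f.read()

        sincedb[identity] = offset + len(data)
        return data.splitlines()

    # Failure mode after rotation: logrotate renames biweb.log away and a new
    # biweb.log is created, possibly reusing the old inode. If the new file
    # has already grown past the old offset by the time it is next examined,
    # the shrink check above does not fire, reading resumes at the stale
    # offset, and everything written before that offset is silently skipped.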

Thank you @Badger for the update.

We are using Logstash 1.5 in some legacy pipelines and, interestingly, we haven't seen this data loss during log rotation there.
What is the main difference between Logstash 1.5 and Logstash 7.6 in how the file input plugin handles log rotation?

We have tried the options below:

  1. We tried a wildcard in the path to handle the log rotation issue in LS 7.6, however it created a huge number of duplicates (it re-read all of the last 24 hours of events) and we reverted back:
         path => "/weblogs/biweb.log*"
  2. We tried tuning the pipeline parameters listed below (see the logstash.yml sketch after this list), but saw no improvement on the data loss:
     - pipeline.workers from 2 to 12 (default: 24)
     - pipeline.batch.size from 125 to 250 (default: 125)
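For reference, option 2 above amounts to settings like the following in logstash.yml (a sketch; only the two values we changed are shown):

    # logstash.yml (sketch): the pipeline settings we experimented with.
    # These control how many worker threads run filters/outputs and how many
    # events each worker takes per batch; they do not change how the file
    # input tracks rotated files, which is likely why they did not help here.
    pipeline.workers: 12
    pipeline.batch.size: 250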

Currently sincedb_clean_after and sincedb_write_interval are not set, so they use the default values (sincedb_clean_after: 2 weeks, sincedb_write_interval: 15 sec). Would tuning either of these properties help?
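For illustration, this is how those two options would be set explicitly on the file input; the values shown are just the documented defaults written out, not a recommendation:

    input {
      file {
        path => "/weblogs/biweb.log"
        sincedb_path => "/opt/logshipper/ls7_bi/sincedbs/biweb.db"
        type => "biweb.bi_event"
        # How long an entry for an inactive file is kept in the sincedb, and
        # how often the sincedb file is written out to disk.
        sincedb_clean_after => "2 weeks"
        sincedb_write_interval => "15 seconds"
      }
    }

As I understand it, sincedb_write_interval mainly affects how much gets re-read after a crash or restart, and shortening sincedb_clean_after only reduces how long stale inode entries linger; neither by itself prevents the inode reuse loss described above.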

I do not think there are any good solutions.

The file input had a huge amount of work done on it between 1.5 and v5 (not so much recently). I could not summarize it.
