Why do I get log duplication when Filebeat reads rotating log files, Filebeat is stopped manually for a while, and the default close_inactive is reached?

I set up Filebeat to read rotating log files (rotated when they reach 5 MB). My config is below:

- type: log
  enabled: true
  fields:
    source: 'filebeat2'
    logID: logbackup
  fields_under_root: true
  paths:
    - /home/logbackup/a.log
    - /home/logbackup/backup/a.log-*

output.logstash:
  hosts: ["ip:5044"]
  worker: 4
  bulk_max_size: 4096

queue:
  mem:
    events: 16384

and logstash.yml :

pipeline.workers: 4
pipeline.batch.size: 4096

close_inactive is left at its default (5m). We have 100 transactions per second. As a crash test, I stopped Filebeat manually for a while. When I started it again manually (with 2 million docs stored in the second directory path), some logs were duplicated. What is the solution? Would increasing close_inactive help?

Hello @alex_petrov

It might be related to your log rotation strategy; you can find more details in the article. Do you use a log rotation strategy that copies and truncates the input log file?

I don't use this strategy.

Log duplication may still happen independent of the log rotate strategy you use or any other setting you change.

One thing that helps avoid log duplication in most cases is to use a custom _id value instead of letting Elasticsearch set the _id value.

But depending on how you are indexing your data (time-based indices, data streams, rollover, etc.), you may still get duplicates in some cases.
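To illustrate why a custom _id helps, here is a minimal Python sketch. It is not Filebeat or Elasticsearch code: a plain dict stands in for a single index keyed by _id, and the helper names (`event_id`, `index_events`) and sample log lines are made up for the example. The point is only that a deterministic ID turns a resend into an overwrite instead of a second document.

```python
import hashlib

def event_id(message: str) -> str:
    """Derive a deterministic ID: the same message always yields the same ID."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()

def index_events(index: dict, messages: list) -> None:
    """Store each event under its derived ID; a re-sent event overwrites itself."""
    for msg in messages:
        index[event_id(msg)] = msg

# Hypothetical sample lines; the dict plays the role of one index.
index = {}
batch = ["2023-01-01 10:00:00 txn=1001 OK", "2023-01-01 10:00:01 txn=1002 OK"]
index_events(index, batch)
index_events(index, batch)  # simulate the same lines being shipped again after a restart
print(len(index))  # 2
```

This only holds within one index. If the resend lands in a different index (for example after a rollover or in the next time-based index), the same _id can exist in both, which is the caveat mentioned above.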

Thanks. Do you have any tips for setting a custom _id value?

It depends entirely on your document. If you have a field that holds a unique ID, you can use that field to generate the custom _id.

Check this part of the documentation on how to deduplicate data in filebeat.

Thanks leandrojmp. I used the fingerprint processor on the message field in Filebeat, and all messages have a unique value. Is that right?

        processors:
          - fingerprint:
              fields: ["message"]
              method: sha256
              target_field: "@metadata._id"
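A fingerprint computed from message alone gives any two lines with identical text the same _id, so it is worth confirming that messages really are unique before relying on it. The following is a rough Python sketch for checking a sample of log lines; the function name and the sample data are hypothetical, and it compares raw text rather than reproducing Filebeat's own hashing.

```python
from collections import Counter

def repeated_messages(lines):
    """Return the messages that occur more than once, with their counts."""
    counts = Counter(line.rstrip("\n") for line in lines)
    return {msg: n for msg, n in counts.items() if n > 1}

# Hypothetical sample; in practice, pass an open log file instead of a list.
sample = ["txn=1 OK\n", "txn=2 OK\n", "txn=1 OK\n"]
print(repeated_messages(sample))  # {'txn=1 OK': 2}
```

An empty result on a representative sample suggests message is safe to fingerprint on its own. Otherwise, adding more fields to the fingerprint (a timestamp or transaction ID, if the events have one) would be one way to keep distinct events from sharing an _id.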
