Logstash pipeline: events get triplicated

Hi all,

I'm investigating a performance issue with my Logstash setup, which I believe started with version 2.2.
To make it short: the output lags behind the input (file input plugin), and when the input file is copy-truncated, the output (Elasticsearch) loses a whole time window of logs.
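
By "copy-truncating" I mean an ordinary logrotate copytruncate rotation. A minimal sketch of such a stanza (the schedule and retention below are only illustrative, not copied from my actual logrotate file):

/inputfile.log {
    daily
    rotate 7
    compress
    copytruncate
}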

To better understand the performance issue, and to rule out anything Elasticsearch-related, I redirected one of my input files into an output file using the file output plugin. Comparing input and output, I realized that in some intervals my events get triplicated, while during other time periods they match one-to-one.

I'm going crazy; I really need your help to understand and possibly fix this issue.
Some data to troubleshoot the problem follows.

Thanks a lot,
Daniele

Current date: Fri Apr 29 16:10:08 CEST 2016
Current input file size: 4.8 GB
Input file rotated at Apr 29 00:00

Line count at 3 AM, no issue

[root@server ~]# cat /inputfile.log  | grep '29-Apr-2016 03:15:' | wc -l
22403
[root@server ~]# cat /outputfile.log | grep '29-Apr-2016 03:15:' | wc -l
22403

Line count at 8 AM, issue: triplicated output!

[root@server ~]# cat /inputfile.log  | grep '29-Apr-2016 08:15' | wc -l
29320
[root@server ~]# cat /outputfile.log | grep '29-Apr-2016 08:15' | wc -l
87960    

Exactly 3 times the input! And if you compare the log lines, they are identical:

[root@server ~]# grep '29-Apr-2016 08:15:00.534' /inputfile.log | grep 65514 | more
Apr 29 08:15:00 dns-45 named[21930]: 29-Apr-2016 08:15:00.534 queries: info: client 192.168.45.144#65514: view dns45: query: host.example.com IN A + (192.168.45.254)
[root@server ~]# grep '29-Apr-2016 08:15:00.534' /outputfile.log | grep 65514 | more
{"@version":"1","@timestamp":"2016-04-29T06:15:00.534Z","path":"inputfile.log","host":"server.example.com","type":"dns","tags":["dns","toFile","messageDrop"],"logsource":"dns-45","program":"named","pid":"21930","timestamp":"29-Apr-2016 08:15:00.534","client_ip":"192.168.45.144","client_port":65514,"view":"dns45","query":"host.example.com","class":"IN","record":"A","flags":"+","query_dst_ip":"192.168.45.254"}
{"@version":"1","@timestamp":"2016-04-29T06:15:00.534Z","path":"inputfile.log","host":"server.example.com","type":"dns","tags":["dns","toFile","messageDrop"],"logsource":"dns-45","program":"named","pid":"21930","timestamp":"29-Apr-2016 08:15:00.534","client_ip":"192.168.45.144","client_port":65514,"view":"dns45","query":"host.example.com","class":"IN","record":"A","flags":"+","query_dst_ip":"192.168.45.254"}
{"@version":"1","@timestamp":"2016-04-29T06:15:00.534Z","path":"inputfile.log","host":"server.example.com","type":"dns","tags":["dns","toFile","messageDrop"],"logsource":"dns-45","program":"named","pid":"21930","timestamp":"29-Apr-2016 08:15:00.534","client_ip":"192.168.45.144","client_port":65514,"view":"dns45","query":"host.example.com","class":"IN","record":"A","flags":"+","query_dst_ip":"192.168.45.254"}

Line count at 10 AM, no issue

[root@server ~]# cat /inputfile.log  | grep '29-Apr-2016 10:26:' | wc -l
33890
[root@server ~]# cat /outputfile.log | grep '29-Apr-2016 10:26:' | wc -l
33890

This is my configuration:

[root@server ~]# cat /etc/logstash/conf.d/*
input {
  file {
    path => [ "/file1.log", "/file2.log", "/file3.log" ]
    exclude => "*.gz"
    type => "checkpoint"
  }
  file {
    path => [ "/file4.log" ]
    exclude => "*.gz"
    type => "f5"
  }
  file {
    path => [ "/inputfile.log" ]   ### <------ This is my input file ###
    exclude => "*.gz"
    type => "dns"
  }
}

filter {
  if [type] == "checkpoint" {
    # a lot of work 1
  } else if [type] == "f5" {
    # a lot of work 2
  } else if [type] == "dns" {
    mutate {
      add_tag => [ "dns", "toFile", "messageDrop" ]
    }
    grok {
      patterns_dir => [ "/opt/logstash/patterns" ]
      match => {
        "message" => [ "%{DNS_TYPE001}" ]
      }
    }
    date {
      match => [ "timestamp", "d-MMM-yyyy HH:mm:ss.SSS" ]
    }
  }
  if "messageDrop" in [tags] {
    mutate {
      remove_field => [ "message" ]
    }
  }
}

output {
  if "toFile" in [tags] {
    file {
      path => "/outputfile.log"   ### <------ This is my output file ###
    }
  } else if "toDrop" not in [tags] {
    elasticsearch {
      hosts => [ "192.168.78.25","192.168.78.35","192.168.78.36" ] # all my elasticsearch data nodes
      template_overwrite => true
    }
  }
}
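
For what it's worth, the file inputs above rely entirely on the defaults (no sincedb_path, no start_position). If it helps with the troubleshooting, this is how the dns input would look with the sincedb pinned to an explicit location; the path below is only an example, it is not what I'm running right now:

  file {
    path => [ "/inputfile.log" ]
    exclude => "*.gz"
    type => "dns"
    # explicit sincedb location (example path only)
    sincedb_path => "/var/lib/logstash/sincedb_dns"
  }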