Elasticsearch: huge amount of duplicates with Logstash

Hello,

I'm facing a duplicate data issue with Elasticsearch (3 nodes, v8.5.2, green state), coupled with Logstash (1 node, v8.5.2).

So basically we have multiple app servers sending logs with NLog to Logstash on port 5567 (all going to the same index), with the following NLog.config configuration:

        <target xsi:type="Network" name="logstash-prod" address="udp://192.168.0.2:5567"
                layout="${date:format=yyyy-MM-dd HH\:mm\:ss.fff} ${threadid} ${uppercase:${level}} : ${message} ${exception:format=tostring}"/>

        <rules>
                <logger name="*" minlevel="TRACE" writeTo="logstash-prod"/>
        </rules>
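
For reference, a line produced by this layout looks roughly like this (values made up):

        2024-04-02 10:15:32.123 42 INFO : Appel du controleur : Home - Action : Index - Arguments : {"id": 123}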

Logstash then parses them with the following configuration:

input {
  udp {
    port => 5567
    type => "avindex"
  }
}

filter {

    if [type] == "avindex" {
        grok {
            match => [ "message", "%{TIMESTAMP_ISO8601:date} %{INT:batchid} %{LOGLEVEL:level} :( %{UUID:uuid}? (/ %{NOTSPACE:id2} )?-)? %{GREEDYDATA:message}" ]
            overwrite => ["message"]
        }

        date {
            match => [ "date", "YYYY-MM-dd HH:mm:ss.SSS" ]
            locale => "fr"
            remove_field => [ "date" ]
        }

        mutate {
            convert => {
                "batchid" => "integer"
                "uuid" => "string"
                "id2" => "string"
            }
        }

        if [message] =~ "Appel du controleur" {
            grok {
                match => [ "message", "Appel du controleur : %{WORD:controller} - Action : %{WORD:controller_action} - Arguments : %{GREEDYDATA:controller_args}" ]
            }
            json {
                source => "controller_args"
            }
            mutate {
                convert => {
                    "controller" => "string"
                   "controller_action" => "string"
                }
            }
        }

        if [message] =~ "Destinataire sur liste noire" {
            grok {
                match => [ "message", "Destinataire sur liste noire : %{NUMBER:blacklisted_number}%{GREEDYDATA}" ]
            }
            grok {
                match => [ "blacklisted_number", "(?<number_prefix>0(8\d{2}|([1-7]|9)))" ]
            }
            mutate {
                convert => {
                    "blacklisted_number" => "integer"
                    "number_prefix" => "integer"
                }
            }
        }
    }
}
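
For that sample line, the document that ends up in Elasticsearch looks roughly like this (field values are made up, and a few housekeeping fields like host are omitted):

{
  "@timestamp": "2024-04-02T08:15:32.123Z",
  "type": "avindex",
  "event": { "original": "2024-04-02 10:15:32.123 42 INFO : Appel du controleur : Home - Action : Index - Arguments : {\"id\": 123}" },
  "batchid": 42,
  "level": "INFO",
  "message": "Appel du controleur : Home - Action : Index - Arguments : {\"id\": 123}",
  "controller": "Home",
  "controller_action": "Index",
  "controller_args": "{\"id\": 123}",
  "id": 123
}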

Logstash then sends them to Elasticsearch:

output {
    elasticsearch {
        hosts => ["http://192.168.0.3:9200"]
        user => "elastic"
        password => "x"
        ssl => false
        index => "%{type}-%{+YYYY.MM}"
    }
}

Basically, the whole process works fine, but for a reason I cannot understand, it seems that sometimes, for a very short period of time, each log line processed by Logstash is sent to Elasticsearch exactly 31 times instead of once.

We noticed this because our dashboard statistics were incorrect, so I used the following Elasticsearch query to return the duplicates in an index, based on the 'event.original.keyword' field:

GET /avindex-2024.04/_search
{
  "size": 0,
  "aggs": {
    "duplicate_ids": {
      "terms": {
        "field": "event.original.keyword",
        "size": 100,          // adjust as needed: how many duplicate groups are returned
        "min_doc_count": 2    // at least 2, so only duplicated lines are returned
      },
      "aggs": {
        "duplicate_docs": {
          "top_hits": {
            "size": 1         // how many hits to return per duplicate group
          }
        }
      }
    }
  }
}

This query always returns a huge number of duplicate docs, with a hits -> total -> value of 31 for each duplicated document, while there is obviously only one occurrence of the line in the log file on the app server side.
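
A rough way to estimate the overall duplication factor would be to compare the total hit count with the number of distinct raw lines, for example with a cardinality aggregation (which is approximate, so this only gives an order of magnitude):

GET /avindex-2024.04/_search
{
  "size": 0,
  "track_total_hits": true,
  "aggs": {
    "distinct_lines": {
      "cardinality": {
        "field": "event.original.keyword"
      }
    }
  }
}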

I can't see what would cause Logstash to send duplicate logs like this: there's no service interruption in the process, and all the servers are on the same network.
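
In case the pipeline were somehow loaded more than once, or the elasticsearch output were retrying bulk requests, I assume the Logstash monitoring API on port 9600 should show it (it exposes the loaded pipelines and the in/filtered/out event counters):

curl -s 'http://localhost:9600/_node/pipelines?pretty'
curl -s 'http://localhost:9600/_node/stats/events?pretty'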

Another strange thing is that the app servers also send their logs via NLog to another Logstash server and ES cluster with exactly the same configuration, but on a different network, and there everything works fine, without any duplicates.
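
To check whether the copies already exist in what this particular Logstash emits, rather than being created between Logstash and Elasticsearch, I suppose I could temporarily add a second output next to the elasticsearch one and grep/count a given line in the resulting file (the path is just an example; this assumes the standard file output plugin, which I believe ships with the default distribution):

output {
    # the existing elasticsearch { ... } output stays as it is

    file {
        path => "/tmp/avindex-debug.json"
        codec => json_lines
    }
}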

I've tried updating the Logstash filter configuration to clone the incoming events at the very beginning, before any parsing, but even these unparsed logs get duplicated, so it seems to be specifically related to the Logstash server, especially since we also send logs directly to the Elasticsearch server and those show no duplicates.
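
Would making the indexing idempotent be a reasonable stopgap while I look for the root cause? Something like deriving the document id from a fingerprint of the raw line, so that identical copies overwrite each other instead of piling up. A rough, untested sketch (the choice of source field and the [@metadata] name are just mine):

filter {
    fingerprint {
        # hash the raw, unparsed line; the result is only kept in @metadata
        source => ["[event][original]"]
        target => "[@metadata][fingerprint]"
        method => "SHA256"
    }
}

output {
    elasticsearch {
        hosts => ["http://192.168.0.3:9200"]
        user => "elastic"
        password => "x"
        ssl => false
        index => "%{type}-%{+YYYY.MM}"
        # same id for identical lines, so duplicates overwrite instead of accumulating
        document_id => "%{[@metadata][fingerprint]}"
    }
}

Since the raw line already contains the millisecond timestamp and the thread id, two genuinely different events should almost never share a fingerprint, while the 31 copies of the same line would simply overwrite one another.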

I would be really grateful if you have any idea of what could explain this and how I could fix it.

Thank you very much

Bump, anyone?