Logstash duplicate

Hello,

I'm facing a duplicate data issue with Elasticsearch (3 nodes, v8.5.2, green state), coupled with Logstash (1 node, v8.5.2).

So basically we have multiple app servers sending logs with NLog to Logstash on port 5567 (all to the same index) with the following NLog.config configuration:

        <target xsi:type="Network" name="logstash-prod" address="udp://192.168.0.2:5567"
                layout="${date:format=yyyy-MM-dd HH\:mm\:ss.fff} ${threadid} ${uppercase:${level}} : ${message} ${exception:format=tostring}"/>

        <rules>
                <logger name="*" minlevel="TRACE" writeTo="logstash-prod"/>
        </rules>

Logstash then parses them with the following configuration:

input {
  udp {
    port => 5567
    type => "avindex"
  }
}

filter {

    if [type] == "avindex" {
        grok {
            match => [ "message", "%{TIMESTAMP_ISO8601:date} %{INT:batchid} %{LOGLEVEL:level} :( %{UUID:uuid}? (/ %{NOTSPACE:id2} )?-)? %{GREEDYDATA:message}" ]
            overwrite => ["message"]
        }

        date {
            match => [ "date", "YYYY-MM-dd HH:mm:ss.SSS" ]
            locale => fr
            remove_field => [ "date" ]
        }

        mutate {
            convert => {
                "batchid" => "integer"
                "uuid" => "string"
                "id2" => "string"
            }
        }

        if [message] =~ "Appel du controleur" {
            grok {
                match => [ "message", "Appel du controleur : %{WORD:controller} - Action : %{WORD:controller_action} - Arguments : %{GREEDYDATA:controller_args}" ]
            }
            json {
                source => "controller_args"
            }
            mutate {
                convert => {
                    "controller" => "string"
                   "controller_action" => "string"
                }
            }
        }

        if [message] =~ "Destinataire sur liste noire" {
            grok {
                match => [ "message", "Destinataire sur liste noire : %{NUMBER:blacklisted_number}%{GREEDYDATA}" ]
            }
            grok {
                match => [ "blacklisted_number", "(?<number_prefix>0(8\d{2}|([1-7]|9)))" ]
            }
            mutate {
                convert => {
                    "blacklisted_number" => "integer"
                    "number_prefix" => "integer"
                }
            }
        }
    }
}

and sends them to Elasticsearch:

output {
    elasticsearch {
       hosts => ["http://192.168.0.3:9200"]
       user => "elastic"
       password => "x"
       ssl => false
       index => "%{type}-%{+YYYY.MM}"
  }
}

Basically, the whole process works fine, but for a reason I cannot understand, it seems that sometimes, for a very short period of time, each line of log processed by Logstash is sent to Elasticsearch exactly 31 times instead of once.

We noticed this because our dashboard statistics were incorrect, so I used the following Elasticsearch query to return the duplicates in an index based on the 'event.original.keyword' field:

GET /avindex-2024.04/_search
{
  "size": 0,
  "aggs": {
    "duplicate_ids": {
      "terms": {
        "field": "event.original.keyword",
        "size": 100,  // adjust size as needed, this determines how many duplicate groups are returned
        "min_doc_count": 2  // specify at least 2 to get duplicates
      },
      "aggs": {
        "duplicate_docs": {
          "top_hits": {
            "size": 1  // specify the size of the hits to return
          }
        }
      }
    }
  }
}

This query always returns a huge number of duplicate docs, with a "hits -> total -> value" of 31 for each duplicated document, and obviously there is only one line of log on the app server side.

I can't see what would cause Logstash to send duplicate logs like this: there is no service interruption in the process, and all the servers are on the same network.

Another strange thing is that the app servers also send their logs via NLog to another Logstash server and ES cluster with exactly the same configuration, but on a different network, and there everything works fine without any duplicates.

I've tried updating the Logstash filter configuration to clone the events at the very beginning, before any parsing, but even these unparsed logs get duplicated, so it seems to be specifically related to this Logstash server, especially since we also send logs directly to the Elasticsearch server and those show no duplicates.
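
The clone test was roughly along these lines (a minimal sketch rather than the exact configuration; how the clone gets labelled depends on the clone filter's ecs_compatibility setting):

filter {
    # clone each event before any parsing; the copy keeps the raw, unparsed message
    clone {
        clones => ["raw"]
    }

    # the clone is labelled "raw" (in "type" or in "tags", depending on the
    # plugin's ecs_compatibility setting); tag it so it can be skipped by the
    # parsing filters and routed to a separate index in the output
    if [type] == "raw" or "raw" in [tags] {
        mutate { add_tag => ["unparsed"] }
    }
}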

I would be really grateful if you have any idea of what could explain this and how I could correct it.

Thank you very much

There is nothing in the Logstash pipeline that would duplicate the data.

Can you share some examples of duplicated documents? You shared just one document; please share more, and include the entire documents.

Also, is this the only configuration you have in your logstash? How are you running it? As a service?

Ok, this is probably a total wild goose chase, but are there any network segments in the problem environment that are a /27 network? If not, you can probably quit reading here.

I once had a very strange network performance issue; while debugging it, my inbound tcpdump would periodically get a flood of repeated packets. I don't remember the exact number, but it was the number of hosts on the network minus 1. It ended up being some kind of router issue: something about the router losing the location of the MAC address, so it broadcast the packets to everything on the segment to relearn the MAC.

Thank you for your reply,

We have multiple Logstash configurations for other sources, and while checking your comment I noticed we have the same issue for other sources that come from a different network, so the problem could indeed be related to a network/router issue.
Logstash is running as a service on a dedicated server.

Please find below several documents related to this issue:

  "aggregations": {
    "duplicate_ids": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 743294,
      "buckets": [
        {
          "key": "2025-03-01 01:55:21.022 25 INFO : Executed action ProductOrder.Controllers.ExploitationController.Get (ProductOrder) in 0.2371ms ",
          "doc_count": 31,
          "duplicate_docs": {
            "hits": {
              "total": {
                "value": 31,
                "relation": "eq"
              },
              "max_score": 0,
              "hits": [
                {
                  "_index": "index-2025.03",
                  "_id": "eP0zT5UBMsyG47naXKHi",
                  "_score": 0,
                  "_source": {
                    "level": "INFO",
                    "batchid": 25,
                    "event": {
                      "original": "2025-03-01 01:55:21.022 25 INFO : Executed action ProductOrder.Controllers.ExploitationController.Get (ProductOrder) in 0.2371ms "
                    },
                    "@timestamp": "2025-03-01T00:55:21.022Z",
                    "message": "Executed action ProductOrder.Controllers.ExploitationController.Get (ProductOrder) in 0.2371ms ",
                    "type": "index",
                    "@version": "1",
                    "host": {
                      "ip": "192.168.1.1"
                    }
                  }
                }
              ]
            }
          }
        },
        {
          "key": "2025-03-01 01:55:21.022 25 INFO : Executed endpoint 'ProductOrder.Controllers.ExploitationController.Get (ProductOrder)' ",
          "doc_count": 31,
          "duplicate_docs": {
            "hits": {
              "total": {
                "value": 31,
                "relation": "eq"
              },
              "max_score": 0,
              "hits": [
                {
                  "_index": "index-2025.03",
                  "_id": "ff0zT5UBMsyG47naXKHi",
                  "_score": 0,
                  "_source": {
                    "level": "INFO",
                    "batchid": 25,
                    "event": {
                      "original": "2025-03-01 01:55:21.022 25 INFO : Executed endpoint 'ProductOrder.Controllers.ExploitationController.Get (ProductOrder)' "
                    },
                    "@timestamp": "2025-03-01T00:55:21.022Z",
                    "message": "Executed endpoint 'ProductOrder.Controllers.ExploitationController.Get (ProductOrder)' ",
                    "type": "index",
                    "@version": "1",
                    "host": {
                      "ip": "192.168.1.1"
                    }
                  }
                }
              ]
            }
          }
        },
        {
          "key": "2025-03-01 01:55:21.022 25 INFO : Executing ObjectResult, writing value of type 'System.String'. ",
          "doc_count": 31,
          "duplicate_docs": {
            "hits": {
              "total": {
                "value": 31,
                "relation": "eq"
              },
              "max_score": 0,
              "hits": [
                {
                  "_index": "index-2025.03",
                  "_id": "c_0zT5UBMsyG47naXKHi",
                  "_score": 0,
                  "_source": {
                    "level": "INFO",
                    "batchid": 25,
                    "event": {
                      "original": "2025-03-01 01:55:21.022 25 INFO : Executing ObjectResult, writing value of type 'System.String'. "
                    },
                    "@timestamp": "2025-03-01T00:55:21.022Z",
                    "message": "Executing ObjectResult, writing value of type 'System.String'. ",
                    "type": "index",
                    "@version": "1",
                    "host": {
                      "ip": "192.168.1.1"
                    }
                  }
                }
              ]
            }
          }
        },

Hi Len and thanks for your suggestion,

Actually we only use /24 networks, but given how strange the problem we're experiencing is, I wouldn't be surprised if it's similar to yours.

I do not have direct access to our router to check this, but my next step will be to run tcpdump locally on the Logstash server to check whether it is indeed receiving the same data multiple times while the source server only sends it once.

Since you have multiple configurations, are you using multiple pipelines in pipelines.yml, or are you using the default pipelines.yml? Please share your pipelines.yml.
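
For context: with the default pipelines.yml, every .conf file matched by path.config is merged into a single pipeline, so events from any input pass through all filters and outputs unless they are guarded by conditionals, which is a common cause of duplicate documents. Splitting the sources into separate pipelines would look roughly like this (ids and paths are just examples):

- pipeline.id: avindex
  path.config: "/etc/logstash/conf.d/avindex.conf"
- pipeline.id: other-sources
  path.config: "/etc/logstash/conf.d/other-sources.conf"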

Are the outputs of your other configurations similar to this one? Can you share some other examples?

I've never seen an issue like that being related to network problems, so I cannot confirm that this is the cause.

The oddness is the "sometimes". How often does it happen? Once per n events? Once per hour?

I'm still betting on it being external to logstash; but I've lost bets before.

Given the input is UDP, we would both be losing together. Some duplicate UDP packets on a busy network - no, surely that can't happen :thinking:

The same question was posted last year, by the same user, twice.

@AVMOps - have you investigated the network side any of the times you posted here ? If so, how did you investigate ?

(edit: ask yourself what would happen if the same UDP packet appears at 192.168.0.2:5567 multiple times? would it be any different to what you are seeing?)
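
If it does turn out to be replayed packets, one workaround on the indexing side (not a fix for the network) is to derive the Elasticsearch document id from the event content, so that a re-delivered copy overwrites the same document instead of creating a new one. A rough sketch with the fingerprint filter, untested against your setup:

filter {
    # hash the raw line into a metadata field (metadata fields are not indexed)
    fingerprint {
        source => ["message"]
        target => "[@metadata][fingerprint]"
        method => "SHA256"
    }
}

output {
    elasticsearch {
        hosts => ["http://192.168.0.3:9200"]
        index => "%{type}-%{+YYYY.MM}"
        # same content => same _id, so a replayed packet updates rather than duplicates;
        # note that two genuinely identical log lines would also be collapsed
        document_id => "%{[@metadata][fingerprint]}"
        # plus your existing user/password settings
    }
}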

Just to add, finding such things on a busy network can be hard. Really hard, particularly if it's intermittent and uncommon.

One approach would be to tap the interface for 192.168.0.2 to a different (physical) port, sometimes called port mirroring, and just sniff&wait on the second port. Or use iptables on the receiving end (logstash host here I think) to do much the same using the TEE target.

@AVMOps did you find any resolution to your issue ?

Hi Kevin, and thank you for your help,

We didn't actually investigate further on the network side, as there is no similar issue elsewhere on our infrastructure, so I suspected a problem related to Logstash or NLog.

After reading your message it makes sense to me that it's probably related to a network issue, more specifically to UDP, without really knowing the cause.

I analysed the incoming packets on the Logstash server, and the only thing I could find is that the 31 'duplicate docs' are related to the TTL of the packet, which starts at 64 and decreases by 2 every time the packet is re-sent, so the router ends up sending a total of 31 packets before stopping.

We switched the Logstash input to TCP and the duplicate problem seems to no longer occur, but this switch comes with other problems, such as Logstash merging multiple lines of log into one document, probably because TCP works as a continuous stream.

To fix this I updated the Logstash input to include the 'multiline' codec, specifying a pattern (the date, which is at the beginning of each line of log).

input {
  tcp {
    port => 5567
    type => "avindex"
    codec => multiline {
#      pattern => "^%{TIMESTAMP_ISO8601} "
      pattern => "^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}\s"
      negate => true
      what => "previous"
      auto_flush_interval => 1
    }
    dns_reverse_lookup_enabled => false  
  }
}

And this configuration almost resolves the issue, but for some reason some lines of log are still merged into one document, even though the pattern is matched twice in the same doc, for example:

Document including only one line of log:

{
  "_index": "avindex-2025.03",
  "_id": "5gtho5UBj4_JXomUfVtH",
  "_version": 1,
  "_score": null,
  "fields": {
    "event.original": [
      "2025-03-17 10:13:50.042 28 INFO : Request starting HTTP/1.1 GET http://192.168.111.57:5100/api/exploitation   "
    ],
    "level": [
      "INFO"
    ],
    "@version.keyword": [
      "1"
    ],
    "type": [
      "avindex"
    ],
    "message": [
      "Request starting HTTP/1.1 GET http://192.168.111.57:5100/api/exploitation   "
    ],
    "batchid": [
      28
    ],
    "host.ip": [
      "192.168.111.57"
    ],
    "@timestamp": [
      "2025-03-17T09:13:50.042Z"
    ],
    "host.ip.keyword": [
      "192.168.111.57"
    ],
    "level.keyword": [
      "INFO"
    ],
    "type.keyword": [
      "avindex"
    ],
    "message.keyword": [
      "Request starting HTTP/1.1 GET http://192.168.111.57:5100/api/exploitation   "
    ],
    "event.original.keyword": [
      "2025-03-17 10:13:50.042 28 INFO : Request starting HTTP/1.1 GET http://192.168.111.57:5100/api/exploitation   "
    ],
    "@version": [
      "1"
    ]
  },
  "sort": [
    1742202830042
  ]
}

Document merging two lines of log:

{
  "@timestamp": [
    "2025-03-17T09:13:50.042Z"
  ],
  "@version": [
    "1"
  ],
  "@version.keyword": [
    "1"
  ],
  "batchid": [
    28
  ],
  "event.original": [
    "2025-03-17 10:13:50.042 28 INFO : Executing endpoint 'ProductOrder.Controllers.ExploitationController.Get (ProductOrder)' 2025-03-17 10:13:50.042 28 INFO : Route matched with {action = \"Get\", controller = \"Exploitation\"}. Executing controller action with signature Microsoft.AspNetCore.Mvc.IActionResult Get() on controller ProductOrder.Controllers.ExploitationController (ProductOrder). "
  ],
  "event.original.keyword": [
    "2025-03-17 10:13:50.042 28 INFO : Executing endpoint 'ProductOrder.Controllers.ExploitationController.Get (ProductOrder)' 2025-03-17 10:13:50.042 28 INFO : Route matched with {action = \"Get\", controller = \"Exploitation\"}. Executing controller action with signature Microsoft.AspNetCore.Mvc.IActionResult Get() on controller ProductOrder.Controllers.ExploitationController (ProductOrder). "
  ],
  "host.ip": [
    "192.168.111.57"
  ],
  "host.ip.keyword": [
    "192.168.111.57"
  ],
  "level": [
    "INFO"
  ],
  "level.keyword": [
    "INFO"
  ],
  "message": [
    "Executing endpoint 'ProductOrder.Controllers.ExploitationController.Get (ProductOrder)' 2025-03-17 10:13:50.042 28 INFO : Route matched with {action = \"Get\", controller = \"Exploitation\"}. Executing controller action with signature Microsoft.AspNetCore.Mvc.IActionResult Get() on controller ProductOrder.Controllers.ExploitationController (ProductOrder). "
  ],
  "message.keyword": [
    "Executing endpoint 'ProductOrder.Controllers.ExploitationController.Get (ProductOrder)' 2025-03-17 10:13:50.042 28 INFO : Route matched with {action = \"Get\", controller = \"Exploitation\"}. Executing controller action with signature Microsoft.AspNetCore.Mvc.IActionResult Get() on controller ProductOrder.Controllers.ExploitationController (ProductOrder). "
  ],
  "type": [
    "avindex"
  ],
  "type.keyword": [
    "avindex"
  ],
  "_id": "6Qtho5UBj4_JXomUfVtH",
  "_index": "avindex-2025.03",
  "_score": null
}

Sorry for the huge message and thank you again for your help.

Thanks for the update.

IMO you should investigate that. Well, if it's happening once a year then forget it, but it seems there is something not right in your networking, and it could be causing other (subtle) issues you would not even know about.

You will know that a lot of IT is built on top of the higher-level network protocols: HTTP, SMTP, NFS and many others all sit at layer 7, aka the application layer. There is a reason for that. When working at layer 4 (TCP/UDP), you sort of become the application, and you need to handle certain situations you probably don't want to handle.

Is there a newline embedded in the document that contains "two lines"? It doesn't look like it. If there is no newline between the two lines, then they arrive as a single line and the multiline codec can never split them apart.