Duplicate messages when using custom document_id

I am starting to use document_id in my elasticsearch output to protect against accidental duplicate log messages. Our application emits a uuid field with each event, which we are trying to use as the document_id, and I also use the uuid filter to set the field when it is not already present. Here is an example of our Logstash configuration:

input {
    beats {
        host => "0.0.0.0"
        port => 5044
        codec => "json"
    }
}

filter {
    # Generate a uuid only when the incoming event does not already have one.
    uuid {
        target    => "uuid"
        overwrite => false
    }
}

output {
    if [logger] in ["gunicorn.access", "gunicorn.error", "findmine.frontend"] {
        file {
            path => "/var/log/frontend-private-%{+YYYY-MM-dd}.log.gz"
            codec => json_lines
            gzip => true
        }
    } else {
        file {
            path => "/var/log/misc.log"
            codec => json_lines
        }
    }

    # Use the application-provided (or generated) uuid as the Elasticsearch _id.
    elasticsearch {
        hosts         => ["10.10.10.10"]
        document_type => "logs"
        document_id   => "%{[uuid]}"
    }
}
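
For reference, this is the kind of minimal pipeline I use to sanity-check that the uuid filter fills the field and that the %{[uuid]} reference resolves; the stdin/stdout plugins here are only for testing and are not part of the production config:

input {
    stdin {
        codec => "json"
    }
}

filter {
    uuid {
        target    => "uuid"
        overwrite => false
    }
}

output {
    # Print the resulting event so the uuid field can be inspected.
    stdout {
        codec => rubydebug
    }
}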

Here is the Filebeat configuration on the hosts:

filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/app/*.log

output.logstash:
  hosts: ["10.10.10.20:5044"]

What happens is that I end up with duplicate messages in Elasticsearch: one copy has the correct uuid field as its _id, while the other appears to have an Elasticsearch-generated _id.

What am I doing wrong here? I just want messages that arrive at Logstash with a uuid field to use it as the document_id, and to have a uuid generated when the field is missing.
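
As far as I understand the uuid filter, overwrite => false means it only writes the target field when it is absent, which should be equivalent to an explicit conditional like the sketch below (just my expectation of the behavior, not what I actually run):

filter {
    # Only generate a uuid when the event does not already carry the field.
    if ![uuid] {
        uuid {
            target => "uuid"
        }
    }
}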

Logstash 5.6.3
Filebeat 6.1.1
Elasticsearch 6.1.1
