Duplicate messages when using custom document_id

(Andy Shinn) #1

I am starting to use document_id in my elasticsearch output to guard against accidental duplicate log messages. Our application emits a uuid field, which we are trying to use as the document_id. I also use the uuid filter to set the field when it is not already present. Here is an example of our Logstash configuration:

input {
    beats {
        host => ""
        port => 5044
        codec => "json"
    }
}

filter {
    uuid {
        target    => "uuid"
        overwrite => false
    }
}

output {
    if [logger] in ["gunicorn.access", "gunicorn.error", "findmine.frontend"] {
        file {
            path => "/var/log/frontend-private-%{+YYYY-MM-dd}.log.gz"
            codec => json_lines
            gzip => true
        }
    } else {
        file {
            path => "/var/log/misc.log"
            codec => json_lines
        }
    }

    elasticsearch {
        hosts => [""]
        document_type => "logs"
        document_id   => "%{[uuid]}"
    }
}

The Filebeat configuration on the hosts:

filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/app/*.log

output.logstash:
  hosts: [""]

What happens is that I end up with duplicate messages in Elasticsearch. One copy has the correct uuid field as its _id, and the other seems to have an Elasticsearch-generated _id.
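
To make the symptom concrete, here is a minimal Python sketch of how indexing by _id behaves. It is a toy in-memory model, not the Elasticsearch client: `ToyIndex` and its `index` method are hypothetical names, and the sample event is made up. It assumes only that a write with an explicit _id replaces whatever is already stored under that _id, while a write without one is stored under a freshly generated _id.

```python
import uuid


class ToyIndex:
    """Toy stand-in for an index: maps _id -> document (hypothetical, not a real client)."""

    def __init__(self):
        self.docs = {}

    def index(self, doc, doc_id=None):
        # With no explicit id, generate one, standing in for an auto-generated _id.
        if doc_id is None:
            doc_id = uuid.uuid4().hex
        # Writing to an existing _id replaces the stored document.
        self.docs[doc_id] = doc
        return doc_id


# Made-up sample event for illustration only.
event = {"uuid": "3f2c9a", "message": "GET /health 200"}

idx = ToyIndex()
idx.index(event, doc_id=event["uuid"])  # first delivery
idx.index(event, doc_id=event["uuid"])  # redelivery: same _id, no duplicate
count_with_id = len(idx.docs)

idx.index(event)  # same event written without an explicit _id
count_after_auto_id = len(idx.docs)

print(count_with_id, count_after_auto_id)  # 1 2
```

In this model, a pair consisting of one document keyed by the uuid and one keyed by a generated _id appears only when the same event is written twice, once with and once without an explicit id. Whether that is what happens in this setup is not established here.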

What am I doing wrong here? I just want messages that arrive at Logstash with a uuid field to use it as the document_id, and messages without one to have a uuid generated first.
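
The intended rule can be sketched in Python as follows. This only illustrates the desired behavior, not how Logstash implements it: `ensure_uuid` is a hypothetical helper, events are modeled as plain dicts, and the sample values are made up.

```python
import uuid


def ensure_uuid(event):
    """Return a copy of the event that keeps a non-empty uuid, or gets a new one."""
    result = dict(event)
    # Missing and empty uuid values are both treated as "not set".
    if not result.get("uuid"):
        result["uuid"] = str(uuid.uuid4())
    return result


from_app = ensure_uuid({"uuid": "3f2c9a", "message": "from the app"})
generated = ensure_uuid({"message": "no uuid yet"})

print(from_app["uuid"])        # 3f2c9a (kept as-is)
print(len(generated["uuid"]))  # 36 (a freshly generated UUID string)
```

Either way, the resulting uuid value is what the document_id should resolve to, so every event would be written under an explicit id.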

Logstash 5.6.3
Filebeat 6.1.1
Elasticsearch 6.1.1
