Duplicate messages when using custom document_id


(Andy Shinn) #1

I am starting to use document_id in my elasticsearch output to guard against accidental duplicate log messages. Our application emits a uuid field, which we want to use as the document_id. For events that arrive without one, I use the uuid filter to generate it. Here is an example of our Logstash configuration:

input {
    beats {
        host => "0.0.0.0"
        port => 5044
        codec => "json"
    }
}

filter {
    # Generate a uuid only when the event does not already have one
    uuid {
        target    => "uuid"
        overwrite => false
    }
}

output {
    if [logger] in ["gunicorn.access", "gunicorn.error", "findmine.frontend"] {
        file {
            path => "/var/log/frontend-private-%{+YYYY-MM-dd}.log.gz"
            codec => json_lines
            gzip => true
        }
    } else {
        file {
            path => "/var/log/misc.log"
            codec => json_lines
        }
    }

    elasticsearch {
        hosts => ["10.10.10.10"]
        document_type => "logs"
        document_id   => "%{[uuid]}"
    }
}
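
As I understand it, overwrite => false tells the uuid filter to leave an existing uuid field alone, so the filter above should behave like this more explicit conditional form (a minimal sketch of the same logic):

filter {
    # Only generate a uuid when the event does not already carry one
    if ![uuid] {
        uuid {
            target => "uuid"
        }
    }
}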

Here is the Filebeat configuration on the hosts:

filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/app/*.log

output.logstash:
  hosts: ["10.10.10.20:5044"]

What happens in this case is that I end up with duplicate messages in Elasticsearch: one copy has the correct uuid value as its _id, while the other appears to have an Elasticsearch-generated _id.
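
To verify that every event really carries a uuid field by the time it reaches the outputs, I can temporarily add a stdout output alongside the elasticsearch one (a debugging sketch; the rubydebug codec just pretty-prints each full event to the Logstash log):

output {
    # Temporary: dump every event so we can inspect the uuid field
    stdout {
        codec => rubydebug
    }
}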

What am I doing wrong here? I just want messages that arrive at Logstash with a uuid field to use it as the document_id, and to have a uuid generated when the field is missing.
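
One thing I am wondering about: the uuid filter generates a random id, and Filebeat delivery is at-least-once, so an event that reaches Logstash twice after a retry would get two different ids. If that turns out to matter, a content-derived id via the fingerprint filter would stay stable across retries (a sketch; the key is an arbitrary HMAC seed and "findmine" is just a placeholder value):

filter {
    if ![uuid] {
        fingerprint {
            source => ["message"]   # hash the raw log line
            method => "SHA1"
            key    => "findmine"    # arbitrary HMAC seed, placeholder
            target => "uuid"
        }
    }
}

That way the same log line always maps to the same _id, and a retried event simply overwrites the original document instead of duplicating it.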

Logstash 5.6.3
Filebeat 6.1.1
Elasticsearch 6.1.1


(system) #2

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.