I am starting to use document_id in my elasticsearch output to help with accidental duplicate log messages. We have a uuid field that comes from our application, which we are trying to use as the document_id. I also try to set that field, if it is not already present, using the uuid filter. Here is an example of our Logstash configuration:
input {
  beats {
    host  => "0.0.0.0"
    port  => 5044
    codec => "json"
  }
}

filter {
  uuid {
    target    => "uuid"
    overwrite => false
  }
}

output {
  if [logger] in ["gunicorn.access", "gunicorn.error", "findmine.frontend"] {
    file {
      path  => "/var/log/frontend-private-%{+YYYY-MM-dd}.log.gz"
      codec => json_lines
      gzip  => true
    }
  } else {
    file {
      path  => "/var/log/misc.log"
      codec => json_lines
    }
  }

  elasticsearch {
    hosts         => ["10.10.10.10"]
    document_type => "logs"
    document_id   => "%{[uuid]}"
  }
}
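To be explicit about the behavior I'm after with the uuid filter above: keep an existing uuid, otherwise generate one. In plain Python it would look like this (uuid4 chosen arbitrarily for illustration; this is my intent, not the filter's internals):

```python
import uuid

def ensure_uuid(event: dict) -> dict:
    # Keep the event's existing "uuid" field; generate a fresh one
    # only when the field is missing or empty.
    if not event.get("uuid"):
        event["uuid"] = str(uuid.uuid4())
    return event

# An event that already carries a uuid should keep it unchanged.
ensure_uuid({"message": "GET /", "uuid": "abc-123"})
```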
The Filebeat configuration on hosts:
filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/app/*.log

output.logstash:
  hosts: ["10.10.10.20:5044"]
What happens in this case is that I end up with duplicate messages in Elasticsearch. One of the messages has the correct uuid field as the _id, and the other seems to have an Elasticsearch-generated _id.
What am I doing wrong here? I just want messages that arrive at Logstash with a uuid field to use it as the document_id, and to have the uuid generated only if the field is otherwise empty.
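For context on why I expected this to work: my understanding is that the `%{[uuid]}` sprintf reference in the elasticsearch output is substituted per event, and, as far as I know, is left as the literal string when the field is absent. A rough Python illustration of that substitution as I understand it (the function and regex are mine, not Logstash internals):

```python
import re

def sprintf(template: str, event: dict) -> str:
    # Replace %{[field]} references with the event's value;
    # leave the reference literal when the field is absent,
    # which is how I believe Logstash behaves.
    def sub(match):
        field = match.group(1)
        return str(event[field]) if field in event else match.group(0)
    return re.sub(r"%\{\[(\w+)\]\}", sub, template)

sprintf("%{[uuid]}", {"uuid": "abc-123"})  # substituted
sprintf("%{[uuid]}", {})                   # left literal
```

If that mental model is right, a missing uuid should at worst produce documents whose _id is the literal "%{[uuid]}", not auto-generated ids, which is part of my confusion.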
Logstash 5.6.3
Filebeat 6.1.1
Elasticsearch 6.1.1