I am starting to use document_id in my elasticsearch output to help with accidental duplicate log messages. We have a uuid field that comes from our application, which we are trying to use as the document_id. I also try to set that field, if it is not already present, using the uuid filter. Here is an example of our Logstash configuration:
input {
  beats {
    host  => "0.0.0.0"
    port  => 5044
    codec => "json"
  }
}

filter {
  uuid {
    target    => "uuid"
    overwrite => false
  }
}

output {
  if [logger] in ["gunicorn.access", "gunicorn.error", "findmine.frontend"] {
    file {
      path  => "/var/log/frontend-private-%{+YYYY-MM-dd}.log.gz"
      codec => json_lines
      gzip  => true
    }
  } else {
    file {
      path  => "/var/log/misc.log"
      codec => json_lines
    }
  }

  elasticsearch {
    hosts         => ["10.10.10.10"]
    document_type => "logs"
    document_id   => "%{[uuid]}"
  }
}
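To be explicit about the behavior I'm after with the uuid filter above: keep an existing uuid, otherwise generate one. In plain Python it would look like this (uuid4 chosen arbitrarily for illustration; this is my intent, not the filter's internals):

```python
import uuid

def ensure_uuid(event: dict) -> dict:
    # Keep the event's existing "uuid" field; generate a fresh one
    # only when the field is missing or empty.
    if not event.get("uuid"):
        event["uuid"] = str(uuid.uuid4())
    return event

# An event that already carries a uuid should keep it unchanged.
ensure_uuid({"message": "GET /", "uuid": "abc-123"})
```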
The Filebeat configuration on hosts:
filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/app/*.log

output.logstash:
  hosts: ["10.10.10.20:5044"]
What happens in this case is that I end up with duplicate messages in Elasticsearch. One of the messages has the correct uuid field as the _id, and the other seems to have an Elasticsearch-generated _id.
What am I doing wrong here? I just want messages that arrive at Logstash with a uuid field to use it as the document_id, and to have the uuid generated only if the field is otherwise empty.
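For context on why I expected this to work: my understanding is that the `%{[uuid]}` sprintf reference in the elasticsearch output is substituted per event, and, as far as I know, is left as the literal string when the field is absent. A rough Python illustration of that substitution as I understand it (the function and regex are mine, not Logstash internals):

```python
import re

def sprintf(template: str, event: dict) -> str:
    # Replace %{[field]} references with the event's value;
    # leave the reference literal when the field is absent,
    # which is how I believe Logstash behaves.
    def sub(match):
        field = match.group(1)
        return str(event[field]) if field in event else match.group(0)
    return re.sub(r"%\{\[(\w+)\]\}", sub, template)

sprintf("%{[uuid]}", {"uuid": "abc-123"})  # substituted
sprintf("%{[uuid]}", {})                   # left literal
```

If that mental model is right, a missing uuid should at worst produce documents whose _id is the literal "%{[uuid]}", not auto-generated ids, which is part of my confusion.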
Logstash 5.6.3
Filebeat 6.1.1
Elasticsearch 6.1.1