I have Filebeat configured with loadbalance: true and multiple Logstash servers in the Logstash output. Everything works fine: I see a nice, even distribution across all of the Logstash instances.
To prevent duplicates I have followed this Elastic blog post, specifically by bringing our own ID. (We use a generated UUID from the logs).
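Concretely, the Logstash side looks roughly like the sketch below (the [uuid] field name, host, and index are placeholders for our redacted values; the action => "create" is what produces the 409s shown at the bottom of this post):

output {
  elasticsearch {
    hosts       => ["https://es01.my.domain:9200"]
    index       => "my-logs"
    # Use the UUID parsed from the log line as the document _id,
    # so a re-sent copy of the same event maps to the same _id.
    document_id => "%{[uuid]}"
    # "create" rejects a second write to an existing _id with a 409
    # instead of silently overwriting it with a new version.
    action      => "create"
  }
}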
The problem is that I'm now facing a whole bunch of 409 errors (version conflict, document already exists). On one hand this is great, as Logstash is doing its job and preventing duplicate data in ES; on the other hand, I want to know why Logstash is processing these events more than once and why it thinks the data is duplicated.
I've been able to confirm that this is almost certainly Filebeat re-sending the same log lines to multiple Logstash hosts. I tag each event in ES with the hostname of the Logstash instance that processed it, and I can confirm that a single event was ingested by Logstash host 05, and 15 seconds later host 01 logged a 409 for the same _id. This pattern repeats across different Logstash hosts.
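Fetching the surviving document by its _id from Kibana Dev Tools shows which host won the race (logstash_host is just the name I'm using here for our hostname tag):

GET my-logs/_search
{
  "_source": ["logstash_host", "@timestamp"],
  "query": {
    "ids": { "values": ["xxx-xxx-xxx-xxx-xxx"] }
  }
}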
I have also confirmed in the actual log file on the server that the UUID appears only once, so this is definitely not a logging issue and we do not have duplicate UUIDs in the logs.
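For example, a spot check like this returns exactly 1 for every UUID I've tried (the path matches the input in the config below):

grep -c 'xxx-xxx-xxx-xxx-xxx' /path/to/logs/myfile.log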
What can I do to prevent Filebeat from sending the same log line to multiple Logstash hosts? I feel like I followed best practices when setting this up; am I missing something?
Thanks
Redacted Filebeat Config:
#=========================== Filebeat inputs =============================
filebeat.inputs:
- type: filestream
  enabled: true
  paths:
    - /path/to/logs/myfile.log
  fields_under_root: true
  fields:
    log_type: payload
# ========================== Filebeat global options ===========================
filebeat.registry.path: ${path.data}/registry
filebeat.registry.file_permissions: 0600
filebeat.registry.flush: 0s
filebeat.shutdown_timeout: 5
# ================================== General ===================================
name: myhost.hostname
tags: []
fields:
  awsvpc: test
fields_under_root: true
queue:
  mem:
    events: 4096
    flush.min_events: 2048
    flush.timeout: 1s
# ================================= Processors =================================
processors:
  - add_host_metadata:
      when.not.contains.tags: forwarded
  - add_cloud_metadata: ~
# ================================== Outputs ===================================
output.logstash:
enabled: true
hosts: ['logstash01.my.domain:5044', 'logstash02.my.domain:5044', 'logstash03.my.domain:5044', 'logstash04.my.domain:5044', 'logstash05.my.domain:5044', 'logstash06.my.domain:5044']
worker: 1
loadbalance: true
ssl.certificate_authorities: ["/usr/local/share/ca-certificates/intca.crt"]
ssl.certificate: "/etc/pki/filebeat.crt"
ssl.key: "/etc/pki/filebeat.key"
ssl.supported_protocols: "TLSv1.2"
ssl.verification_mode: "strict"
# ================================== Logging ===================================
logging.level: info
logging.selectors: []
logging.metrics.enabled: false
logging.to_files: true
logging.files:
  path: /path/to/logs/
  name: filebeat.log
  rotateeverybytes: 52428800
  keepfiles: 7
  permissions: 0600
  interval: 24h
  rotateonstartup: false
Redacted Logstash error:
Failed action {:status=>409, :action=>["create", {:_id=>"xxx-xxx-xxx-xxx-xxx", :_index=>"my-logs", :routing=>nil}, {REDACTED}], :response=>{"create"=>{"_index"=>".ds-my-logs", "_type"=>"_doc", "_id"=>"xxx-xxx-xxx-xxx-xxx", "status"=>409, "error"=>{"type"=>"version_conflict_engine_exception", "reason"=>"[xxx-xxx-xxx-xxx-xxx]: version conflict, document already exists (current version [1])", "index"=>".ds-my-logs", "shard"=>"1", "index_uuid"=>"x-XXXX"}}}}