Hi!
I have several Filebeat containers (one Filebeat container per host) in my infrastructure sending logs to three Logstash containers, one per instance. Filebeat is configured to load balance between them, with publish_async: true.
The problem I'm facing is that I get these errors, a lot:
10:45:41.176206 sync.go:85: ERR Failed to publish events caused by: read tcp 172.17.0.2:59030->52.50.63.141:5000: read: connection reset by peer
10:45:41.176247 sync_worker.go:167: INFO Error publishing events (retrying): read tcp 172.17.0.2:59030->52.50.63.141:5000: read: connection reset by peer
10:45:41.172658 sync.go:85: ERR Failed to publish events caused by: write tcp 172.17.0.2:60410->52.213.93.139:5000: write: connection reset by peer
10:45:41.172724 sync_worker.go:167: INFO Error publishing events (retrying): write tcp 172.17.0.2:60410->52.213.93.139:5000: write: connection reset by peer
I don't seem to lose any logs, since Filebeat re-establishes the connection after a while.
However, if I disable load balancing on Filebeat, the problem seems to go away. But I would like to use load balancing.
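For reference, when I say I disable load balancing, the only change is loadbalance: false in the Logstash output section (same hosts list), roughly:
output.logstash:
  hosts: ["indexer01:5000", "indexer02:5000", "indexer03:5000"]
  loadbalance: false
As far as I understand, Filebeat then sticks to a single host instead of spreading batches across all three.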
I run Filebeat version 5.0.1 with this config:
filebeat.prospectors:
- input_type: log
  document_type: syslogMessages
  scan_frequency: 5s
  close_inactive: 1m
  backoff_factor: 1
  backoff: 1s
  paths:
    - /host/var/log/messages
- input_type: log
  document_type: syslogSecure
  scan_frequency: 5s
  backoff_factor: 1
  close_inactive: 1m
  backoff: 1s
  paths:
    - /host/var/log/secure
- input_type: log
  document_type: ecsAgent
  scan_frequency: 5s
  backoff_factor: 1
  close_inactive: 10s
  backoff: 1s
  paths:
    - /host/var/log/ecs/ecs-agent.log.*
- input_type: log
  document_type: docker
  scan_frequency: 1s
  close_inactive: 10m
  backoff_factor: 1
  backoff: 1s
  json.message_key: log
  json.keys_under_root: true
  json.add_error_key: true
  overwrite_keys: true
  paths:
    - /host/var/lib/docker/containers/*/*.log
  multiline.pattern: '^[[:space:]]+|^Caused by:'
  multiline.negate: false
  multiline.match: after
  multiline.timeout: 1s
#================================ General =====================================
publish_async: true
filebeat.idle_timeout: 1s
filebeat.shutdown_timeout: 5s
fields_under_root: true
fields:
  accountId: ${ACCOUNTID}
  instanceId: ${INSTANCEID}
  instanceName: ${INSTANCENAME}
  region: ${REGION}
  az: ${AZ}
  environment: ${ENV}
logging.metrics.enabled: true
metrics.period: 60s
logging.level: info
#================================ Outputs =====================================
#----------------------------- Logstash output --------------------------------
output.logstash:
  hosts: ["indexer01:5000", "indexer02:5000", "indexer03:5000"]
  compression_level: 1
  worker: 2
  loadbalance: true
  ssl.certificate_authorities: ["/host/opt/filebeat/logstash.pem"]
  max_retries: -1
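(The ${...} values in the fields section above come from environment variables passed into the Filebeat container at startup; an illustrative run, with placeholder values, would be roughly:
docker run -e ACCOUNTID=... -e INSTANCEID=... -e INSTANCENAME=... -e REGION=... -e AZ=... -e ENV=... <filebeat-image>
Filebeat 5.x expands ${VAR} references in the config from the environment.)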
Logstash is version 5.1.2 with this input/output conf:
input {
  beats {
    port => "5000"
    ssl => "true"
    ssl_certificate => "/host/opt/logstash/logstash.pem"
    ssl_key => "/host/opt/logstash/logstash.key"
    client_inactivity_timeout => "900"
  }
}
output {
  ## If type is ECS, send to ES and PT. Sends to PT via rsyslog on the host to get SSL.
  ## Uses ECS cluster name + container name in PT.
  if [type] == "ecs" {
    syslog {
      facility => "local0"
      severity => "notice"
      host => "172.17.0.1"
      port => "514"
      appname => "%{ecsContainerName}"
      sourcehost => "%{ecsCluster}"
      protocol => "tcp"
    }
    elasticsearch {
      hosts => ["https://${ESENDPOINT}:443"]
      ssl => "true"
      manage_template => false
      index => "my-ecs-logs-%{+YYYY.MM.dd}"
    }
    ......Several more outputs to the same ES cluster below, but to other indices.
  }
The Logstash beats input plugin is version 3.1.12.
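(I checked the plugin version inside the Logstash container with something along the lines of:
bin/logstash-plugin list --verbose logstash-input-beats
which lists installed plugins with their versions.)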
The Logstash containers run on three c4.large instances in AWS, with 16 workers each, and send the logs to AWS ES and Papertrail (via rsyslog on the host to get SSL). I get no errors in my Logstash logs; however, I have not run them in debug mode. There are no iptables rules or similar configured on the Logstash hosts/containers.
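If it would help, I can enable debug logging on the Logstash side; assuming the standard logstash.yml settings file, that would be something like:
log.level: debug
(or --log.level=debug on the command line) and then rechecking the logs while the resets occur.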
You can find debug logs from Filebeat here: http://pastebin.com/NSNcwtS8