I have been running a logstash (docker) and filebeat (Windows 10) combination for about 5 months, and recently I've started seeing some "i/o timeout" errors on the filebeat side. One particular site (of 10 total) throws these errors almost constantly and is effectively not shipping its logs -- the rest only throw them occasionally and are not impacted. When I start filebeat at this site, CPU usage in logstash jumps to an average of about 60% (with 100% peaks), whereas the rest of the time it sits at about 4-7%.
In an attempt to alleviate this problem, I updated both sides to 7.5.2; this did not change the behavior. Logstash runs a multi-pipeline (pipeline-to-pipeline) setup, with some files going to AWS S3 and some to an AWS SQS queue.
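For context, the pipeline wiring in pipelines.yml looks roughly like this (the pipeline IDs and config paths here are approximate, not copied verbatim):

pipelines.yml:
- pipeline.id: beats-in
  path.config: "/usr/share/logstash/pipeline/logstash.conf"
- pipeline.id: main
  path.config: "/usr/share/logstash/pipeline/main.conf"
- pipeline.id: ordered
  path.config: "/usr/share/logstash/pipeline/ordered.conf"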
To debug, I tried running both the docker-based logstash instance and my filebeat instance on my PC. This setup reproduces the problem every single time, with no obvious cause (no obvious errors in either debug output). My client_inactivity_timeout is set to 3600, but the errors occur long before that much time has passed. Here is some sample log output from both logstash and filebeat:
Filebeat:
2020-02-10T09:59:44.145-0500 DEBUG [input] log/input.go:511 Update existing file for harvesting: C:\Isd_Appliance\Reports\tls-report-2020-01-08-092525.txt, offset: 369032
2020-02-10T09:59:44.145-0500 DEBUG [input] log/input.go:563 Harvester for file is still running: C:\Isd_Appliance\Reports\tls-report-2020-01-08-092525.txt
2020-02-10T09:59:44.145-0500 DEBUG [input] log/input.go:212 input states cleaned up. Before: 10, After: 10, Pending: 0
2020-02-10T09:59:45.042-0500 DEBUG [transport] transport/client.go:218 handle error: read tcp [::1]:65471->[::1]:5046: i/o timeout
2020-02-10T09:59:45.042-0500 ERROR logstash/async.go:256 Failed to publish events caused by: read tcp [::1]:65471->[::1]:5046: i/o timeout
2020-02-10T09:59:45.044-0500 DEBUG [transport] transport/client.go:131 closing
2020-02-10T09:59:45.061-0500 DEBUG [logstash] logstash/async.go:159 488 events out of 488 events sent to logstash host localhost:5046. Continue sending
2020-02-10T09:59:45.061-0500 DEBUG [logstash] logstash/async.go:116 close connection
2020-02-10T09:59:45.062-0500 ERROR logstash/async.go:256 Failed to publish events caused by: client is not connected
2020-02-10T09:59:45.062-0500 DEBUG [logstash] logstash/async.go:116 close connection
2020-02-10T09:59:46.323-0500 ERROR pipeline/output.go:121 Failed to publish events: client is not connected
2020-02-10T09:59:46.323-0500 INFO pipeline/output.go:95 Connecting to backoff(async(tcp://localhost:5046))
2020-02-10T09:59:46.323-0500 DEBUG [logstash] logstash/async.go:111 connect
2020-02-10T09:59:46.325-0500 INFO pipeline/output.go:105 Connection to backoff(async(tcp://localhost:5046)) established
2020-02-10T09:59:46.334-0500 DEBUG [logstash] logstash/async.go:159 488 events out of 488 events sent to logstash host localhost:5046. Continue sending
Logstash:
vrdplat-logstash_1 | [2020-02-10T14:59:46,706][DEBUG][org.logstash.beats.BeatsHandler] [local: 172.18.0.2:5046, remote: 172.18.0.1:53520] Sending a new message for the listener, sequence: 487
vrdplat-logstash_1 | [2020-02-10T14:59:46,706][DEBUG][org.logstash.beats.BeatsHandler] [local: 172.18.0.2:5046, remote: 172.18.0.1:53520] Sending a new message for the listener, sequence: 487
vrdplat-logstash_1 | [2020-02-10T14:59:46,707][DEBUG][org.logstash.beats.BeatsHandler] [local: 172.18.0.2:5046, remote: 172.18.0.1:53520] Sending a new message for the listener, sequence: 488
vrdplat-logstash_1 | [2020-02-10T14:59:46,708][TRACE][org.logstash.beats.BeatsHandler] [local: 172.18.0.2:5046, remote: 172.18.0.1:53520] Acking message number 488
vrdplat-logstash_1 | [2020-02-10T14:59:46,708][DEBUG][org.logstash.beats.BeatsHandler] 16d23b87: batches pending: false
vrdplat-logstash_1 | [2020-02-10T14:59:47,381][DEBUG][logstash.outputs.s3 ] Periodic check for stale files
vrdplat-logstash_1 | [2020-02-10T14:59:47,384][DEBUG][org.logstash.execution.PeriodicFlush] Pushing flush onto pipeline.
vrdplat-logstash_1 | [2020-02-10T14:59:47,452][DEBUG][org.logstash.execution.PeriodicFlush] Pushing flush onto pipeline.
vrdplat-logstash_1 | [2020-02-10T14:59:49,676][TRACE][org.logstash.beats.BeatsHandler] [local: 172.18.0.2:5046, remote: 172.18.0.1:53520] Channel Inactive
vrdplat-logstash_1 | [2020-02-10T14:59:50,774][DEBUG][org.logstash.execution.PeriodicFlush] Pushing flush onto pipeline.
vrdplat-logstash_1 | [2020-02-10T14:59:51,546][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ParNew"}
vrdplat-logstash_1 | [2020-02-10T14:59:51,546][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ConcurrentMarkSweep"}
vrdplat-logstash_1 | [2020-02-10T14:59:52,384][DEBUG][org.logstash.execution.PeriodicFlush] Pushing flush onto pipeline.
Here are excerpts of the filebeat.yml and logstash.conf:
filebeat.yml:
filebeat.inputs:
# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.
- type: log
  # Change to true to enable this input configuration.
  enabled: true
  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - C:\Isd_Appliance\RawDataFiles\*\*.csv
  tags: ["send_to_sqs"]
  fields:
    pipeline: main
- type: log
  # Change to true to enable this input configuration.
  enabled: true
  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - C:\Isd_Appliance\Log\*.log
  fields:
    pipeline: main
- type: log
  enabled: true
  paths:
    - C:\Isd_Appliance\Reports\*.txt
  tags: ["send_to_s3"]
  fields:
    pipeline: ordered
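The output.logstash block isn't excerpted above; a minimal sketch of its shape for the on-PC repro (the hosts value matches the repro, the other values are just the documented defaults shown for reference, not necessarily what I have set):

output.logstash:
  # points at the beats input in logstash.conf below
  hosts: ["localhost:5046"]
  # documented default batch size
  bulk_max_size: 2048
  # filebeat-side response deadline (default 30s) -- presumably what expires
  # as "i/o timeout", rather than logstash's client_inactivity_timeout
  timeout: 30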
logstash.conf:
input {
  beats {
    port => 5046
    type => filebeat
    client_inactivity_timeout => 3600
    ssl => false
  }
  http {
    port => 8000
    type => "elb-healthcheck"
  }
  tcp {
    port => 1514
    type => syslog
  }
  udp {
    port => 1514
    type => syslog
  }
}
output {
  if [fields][pipeline] == "ordered" {
    pipeline { send_to => ordered }
  } else {
    pipeline { send_to => main }
  }
}
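The downstream pipelines pick these events up on matching virtual addresses; roughly like the following (the tag conditionals mirror the filebeat tags above, and the region/bucket/queue values are placeholders, not my real ones):

ordered.conf:
input {
  pipeline { address => ordered }
}
output {
  if "send_to_s3" in [tags] {
    s3 {
      region => "us-east-1"
      bucket => "example-reports-bucket"
    }
  }
}

main.conf:
input {
  pipeline { address => main }
}
output {
  if "send_to_sqs" in [tags] {
    sqs {
      region => "us-east-1"
      queue => "example-queue"
    }
  }
}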