Using Filebeat 7.4.2 with stack monitoring, we can see the throughput of Filebeat drop to zero, and the only abnormal log message is:
2019-12-20T03:47:31.711Z INFO [s3] s3/input.go:293 Message visibility timeout updated to 300
2019-12-20T03:47:39.581Z INFO [monitoring] log/log.go:145 Non-zero metrics in the last 30s {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":5770,"time":{"ms":16}},"total":{"ticks":108770,"time":{"ms":28},"value":108770},"user":{"ticks":103000,"time":{"ms":12}}},"handles":{"limit":{"hard":1048576,"soft":1048576},"open":8},"info":{"ephemeral_id":"e868953c-5e67-4695-ac95-0ffd16ed8822","uptime":{"ms":8280035}},"memstats":{"gc_next":42940960,"memory_alloc":22161072,"memory_total":16862697496},"runtime":{"goroutines":40}},"filebeat":{"harvester":{"open_files":0,"running":0}},"libbeat":{"config":{"module":{"running":0}},"pipeline":{"clients":1,"events":{"active":0}}},"registrar":{"states":{"current":0}},"system":{"load":{"1":0.28,"15":0.84,"5":0.63,"norm":{"1":0.14,"15":0.42,"5":0.315}}}}}}
2019-12-20T03:48:09.581Z INFO [monitoring] log/log.go:145 Non-zero metrics in the last 30s {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":5780,"time":{"ms":8}},"total":{"ticks":108780,"time":{"ms":12},"value":108780},"user":{"ticks":103000,"time":{"ms":4}}},"handles":{"limit":{"hard":1048576,"soft":1048576},"open":7},"info":{"ephemeral_id":"e868953c-5e67-4695-ac95-0ffd16ed8822","uptime":{"ms":8310035}},"memstats":{"gc_next":42940960,"memory_alloc":23817144,"memory_total":16864353568},"runtime":{"goroutines":38}},"filebeat":{"harvester":{"open_files":0,"running":0}},"libbeat":{"config":{"module":{"running":0}},"pipeline":{"clients":1,"events":{"active":0}}},"registrar":{"states":{"current":0}},"system":{"load":{"1":0.17,"15":0.81,"5":0.57,"norm":{"1":0.085,"15":0.405,"5":0.285}}}}}}
The "Message visibility timeout updated to 300" log line does not appear in our other Filebeat instances, which are correctly processing the same S3 bucket. From what I can tell this happens at random, and a simple restart of the process gets it processing events again.
Configuration:
filebeat.inputs:
- type: s3
  queue_url: https://sqs.us-east-1.amazonaws.com/xxxxxxx/queue
  processors:
    - decode_json_fields:
        fields: ["message"]
        process_array: true
        max_depth: 1
        target: ""
        overwrite_keys: true
    - rename:
        fields:
          - from: "log_level"
            to: "level"
        ignore_missing: true

processors:
  - drop_fields:
      fields: ["agent.ephemeral_id", "agent.hostname", "agent.id", "agent.type", "agent.version", "host.name", "ecs.version", "input.type", "log.file.path", "log.offset", "stream", "suricata.eve.timestamp", "time", "timestamp", "kubernetes.container.image", "kubernetes.labels.pod-template-hash", "container.labels", "container.name", "container.image", "kubernetes.labels.ci_branch", "kubernetes.labels.ci_build_number", "kubernetes.labels.ci_dep_build_number", "kubernetes.labels.ci_dep_user", "kubernetes.labels.ci_user", "kubernetes.labels.generator", "kubernetes.labels.updated", "aws.s3.bucket.arn", "log.file.path"]
      ignore_missing: true

cloud.id: "xxxxxxxxxx"
cloud.auth: "xxxxxxxxx"

output.elasticsearch:
  compression_level: 5
  worker: 10
  bulk_max_size: 4096

monitoring:
  enabled: true
  cluster_uuid: xxxxxx
  elasticsearch:
    hosts: ["https://xxxxxxxxx.us-east-1.aws.found.io:9243"]
    username: xxxxx
    password: xxxxx
Edit: I was able to find one instance that finally did output an error:
2019-12-20T12:03:09.533Z ERROR [s3] s3/input.go:291 change message visibility failed: InvalidParameterValue: Value AQEBK/v0lshsHIKTA2NI3HPr2EgFD8LQ73QAIalGrhp8q+l2LGJBxjbLivyVuytp8hbdXn+hXGp+b7WeBQEZiiKF3arYK/CToGiDN+ZgDBosW+YJhHetQu8pU4/rSkhpunsP1ywG1RvZ4O8IoKhqqjOHYsPZul2+ntjDNpOf5u16MA6ZbNUY0+ZDaxH6V61XcrYKsrUuIGBTMqp3EmRkROWgacE6XYYNTCuP0m1LfvRsAMHvKAUPu5O4fiyA1Wf+61EGVP6FDOc9+4Zk/rRt/IZh/y4YLMW3l7NyUFdVt6hjwovevHfEzlI9POWshBVZ3brrmHfaq08Lga1/M/auTM4ENImVH9H7cCQr36a/tlqJTemJwG8pCSJNzye8fZ2oO13UL6zGB+BaBDR4YcoNA+HRlrjBDegIBkDPkvqcM6Afk7g= for parameter ReceiptHandle is invalid. Reason: The receipt handle has expired.
status code: 400, request id: 12500646-eb3b-5276-a0a3-503051863927
2019-12-20T12:03:09.533Z INFO [s3] s3/input.go:293 Message visibility timeout updated to 300
2019-12-20T12:03:23.548Z INFO [monitoring] log/log.go:145 Non-zero metrics in the last 30s {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":354180},"total":{"ticks":1403390,"time":{"ms":12},"value":1403390},"user":{"ticks":1049210,"time":{"ms":12}}},"handles":{"limit":{"hard":1048576,"soft":1048576},"open":7},"info":{"ephemeral_id":"3f765775-ef09-448b-8c66-c6315cee5030","uptime":{"ms":1974180053}},"memstats":{"gc_next":10904960,"memory_alloc":6704520,"memory_total":118881164656,"rss":-81920},"runtime":{"goroutines":22}},"filebeat":{"harvester":{"open_files":0,"running":0}},"libbeat":{"config":{"module":{"running":0}},"pipeline":{"clients":1,"events":{"active":0}}},"registrar":{"states":{"current":0}},"system":{"load":{"1":0.14,"15":0.36,"5":0.29,"norm":{"1":0.07,"15":0.18,"5":0.145}}}}}}
2019-12-20T12:03:53.548Z INFO [monitoring] log/log.go:145 Non-zero metrics in the last 30s {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":354180,"time":{"ms":4}},"total":{"ticks":1403400,"time":{"ms":12},"value":1403400},"user":{"ticks":1049220,"time":{"ms":8}}},"handles":{"limit":{"hard":1048576,"soft":1048576},"open":6},"info":{"ephemeral_id":"3f765775-ef09-448b-8c66-c6315cee5030","uptime":{"ms":1974210054}},"memstats":{"gc_next":10885360,"memory_alloc":5532072,"memory_total":118881547792},"runtime":{"goroutines":20}},"filebeat":{"harvester":{"open_files":0,"running":0}},"libbeat":{"config":{"module":{"running":0}},"pipeline":{"clients":1,"events":{"active":0}}},"registrar":{"states":{"current":0}},"system":{"load":{"1":0.3,"15":0.37,"5":0.31,"norm":{"1":0.15,"15":0.185,"5":0.155}}}}}}
2019-12-20T12:04:23.549Z INFO [monitoring] log/log.go:145 Non-zero metrics in the last 30s {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":354180},"total":{"ticks":1403400,"time":{"ms":4},"value":1403400},"user":{"ticks":1049220,"time":{"ms":4}}},"handles":{"limit":{"hard":1048576,"soft":1048576},"open":6},"info":{"ephemeral_id":"3f765775-ef09-448b-8c66-c6315cee5030","uptime":{"ms":1974240054}},"memstats":{"gc_next":10885360,"memory_alloc":5864024,"memory_total":118881879744},"runtime":{"goroutines":20}},"filebeat":{"harvester":{"open_files":0,"running":0}},"libbeat":{"config":{"module":{"running":0}},"pipeline":{"clients":1,"events":{"active":0}}},"registrar":{"states":{"current":0}},"system":{"load":{"1":0.18,"15":0.35,"5":0.28,"norm":{"1":0.09,"15":0.175,"5":0.14}}}}}}
2019-12-20T12:04:53.548Z INFO [monitoring] log/log.go:145 Non-zero metrics in the last 30s {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":354180},"total":{"ticks":1403410,"time":{"ms":4},"value":1403410},"user":{"ticks":1049230,"time":{"ms":4}}},"handles":{"limit":{"hard":1048576,"soft":1048576},"open":6},"info":{"ephemeral_id":"3f765775-ef09-448b-8c66-c6315cee5030","uptime":{"ms":1974270053}},"memstats":{"gc_next":10885360,"memory_alloc":6078752,"memory_total":118882094472},"runtime":{"goroutines":20}},"filebeat":{"harvester":{"open_files":0,"running":0}},"libbeat":{"config":{"module":{"running":0}},"pipeline":{"clients":1,"events":{"active":0}}},"registrar":{"states":{"current":0}},"system":{"load":{"1":0.44,"15":0.37,"5":0.33,"norm":{"1":0.22,"15":0.185,"5":0.165}}}}}}
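For context on the error above: the s3 input periodically extends the SQS message visibility while it is still working on an object (that is what produces the "Message visibility timeout updated to 300" line), and the "receipt handle has expired" response appears to mean that extension came too late, i.e. the message had already become visible again. Below is a minimal sketch of how the visibility window could be widened, assuming the s3 input's visibility_timeout option (default 300 seconds) is available in 7.4.2; the 600s value is only an illustrative placeholder, not something we have tested:

filebeat.inputs:
- type: s3
  queue_url: https://sqs.us-east-1.amazonaws.com/xxxxxxx/queue
  # Assumption: a larger visibility timeout gives slow object downloads more
  # headroom before the SQS receipt handle expires; 600s is a placeholder value.
  visibility_timeout: 600s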