We have four separate Kubernetes clusters, two on-prem and two in AWS, set up with Filebeat to ship logs. For several months we have been troubleshooting an issue where our Filebeat DaemonSets sporadically stop shipping logs altogether. We were on Filebeat 6.2.x when this started and upgraded to 6.8.1 in the hope that it would help.
Filebeat often ships logs normally for an extended period of time, then suddenly stops on several nodes. If we restart the Filebeat DaemonSet, it immediately starts shipping logs again, including the logs that were missed during the outage. At this point I am not sure what else to look at.
Memory and CPU consumption appear normal during these outages.
System version and component information:
Kubernetes 1.12.4
Filebeat 6.8.1
CNI: Kube-router
DNS provider: CoreDNS
RBAC: Enabled
Some errors we have seen:
ERROR kubernetes/watcher.go:258 kubernetes: Watching API error read tcp x.x.x.x:59506->x.x.x.x:443: read: connection timed out
ERROR kubernetes/watcher.go:248 kubernetes: Watching API error Get https://x.x.x.x:443/api/v1/pods?fieldSelector=spec.nodeName%3Dlg-l-p-obo00500&resourceVersion=&watch=true: dial tcp x.x.x.x:443: i/o timeout
ERROR kubernetes/watcher.go:258 kubernetes: Watching API error EOF
ERROR kubernetes/watcher.go:258 kubernetes: Watching API error read tcp x.x.x.x.184:39540->x.x.x.x:443: read: connection reset by peer
ERROR log/harvester.go:282 Read line error: invalid CRI log format; File: /var/lib/docker/containers/10e45645029f95adfdb1cee0c6341757e86d3c3115472d8076dc410fcb17eb30/10e45645029f95adfdb1cee0c6341757e86d3c3115472d8076dc410fcb17eb30-json.log
However, the only error that reliably shows up on ALL nodes when the problem occurs is this:
ERROR kubernetes/watcher.go:258 kubernetes: Watching API error EOF