Hi all,
I am running a k8s Cluster that is being suspended every night, due to cost savings, as no one works at this time on the cluster. Unfortunately it seems like fielbeat is not really getting along with that.
Every morning when the cluster resumes, filebeats runs into a super high iowait
Total DISK READ : 45.83 M/s | Total DISK WRITE : 57.28 K/s
Actual DISK READ: 45.83 M/s | Actual DISK WRITE: 481.16 K/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
17018 be/4 root 14.91 M/s 0.00 B/s 0.00 % 99.99 % filebeat -c /etc/filebeat.yml -e
17014 be/4 root 11.21 M/s 0.00 B/s 0.00 % 99.99 % filebeat -c /etc/filebeat.yml -e
17013 be/4 root 9.45 M/s 0.00 B/s 0.00 % 99.99 % filebeat -c /etc/filebeat.yml -e
17073 be/4 root 8.22 M/s 0.00 B/s 0.00 % 98.93 % filebeat -c /etc/filebeat.yml -e
Which kind of basically kills the nodes.
The log file looks a followed - a kind of stucks at this point:
}
2021-02-18T10:40:34.919Z DEBUG [harvester] log/log.go:107 End of file reached: /var/log/containers/etcd-0_sensu-system_backup-73237576d5eedf6350b50d5e391870e19a787aa81b588ff7086b463b458be4f3.log; Backoff now.
2021-02-18T10:40:35.001Z DEBUG [harvester] log/log.go:107 End of file reached: /var/log/containers/sensu-backend-0_sensu-system_sensu-backend-f591eb9332122a48029ee9bdfda858912c7a999ce5a46845066bd89c947ff82a.log; Backoff now.
2021-02-18T10:40:35.088Z DEBUG [harvester] log/log.go:107 End of file reached: /var/log/containers/gitlab-runner-rzneo-945fb5db9-j79qw_rzneo_gitlab-runner-5eda7e332f09638aeff884789e8beb28745aa88d77b3219a495423011831955f.log; Backoff now.
2021-02-18T10:40:34.982Z DEBUG [harvester] log/log.go:107 End of file reached: /var/log/containers/csi-gce-pd-node-29jhk_gce-pd-csi-driver_csi-driver-registrar-36a2be3a7c4533fe82c6c32d9b8d009035b1d672178d0069de00c7ea63a246b3.log; Backoff now.
2021-02-18T10:40:34.827Z DEBUG [harvester] log/log.go:107 End of file reached: /var/log/containers/metricbeat-924kz_elastic-system_metricbeat-7f93ebbf4f0263063c911fc3e1f6fc7c353c4fcbfa6129edfcb7d1eb76935701.log; Backoff now.
2021-02-18T10:40:34.897Z DEBUG [harvester] log/log.go:107 End of file reached: /var/log/containers/podsigner-84797b6596-mtznn_rzneo_podsigner-debafb6b31028c0994f0e7e9ee6ab0e23466c612bef8cf7f123ed8cd79ed2507.log; Backoff now.
2021-02-18T10:40:34.958Z DEBUG [harvester] log/log.go:107 End of file reached: /var/log/containers/etcd-0_sensu-system_etcd-151b63a866c39e74b9ae55603759814f76f145237244b1c38b764e756412cac2.log; Backoff now.
2021-02-18T10:40:35.096Z DEBUG [harvester] log/log.go:107 End of file reached: /var/log/containers/vault-54b7c466cd-xxjnq_rzneo_vault-42acca5910c3685cbb4390d50f93a9faec281e3fc56324315bcac4fc703c8a15.log; Backoff now.
2021-02-18T10:40:34.923Z DEBUG [harvester] log/log.go:107 End of file reached: /var/log/containers/consul-2_rzneo_backup-fddf2a80f458ff9c882d97cf961fdb1f5c0cd76eb97b03940d0422d800c38e42.log; Backoff now.
2021-02-18T10:40:35.034Z DEBUG [harvester] log/log.go:107 End of file reached: /var/log/containers/prometheus-server-1_sensu-system_prometheus-server-configmap-reload-8b6964e1c8b84e1bdd17f7af17d89f4e4afe9d1ea1bf7791672a87961612757c.log; Backoff now.
2021-02-18T10:40:34.972Z DEBUG [harvester] log/log.go:107 End of file reached: /var/log/containers/prometheus-server-1_sensu-system_prometheus-server-12c5b393aea38c5e2b6834f09f15f292d6084d16e8fb1f9c7721ddb812cc77cc.log; Backoff now.
2021-02-18T10:40:34.827Z DEBUG [harvester] log/log.go:107 End of file reached: /var/log/containers/rzneo-postgres-db-2_rzneo_postgres-54fd47be8a1516eedc8f25b1fe1587bf8502ae88aec572c01e830a584fa790f4.log; Backoff now.
2021-02-18T10:40:34.958Z DEBUG [harvester] log/log.go:107 End of file reached: /var/log/containers/prometheus-server-1_sensu-system_rules-receiver-91da7da17dabe477dd7e400213dac67bfd75ad472d8aa6e14f2bd5296199ef17.log; Backoff now.
2021-02-18T10:40:35.119Z DEBUG [harvester] log/log.go:107 End of file reached: /var/log/containers/prometheus-kube-state-metrics-c65b87574-dzt4q_sensu-system_kube-state-metrics-6f56c10ef6821f22415baf44e3c0e445879f12d06fbb37e04c1927b9ea66f726.log; Backoff now.
2021-02-18T10:40:34.883Z DEBUG [harvester] log/log.go:107 End of file reached: /var/log/containers/kube-metrics-adapter-6768b9ddb8-5bs5t_sensu-system_kube-metrics-adapter-030cffee12aa3ab77785d6d0ad4358065f2e9d2901f49057dc870cfc900b507e.log; Backoff now.
2021-02-18T10:40:35.177Z DEBUG [harvester] log/log.go:107 End of file reached: /var/log/containers/elastic-es-masterdata-100-1_elastic-system_elasticsearch-94037a69325502cab16d78ad610c7c957cda2c105da2ae0ba1193b5520ac1ac9.log; Backoff now.
2021-02-18T10:40:34.917Z DEBUG [harvester] log/log.go:107 End of file reached: /var/log/containers/kube-proxy-tm247_kube-system_kube-proxy-6cbca54fba7f7cd3c11d9887765e3ede46464460825aa7ae3018fadf40d92fa8.log; Backoff now.
2021-02-18T10:40:35.202Z DEBUG [harvester] log/log.go:107 End of file reached: /var/log/containers/vault-54b7c466cd-xxjnq_rzneo_auto-unseal-b1e3578c56336c2f7efed10a5b87b6f60153225199710761043e57cac32d7856.log; Backoff now.
2021-02-18T10:40:35.202Z DEBUG [harvester] log/log.go:107 End of file reached: /var/log/containers/sensu-backend-0_sensu-system_backup-fc09946c834effbcfdda93cb85fb9928662ebd56e07e8f869a0648498e345bc6.log; Backoff now.
2021-02-18T10:40:35.202Z DEBUG [harvester] log/log.go:107 End of file reached: /var/log/containers/vault-token-service-7454ccd668-dkwjs_rzneo_vault-token-service-6ebae1fae717b0553c7ced08a96799fe6ee06ac229e6ca3e43e5f3dba033f835.log; Backoff now.
2021-02-18T10:40:35.210Z DEBUG [harvester] log/log.go:107 End of file reached: /var/log/containers/cerebro-66bdf4ddbf-grs7v_elastic-system_cerebro-6cd3f0ca9ffcb604f1546e12ffb4b17d5e3fb56daa03049545f7729ca3dc0c1e.log; Backoff now.
2021-02-18T10:40:35.308Z DEBUG [harvester] log/log.go:107 End of file reached: /var/log/containers/rzneo-postgres-db-2_rzneo_exporter-6145efb8632c65b6ee215097e0f2bb5e1c7807b788b324b533e1897cab3f3f0e.log; Backoff now.
2021-02-18T10:40:35.326Z DEBUG [harvester] log/log.go:107 End of file reached: /var/log/containers/elastic-es-masterdata-100-1_elastic-system_elasticsearch-94037a69325502cab16d78ad610c7c957cda2c105da2ae0ba1193b5520ac1ac9.log; Backoff now.
2021-02-18T10:40:35.406Z DEBUG [harvester] log/log.go:107 End of file reached: /var/log/containers/elastic-es-masterdata-100-1_elastic-system_elasticsearch-94037a69325502cab16d78ad610c7c957cda2c105da2ae0ba1193b5520ac1ac9.log; Backoff now.
2021-02-18T10:40:35.424Z DEBUG [harvester] log/log.go:107 End of file reached: /var/log/containers/elastic-es-masterdata-100-1_elastic-system_elasticsearch-94037a69325502cab16d78ad610c7c957cda2c105da2ae0ba1193b5520ac1ac9.log; Backoff now.
2021-02-18T10:40:35.424Z DEBUG [harvester] log/log.go:107 End of file reached: /var/log/containers/prometheus-node-exporter-b4ssb_sensu-system_prometheus-node-exporter-f3c3db858cfdc98ea3a2c01fb983fdd92dcf2b46ab7cd9f725af14b17e3eb9df.log; Backoff now.
2021-02-18T10:40:35.424Z DEBUG [harvester] log/log.go:107 End of file reached: /var/log/containers/consul-2_rzneo_consul-692a39410c818a782395a047ec6719a363481c971f66087316ec6cc22599c7bb.log; Backoff now.
2021-02-18T10:40:35.504Z DEBUG [harvester] log/log.go:107 End of file reached: /var/log/containers/rzneo-postgres-operator-5994567dcd-fhvbc_rzneo_postgres-operator-1dc7de20a664aa85f3c7dc70e6e32e51d21ec8f1b9b47d1bb37bedfc0d19eeba.log; Backoff now.
2021-02-18T10:40:35.564Z DEBUG [harvester] log/log.go:107 End of file reached: /var/log/containers/grafana-86d566c885-z4mrm_sensu-system_grafana-bd4d6f8be158db308ab7836fa91804e446fb4cc2cc18b85e42f6f7c73b8b85e5.log; Backoff now.
2021-02-18T10:40:35.572Z DEBUG [harvester] log/log.go:107 End of file reached: /var/log/containers/elastic-es-masterdata-100-1_elastic-system_elasticsearch-94037a69325502cab16d78ad610c7c957cda2c105da2ae0ba1193b5520ac1ac9.log; Backoff now.
2021-02-18T10:40:35.585Z DEBUG [harvester] log/log.go:107 End of file reached: /var/log/containers/csi-gce-pd-node-29jhk_gce-pd-csi-driver_gce-pd-driver-18a1e8ca350a5e72790e79a2c50e962bc04ee68ffe061e8b8e6fbcfe5dc1cf41.log; Backoff now.
2021-02-18T10:40:37.601Z DEBUG [reader_multiline] multiline/pattern.go:170 Multiline event flushed because timeout reached.
2021-02-18T10:41:15.970Z DEBUG [input] input/input.go:139 Run input
Uninstalling and reinstalling filebeat fixes the issue.
However, this is not a long term solution.
filebeat test output
sh-4.2# filebeat test output -c /etc/filebeat.yml
elasticsearch: https://elastic-es-http.elastic-system.svc:9200...
parse url... OK
connection...
parse host... OK
dns lookup... OK
addresses: 10.101.39.28
dial up... OK
TLS...
security... WARN server's certificate chain verification is disabled
handshake... OK
TLS version: TLSv1.3
dial up... OK
talk to server... OK
version: 7.11.0
I would be very happy is someone could help us with that.