Sometimes a container or a series of containers becomes unresponsive when Elastic server is not available so we can't: restart/stop/kill these containers. Docker logs command is not available for these containers. Docker stats is not available for all containers.
Last time we were able to reproduce the issue when there was no route no server via network while containers were running.
So here's an error message:
dockerd[64602]: time="2021-04-26T15:48:51+03:00" level=error msg="2021-04-26T12:48:51.645Z\tERROR\t[publisher_pipeline_output]\tpipeline/output.go:155\tFailed to connect to backoff(elasticsearch(http://...)): Get http://...: dial tcp ...: connect: no route to host
I can see that it tries to reconnect:
dockerd[64602]: time="2021-04-26T15:48:16+03:00" level=error msg="2021-04-26T12:48:16.915Z\tINFO\t[publisher_pipeline_output]\tpipeline/output.go:146\tAttempting to reconnect to backoff(elasticsearch
...)) with 89 reconnect attempt(s)" plugin=6c0cba97835793e6fa2056e601bee296d996ab03c23cd0076b5baaa7278dcd53
Apr 26 15:48:16 ... dockerd[64602]: time="2021-04-26T15:48:16+03:00" level=error msg="2021-04-26T12:48:16.915Z\tINFO\t[PipelineManager]\tpipeline/retry.go:221\tretryer: send unwait signal to consumer" plugin=6c0cba97835
793e6fa2056e601bee296d996ab03c23cd0076b5baaa7278dcd53
Apr 26 15:48:16 .... dockerd[64602]: time="2021-04-26T15:48:16+03:00" level=error msg="2021-04-26T12:48:16.915Z\tINFO\t[PipelineManager]\tpipeline/retry.go:225\t done" plugin=6c0cba97835793e6fa2056e601bee296d996ab03c23c
d0076b5baaa7278dcd53
If i search logs for a container that's what i see:
Apr 26 15:02:20 ... dockerd[64602]: time="2021-04-26T15:02:20.808393367+03:00" level=info msg="Configured log driver does not support reads, enabling local file cache for container logs" container=095f572f6fd0233154210ba9f39b4073f44ab101431aa1fa22f711a981d242ef driver="elastic/elastic-logging-plugin:7.8.1"
Apr 26 15:02:20 ... containerd[977]: time="2021-04-26T15:02:20.836300244+03:00" level=info msg="starting signal loop" namespace=moby path=/run/containerd/io.containerd.runtime.v2.task/moby/095f572f6fd0233154210ba9f39b4073f44ab101431aa1fa22f711a981d242ef pid=41008
When i tried to stop a container:
Apr 26 15:26:19 ... dockerd[64602]: time="2021-04-26T15:26:19.797138343+03:00" level=info msg="Container 095f572f6fd0233154210ba9f39b4073f44ab101431aa1fa22f711a981d242ef failed to exit within 10 seconds of signal 15 - u
sing the force"
Apr 26 15:26:21 .... dockerd[64602]: time="2021-04-26T15:26:21.954812808+03:00" level=info msg="ignoring event" container=095f572f6fd0233154210ba9f39b4073f44ab101431aa1fa22f711a981d242ef module=libcontainerd namespace=mo
by topic=/tasks/delete type="*events.TaskDelete"
Apr 26 15:26:21 ... containerd[977]: time="2021-04-26T15:26:21.955558303+03:00" level=info msg="shim disconnected" id=095f572f6fd0233154210ba9f39b4073f44ab101431aa1fa22f711a981d242ef
Versions:
Docker version 20.10.1
elastic-logging-plugin:7.8.1