Pod container logs stop randomly

Since upgrading to 8.6.1 (from 8.5.x), I'm finding that logs that should be collected via the Kubernetes integration on a Fleet-managed Elastic Agent are stopping completely and, as far as I can tell, randomly.

The logs stop for different pods at different times, and when they do stop, they start again after a few days, only to stop again after a day or two. Attached is a screenshot of logs from a particular pod.

These events don't seem to correlate with anything in the system logs, nor do they coincide with any changes or upgrades. They also don't line up with pods being deleted and recreated, or with log rotation, so it's a real mystery to me at this point.
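In case it helps, this is roughly how I've been checking when events last arrived for an affected pod - just a sketch via curl; the data stream pattern, host, and credentials are assumptions based on a default Kubernetes integration setup:

```sh
# Last @timestamp indexed for a given pod (data stream pattern and credentials are assumptions)
curl -s -u elastic:<password> "https://<elasticsearch-host>:9200/logs-kubernetes.container_logs-*/_search" \
  -H 'Content-Type: application/json' -d '
{
  "size": 0,
  "query": { "term": { "kubernetes.pod.name": "<affected-pod-name>" } },
  "aggs": { "last_event": { "max": { "field": "@timestamp" } } }
}'
```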

I should also mention that I'm observing similar behaviour with a Custom Logs integration, which I originally thought was down to leader election - so I'm wondering if this has something to do with Filebeat.

Any help in starting to get to the bottom of this is greatly appreciated; otherwise using the Elastic Stack is sort of useless :smiley:


This sounds similar but not identical to the topics below. It might be related somehow:

Does restarting the elastic-agents solve the issue? Does downgrading the agents to 8.5.3 solve the issue?
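If it helps with testing, here is a minimal sketch of how I'd cycle the agents and check whether collection resumes - it assumes the agent runs as a DaemonSet named elastic-agent in the kube-system namespace with an app=elastic-agent label (adjust names to your deployment):

```sh
# Restart the Elastic Agent DaemonSet (assumed name/namespace)
kubectl -n kube-system rollout restart daemonset/elastic-agent

# Wait until the restarted pods are ready
kubectl -n kube-system rollout status daemonset/elastic-agent

# Pick one agent pod and tail it to see whether log collection resumes
kubectl -n kube-system get pods -l app=elastic-agent
kubectl -n kube-system logs <agent-pod-name> --tail=100
```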


Interesting, thank you for linking those.

I'll do some tests again, but I've not really managed to find a pattern as to when and why the logs are stopping. Some of the stoppages don't appear to be related to pods being recreated or restarted. It's odd.

I see there is another related issue: K8s Integration does not report correct container.id when container restarts · Issue #5348 · elastic/integrations · GitHub

Both issues are on v8.6, so I'm wondering whether a bunch of regressions, all related to the same thing, have been introduced, and we're all experiencing subtly different results.

To add, one thing is certain: I've been unable to find any output from the agents, Beats, or other Elastic components that indicates what is going wrong - so it must be a silent failure somewhere.
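For what it's worth, this is roughly how I've been trying to coax more evidence out of the agents - a sketch, again assuming a DaemonSet in kube-system with an app=elastic-agent label; the exact diagnostics subcommand may differ slightly between 8.x versions:

```sh
# List the agent pods (label selector is an assumption; check your manifest)
kubectl -n kube-system get pods -l app=elastic-agent

# Check the health of the agent and its underlying inputs
kubectl -n kube-system exec <agent-pod-name> -- elastic-agent status

# Collect a diagnostics bundle, then copy the reported zip out of the pod
kubectl -n kube-system exec <agent-pod-name> -- elastic-agent diagnostics
kubectl -n kube-system cp <agent-pod-name>:<path-to-diagnostics-zip> ./diagnostics.zip
```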

Hello! I think we are experiencing the same problem. Elastic Agent 8.6.2 in Fleet mode using the Kubernetes integration is not collecting some logs from pods, or sometimes not collecting logs from specific pods at all, even after configuration changes and agent restarts. The Elastic Agent logs didn't show anything that could explain this strange behaviour. A standalone Filebeat seems to collect all the logs with no problems.

This appears to be resolved in v8.7.0.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.