Pod container logs stop randomly

Since upgrading to 8.6.1 (from 8.5.x), I'm finding that logs that should be collected via the Kubernetes integration on a Fleet-managed Elastic Agent are stopping completely and, as far as I can tell, randomly.

The logs stop for different pods at different times, and when they do stop, they start again after a few days, only to stop again after a day or two. Attached is a screenshot of logs from a particular pod.

These events don't seem to correlate with anything in the system logs, nor do they coincide with any changes or upgrades. They also don't line up with pods being deleted and recreated, or with log rotation, so it's a real mystery to me at this point.
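In case it helps, this is roughly how I've been checking when events last arrived for an affected pod - just a sketch via curl; the data stream pattern, host, and credentials are assumptions based on a default Kubernetes integration setup:

```sh
# Last @timestamp indexed for a given pod (data stream pattern and credentials are assumptions)
curl -s -u elastic:<password> "https://<elasticsearch-host>:9200/logs-kubernetes.container_logs-*/_search" \
  -H 'Content-Type: application/json' -d '
{
  "size": 0,
  "query": { "term": { "kubernetes.pod.name": "<affected-pod-name>" } },
  "aggs": { "last_event": { "max": { "field": "@timestamp" } } }
}'
```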

I should also mention that I'm observing similar behaviour with a Custom Logs integration, which I originally thought was down to leader election - so I'm wondering if this has something to do with Filebeat.

Any help in starting to get to the bottom of this is greatly appreciated; otherwise using the Elastic Stack is sort of useless :smiley:


This sounds similar but not identical to the topics below. It might be related somehow:

Does restarting the elastic-agents solve the issue? Does downgrading the agents to 8.5.3 solve the issue?
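If it helps with testing, here is a minimal sketch of how I'd cycle the agents and check whether collection resumes - it assumes the agent runs as a DaemonSet named elastic-agent in the kube-system namespace with an app=elastic-agent label (adjust names to your deployment):

```sh
# Restart the Elastic Agent DaemonSet (assumed name/namespace)
kubectl -n kube-system rollout restart daemonset/elastic-agent

# Wait until the restarted pods are ready
kubectl -n kube-system rollout status daemonset/elastic-agent

# Pick one agent pod and tail it to see whether log collection resumes
kubectl -n kube-system get pods -l app=elastic-agent
kubectl -n kube-system logs <agent-pod-name> --tail=100
```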


Interesting, thank you for linking those.

I'll do some tests again, but I've not really managed to find a pattern as to when and why the logs are stopping. Some of the stoppages don't appear to be related to pods being recreated or restarted. It's odd.

I see there is another related issue: K8s Integration does not report correct container.id when container restarts · Issue #5348 · elastic/integrations · GitHub

Both issues are on v8.6, so I'm wondering whether a bunch of regressions, all related to the same thing, have been introduced, and we're all experiencing subtly different results.

To add, one thing is certain: I've been unable to find any output from the agents, Beats, or other Elastic components that indicates what is going wrong - so it must be a silent failure somewhere.
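For what it's worth, this is roughly how I've been trying to coax more evidence out of the agents - a sketch, again assuming a DaemonSet in kube-system with an app=elastic-agent label; the exact diagnostics subcommand may differ slightly between 8.x versions:

```sh
# List the agent pods (label selector is an assumption; check your manifest)
kubectl -n kube-system get pods -l app=elastic-agent

# Check the health of the agent and its underlying inputs
kubectl -n kube-system exec <agent-pod-name> -- elastic-agent status

# Collect a diagnostics bundle, then copy the reported zip out of the pod
kubectl -n kube-system exec <agent-pod-name> -- elastic-agent diagnostics
kubectl -n kube-system cp <agent-pod-name>:<path-to-diagnostics-zip> ./diagnostics.zip
```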

Hello! I think we are experiencing the same problem. Elastic Agent 8.6.2 in Fleet mode using the Kubernetes integration is not collecting some logs from pods, or sometimes not collecting logs from specific pods at all, even after configuration changes and agent restarts. The Elastic Agent logs didn't show anything that could explain this strange behaviour. A standalone Filebeat seems to collect all the logs with no problems.

This appears to be resolved in v8.7.0.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.