Network Disruption on Kubernetes Node with Elastic Security Integration on Debian

Hello,

I am experiencing an issue where the private Kubernetes network fails on a node after setting up the Elastic Security integration with the Elastic Agent directly on a Debian server, which runs alongside another Elastic Agent operating within my k8s cluster. The issue arises specifically when the Elastic Agent is installed directly on the Debian server, and it affects network functionality on the corresponding k8s node.

Environment Details:

  • Elastic Stack Version: 8.12.1
  • Operating System: Debian 12
  • Kernel Version: 6.1.0-18
  • Kubernetes Version: 1.27.8
  • Cilium Version: 1.14.5
  • Deployment: Kubernetes with Fleet Server for agent management
  • Observation: One agent is running within the Kubernetes cluster with only the Kubernetes integration and functions without issues. The problem occurs when another Elastic Agent with the Elastic Security integration is installed on the Debian server.

Symptom:

  • After installing the Elastic Agent with Elastic Security on the Debian server, the k8s private network on the affected node stops functioning correctly. The temporary workaround to restore network functionality is to restart the Cilium pod, but this fix is only temporary, as the network issues reoccur within minutes to hours.
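For reference, the restart workaround above can be done with a single command. This is a sketch, assuming Cilium is deployed as a DaemonSet in the kube-system namespace with its standard k8s-app=cilium label; NODE_NAME is a placeholder for the affected node:

```shell
# Delete the Cilium agent pod on the affected node; the DaemonSet
# controller recreates it, which temporarily restores the network.
kubectl -n kube-system delete pod \
  -l k8s-app=cilium \
  --field-selector spec.nodeName=NODE_NAME
```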

Steps to Reproduce:

  1. Ensure an Elastic Agent with Kubernetes integration is running within a k8s cluster.
  2. Install another Elastic Agent with the Elastic Security integration on a Debian server, changing the gRPC port to avoid conflicts with the Kubernetes agent.
  3. Observe the disruption in the k8s private network functionality on the node associated with the Debian server.
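As a note on step 2, the port change for a standalone agent would live in its local configuration. A minimal sketch of what that could look like in elastic-agent.yml; the agent.grpc block is an assumption about where this is set (it may vary by Agent version), and 6790 is only an example value:

```yaml
# elastic-agent.yml (standalone Debian agent) — assumed setting;
# pick a port that does not collide with the in-cluster agent.
agent.grpc:
  port: 6790
```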

Actual Behavior: The k8s private network on the node associated with the Debian server fails, causing significant operational issues. The only remedy found is to restart the Cilium pod, which provides only a short-term fix, as the network issues recur after some time.

Additional Context:

  • The issue does not occur with the Kubernetes agent running the Kubernetes integration.
  • The problem seems to be triggered specifically by the Elastic Security integration; the other integrations are trouble-free and work properly.
  • Unfortunately, I don't have any interesting logs to share: despite the private network no longer working, the integration itself keeps working properly.

I am seeking assistance to resolve this network disruption issue, which seems to be tied to the specific setup of Elastic Agent with Elastic Security on a Debian server parallel to a Kubernetes environment. Any insights, suggestions, or solutions to prevent the k8s network from failing would be greatly appreciated.

Thank you.

Thanks for all those details!

It might be an interaction between Endpoint and Cilium that's causing this. Can you try disabling Endpoint's support for host isolation? On systems where host isolation is possible, Endpoint loads some eBPF probes at startup so it is ready to isolate the host.

To disable that, go to the appropriate Elastic Defend policy, click "Show advanced settings" at the bottom of the page, then set linux.advanced.host_isolation.allowed to false. While that setting should take effect as soon as the policy is saved and applied to the host, you might need to remove the Elastic Defend integration, verify networking works again, then re-add the integration and see if networking now continues to work.
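For clarity, the advanced setting is entered as a plain key/value pair at the bottom of the Defend policy page; what you end up setting is:

```
linux.advanced.host_isolation.allowed: false
```

Once the policy is saved, Fleet pushes it to the enrolled agents, so no change is needed on the Debian host itself.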

I hope that fixes this issue. If it does, the only thing that will be disabled is host isolation; protections and network events will still work.


Thanks for the answer, it works perfectly

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.