Elastic 7.13.3 update to 7.13.4: ouch, that was an interesting bug

Short version, for anyone who runs a custom CA for the Fleet/Endpoint integration.

Using Fleet to manage your agents will delete the CA certificate if you put it in the Agent folder when you initially deployed the agent. The agent should be pulling its list of trusted CAs from the local machine certificate store, but it fails to do so, which is why the CA had to be added externally in the first place. About 90% of the failures that show up when streaming data from the agent to Elastic are CA related.
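If you want to catch the deletion right after an upgrade, a small check along these lines could help. This is only a sketch: the function name `ca_file_present` is made up for illustration, it assumes your CA is stored as a PEM file, and you have to supply the path you actually deployed it to (nothing here knows where the Agent folder is on your machines).

```python
from pathlib import Path

# Marker that every PEM-encoded certificate starts with.
PEM_MARKER = "-----BEGIN CERTIFICATE-----"


def ca_file_present(ca_path):
    """Return True if ca_path exists and looks like a PEM certificate.

    Hypothetical helper: it only checks that the file is there and
    contains a PEM header, it does not validate the certificate itself.
    """
    path = Path(ca_path)
    if not path.is_file():
        return False
    # errors="replace" so a binary (DER) file does not raise while reading.
    text = path.read_text(encoding="utf-8", errors="replace")
    return PEM_MARKER in text
```

Run it against the CA path before and after the upgrade; a result that flips from `True` to `False` would point at the behaviour described above.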

Also worth noting: this has an unintended side effect that roughly 1 in every 25 machines seems to suffer from. When the agent restarts you lose data, as expected, because it is no longer able to connect to ES. On the affected machines it will then attempt to establish network connections endlessly, spamming thousands of network sessions (17,415 from one server). Add the CA file back, restart the agent, and all is happy again. During this time no logs are sent to ES, so you're left scratching your head as to what's going on.
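To spot an affected machine, you can count sessions toward Elasticsearch from `netstat -an` output. A rough sketch follows, with assumptions: `count_sessions` and `current_sessions` are made-up names, 9200 is just the default Elasticsearch port (use whatever your output is configured for), and the parsing expects the Windows or Linux `address:port` column layout with the state in the last column.

```python
import subprocess
from collections import Counter


def count_sessions(netstat_output, remote_port):
    """Count TCP sessions per state whose remote address ends in :remote_port."""
    states = Counter()
    suffix = f":{remote_port}"
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines and non-TCP rows.
        if len(fields) < 4 or not fields[0].lower().startswith("tcp"):
            continue
        # Assumption: the last two columns are foreign address and state.
        foreign, state = fields[-2], fields[-1]
        if foreign.endswith(suffix):
            states[state] += 1
    return states


def current_sessions(remote_port=9200):
    """Run `netstat -an` on this machine and count matching sessions."""
    result = subprocess.run(
        ["netstat", "-an"], capture_output=True, text=True, check=True
    )
    return count_sessions(result.stdout, remote_port)
```

A healthy agent should show a handful of sessions at most; a total in the thousands from `sum(current_sessions().values())` would match the spam described above.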


Hi @PublicName, thanks for all the investigation and for sharing this. This does not sound great. With 7.14 out, we should investigate whether this issue has been resolved or still exists. Any chance you could open a bug report in the elastic/beats repo on GitHub (Beats - Lightweight shippers for Elasticsearch & Logstash)? If yes, please link it here so I can follow up on it there directly.

I'm attempting to narrow down a root cause, then I'll submit the bug. Should be shortly.

Confirmed the CA removal on 7.14 as well. It seems to happen when known-working upgrades fail the first time and the agent stays at the existing version. Attempting to rerun the upgrade looks to remove the CA.

Confirmed the network spam on 7.14 GA as well, on some machines, when the CA is missing.

