We are seeing errors in our current deployment involving the Fleet Server and Fleet Agent components:
- Fleet Server: "Non-zero metrics in the last 30s"
- Fleet Agent: "Cannot check in with fleet-server, retrying"
Output of elastic-agent status:

    ┌─ fleet
    │  └─ status: (FAILED) status code: 0, fleet-server returned an error: , message: The upstream server is timing out
    └─ elastic-agent
       └─ status: (HEALTHY) Running
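To see whether the agent-side view flaps in step with what Kibana shows, it may help to record this status output over time. The Python sketch below parses the tree into a component-to-status map. It assumes the human-readable layout shown above stays the same between runs, which may not hold across Elastic Agent versions. parse_status and sample_agent_status are hypothetical helpers, not part of any Elastic tooling.

```python
import re
import subprocess

# Component header lines, e.g. "┌─ fleet" or "└─ elastic-agent" (assumed layout).
COMPONENT_RE = re.compile(r"^[┌├└]─\s*(?P<name>\S+)\s*$")
# Status lines, e.g. "│  └─ status: (FAILED) status code: 0, ...".
STATUS_RE = re.compile(r"status:\s*\((?P<state>[A-Z]+)\)\s*(?P<message>.*)$")


def parse_status(text: str) -> dict[str, tuple[str, str]]:
    """Map each component name to (state, message) from the status tree text."""
    result: dict[str, tuple[str, str]] = {}
    current = None
    for line in text.splitlines():
        header = COMPONENT_RE.match(line)
        if header:
            # A new component section starts; following status lines belong to it.
            current = header.group("name")
            continue
        status = STATUS_RE.search(line)
        if status and current is not None:
            result[current] = (status.group("state"), status.group("message").strip())
    return result


def sample_agent_status() -> dict[str, tuple[str, str]]:
    """Hypothetical helper: run `elastic-agent status` once and parse its stdout."""
    completed = subprocess.run(
        ["elastic-agent", "status"], capture_output=True, text=True, check=False
    )
    return parse_status(completed.stdout)
```

Calling sample_agent_status() periodically (for example every 30 seconds from a loop or cron job) and logging each result with a timestamp would give a timeline to line up against the offline periods in Kibana.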
Environment:
Fleet Server is deployed in our “infrastructure” cluster, which also runs Elasticsearch and Kibana; both are working correctly.
The Fleet Agent is deployed in one of our Kubernetes “playground” clusters, where it collects Kubernetes logs and other observability data.
In Kibana, the agent status flaps between healthy and offline (and sometimes back to healthy), while Fleet Server stays healthy and online the whole time. Interestingly, even while the agents are marked as offline, their metrics still appear to be collected.
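One way to correlate the flapping with Fleet Server availability is to probe Fleet Server directly from the playground cluster and note when requests fail or slow down. This is a minimal sketch, assuming Fleet Server's status endpoint is reachable at /api/status on the host listed under hosts in the configuration. The helpers probe, count_transitions and watch, the 10-second timeout and the 30-second interval are illustrative choices, not values from our setup.

```python
import http.client
import time
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 10.0) -> dict:
    """Send one GET to `url`; report success, HTTP status, latency and a short detail."""
    started = time.monotonic()
    try:
        # `timeout` applies to each blocking socket operation, not the whole request.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return {"ok": True, "http_status": resp.status,
                    "latency_s": time.monotonic() - started, "detail": body}
    except urllib.error.HTTPError as err:
        # 4xx/5xx answer, e.g. a 503/504 from a proxy or load balancer in the path.
        return {"ok": False, "http_status": err.code,
                "latency_s": time.monotonic() - started, "detail": str(err)}
    except (OSError, http.client.HTTPException) as err:
        # URLError, DNS/TLS failures, resets and socket timeouts end up here.
        return {"ok": False, "http_status": None,
                "latency_s": time.monotonic() - started, "detail": str(err)}


def count_transitions(results: list[dict]) -> int:
    """Count how often consecutive probes flipped between success and failure."""
    return sum(1 for a, b in zip(results, results[1:]) if a["ok"] != b["ok"])


def watch(url: str, interval_s: float = 30.0, iterations: int = 20,
          timeout: float = 10.0) -> list[dict]:
    """Probe `url` repeatedly, printing one timestamped line per attempt."""
    results = []
    for i in range(iterations):
        result = probe(url, timeout=timeout)
        results.append(result)
        print(f"{time.strftime('%Y-%m-%dT%H:%M:%S')} ok={result['ok']} "
              f"http={result['http_status']} latency={result['latency_s']:.2f}s "
              f"{result['detail'][:80]}")
        if i < iterations - 1:
            time.sleep(interval_s)
    return results
```

For example, results = watch("https://fleet-server.xyz.com:443/api/status") followed by count_transitions(results) shows whether direct requests to Fleet Server fail or stall at roughly the same times the agent is reported offline.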
We would appreciate any guidance on identifying the cause of these errors and resolving them. Thank you!
Fleet section of the agent configuration:

    fleet:
      access_api_key: Y1pHOHgtdw==
      agent:
        id: ac177c50-da37-490b-9ed8-a755be756174
      enabled: true
      host: localhost:5601
      hosts:
      - https://fleet-server.xyz.com:443
      protocol: http
      ssl:
        renegotiation: never
        verification_mode: full
      timeout: 10m0s