Hi everybody,
In case it is related: the agents were enrolled following the procedure described in this topic, https://discuss.elastic.co/t/agent-stuck-on-updating-when-enrolling/277703, because the enrollment process failed otherwise.
For some reason, even though the agents show as healthy in Kibana, we stop receiving their data after some time: the same agents work perfectly for a few days and then stop sending data.
From the logs we extracted the following lines:
{"log.level":"error","@timestamp":"2021-07-01T21:04:03.411+0200","log.origin":{"file.name":"fleet/fleet_gateway.go","file.line":205},"message":"Could not communicate with fleet-server Checking API will retry, error: status code: 503, fleet-server returned an error: ServiceUnavailable, message: server is stopping","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2021-07-01T21:26:46.605+0200","log.origin":{"file.name":"fleet/fleet_gateway.go","file.line":205},"message":"Could not communicate with fleet-server Checking API will retry, error: status code: 503, fleet-server returned an error: ServiceUnavailable, message: server is stopping","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2021-07-01T21:39:46.169+0200","log.origin":{"file.name":"fleet/fleet_gateway.go","file.line":205},"message":"Could not communicate with fleet-server Checking API will retry, error: could not decode the response, raw response: ","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2021-07-02T00:23:42.924+0200","log.origin":{"file.name":"fleet/fleet_gateway.go","file.line":205},"message":"Could not communicate with fleet-server Checking API will retry, error: could not decode the response, raw response: ","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2021-07-02T03:29:30.201+0200","log.origin":{"file.name":"fleet/fleet_gateway.go","file.line":205},"message":"Could not communicate with fleet-server Checking API will retry, error: could not decode the response, raw response: ","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2021-07-02T07:08:32.961+0200","log.origin":{"file.name":"fleet/fleet_gateway.go","file.line":205},"message":"Could not communicate with fleet-server Checking API will retry, error: could not decode the response, raw response: ","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2021-07-02T08:13:37.392+0200","log.origin":{"file.name":"fleet/fleet_gateway.go","file.line":205},"message":"Could not communicate with fleet-server Checking API will retry, error: could not decode the response, raw response: ","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2021-07-02T21:17:42.031+0200","log.origin":{"file.name":"fleet/fleet_gateway.go","file.line":205},"message":"Could not communicate with fleet-server Checking API will retry, error: status code: 503, fleet-server returned an error: ServiceUnavailable, message: server is stopping","ecs.version":"1.6.0"}
At least two agents failed at approximately the same time, while another four kept working. All of them are connected to the same Fleet Server, which never stopped.
Unfortunately, restarting the agent does not help; in that case we have to uninstall it and re-enroll it in Fleet.
What could be the source of this problem?
Thanks for your help.