Add APM agent health check

Tracking data can be lost because of misconfiguration on the application side, changes to the ELK setup, network issues, or ...
Is there any way to get the connection status from the APM agent and include it in the health check indicators, so that we are informed about connectivity issues?
A health check response with this integration could look something like this:

{
	"status": "Healthy",
	"results": {
		"DB": {
			"status": "Healthy",
			"description": null,
			"data": {}
		},
		"ELASTIC-APM": {
			"status": "Healthy",
			"description": "APM agent is connected.",
			"data": {}
		}
	}
}

Have you considered manually doing a simple HTTP health check for the APM Server URL?

Could you please elaborate a bit more? What do you mean by a simple manual HTTP health check for the APM server URL? Do you mean calling the APM server health endpoint through the main application (Server Information API | APM Server Reference [7.15] | Elastic)? This endpoint is not available on our setup.

Yes, that's what I meant.

Why is this endpoint not available? What would you want the health check to indicate then?

Hi @felixbarny

As I understand @BehroozBahrameh, he wants to see the availability of agents in the Kibana interface.

With the current data model, we can see data related to an agent only if that agent sends transactions, spans, errors, or metrics.
But there may be situations where, for example, the application has no transactions and sending metrics is disabled.

In that case the agent will not send any data.

This leaves users unable to tell whether there is a network partition or simply no transactions at all.

One solution would be to send a new type of event, healthcheck, whose data would be stored, for example, in an apm-healthcheck-xxx index.

As a workaround, users can create gauges like this:


custom gauge configuration:

My bad. We have two different ports, 8200 and 8201. I was using 8201 for APM, which doesn't have the health check endpoint, but 8200 does. I've checked, and I can use 8200 for APM as well, so I will probably go with your approach and use Agent.Config.ServerUrl for the health check.
Thank you for the hint.
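
For anyone landing here later, the approach discussed above (an HTTP request against the APM server URL, reported as one entry of the application's health check) can be sketched roughly as follows. This is only an illustration in Python, not the .NET agent's API: the function name check_apm_server and the URL in the usage note are assumptions. It assumes the server answers a plain GET on its root URL, as the Server Information API does, and treats any 2xx answer as "connected". In a real application the URL would come from the agent configuration (Agent.Config.ServerUrl in the posts above).

```python
import http.client
import urllib.error
import urllib.request


def check_apm_server(server_url, timeout=2.0):
    """Probe the APM server URL and return one health-check entry.

    The returned dict mirrors the shape of the "ELASTIC-APM" entry in the
    sample response earlier in this thread (status / description / data).
    """
    try:
        # Plain GET on the configured server URL; urlopen raises HTTPError
        # for 4xx/5xx answers, so reaching this block means a success status.
        with urllib.request.urlopen(server_url, timeout=timeout) as response:
            healthy = 200 <= response.status < 300
            description = f"APM server answered with HTTP {response.status}."
    except (urllib.error.URLError, http.client.HTTPException,
            OSError, ValueError) as exc:
        # Covers refused connections, timeouts, HTTP error statuses,
        # malformed responses, and malformed URLs.
        healthy = False
        description = f"APM server is not reachable: {exc}"
    return {
        "status": "Healthy" if healthy else "Unhealthy",
        "description": description,
        "data": {},
    }
```

Calling check_apm_server("http://localhost:8200/") (a hypothetical address) would return the entry to place under "ELASTIC-APM". Note that this only shows that the server is reachable from the application host. It does not prove that the agent itself is delivering events, which is the gap the healthcheck event idea above is meant to close.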
