HTTP 502 on APM service metrics

Hello guys.

I have Elastic APM deployed in my environment. When I try to view service metrics over a long time range, such as one week, I get a 502 error in Kibana after waiting for a while.

This error doesn't happen with short time ranges like 15 minutes or 1 hour.

I'm trying to understand why this error happens and where it comes from. My storage is slow (low IOPS), and I believe this may be part of the problem.

I'm running the whole stack in containers on a single server, with Traefik as my load balancer on another server.

I made some changes to try to fix this error:

  • Traefik timeout set to 24 hours
  • Kibana search timeout set to 15000000

Even with these changes, the 502 error still occurs. What's strange to me is that the behavior isn't consistent: sometimes I get the error after 4 minutes, but other times the request succeeds after about 15 minutes of waiting.

I checked all the logs and resource usage and didn't find any errors such as OOM, but I constantly see iowait when I query data over a long time range.

You can see the iowait in the images below:

This is my Traefik access log:

1.2.3.4 - - [21/Mar/2024:15:03:30 +0000] "GET /internal/apm/services/api-edited/dependencies?start=2024-02-21T15%3A03%3A30.186Z&end=2024-03-21T15%3A03%3A30.186Z&environment=ENVIRONMENT_ALL&numBuckets=20&offset=2505600000ms HTTP/2.0" 502 11 "-" "-" 471045555 "kibana-https-apm-edited-master@docker" "http://10.0.9.20:5601" 240004ms
1.2.3.4 - - [21/Mar/2024:15:03:32 +0000] "GET /internal/apm/services/api-edited/transactions/charts/latency?environment=ENVIRONMENT_ALL&kuery=&start=2024-02-21T15%3A03%3A30.153Z&end=2024-03-21T15%3A03%3A30.153Z&transactionType=request&latencyAggregationType=avg&offset=2505600000ms HTTP/2.0" 502 11 "-" "-" 471045782 "kibana-https-apm-edited-master@docker" "http://10.0.9.20:5601" 240007ms
1.2.3.4 - - [21/Mar/2024:15:03:32 +0000] "GET /internal/apm/services/api-edited/service_overview_instances/main_statistics?environment=ENVIRONMENT_ALL&kuery=&latencyAggregationType=avg&start=2024-02-21T15%3A03%3A30.191Z&end=2024-03-21T15%3A03%3A30.192Z&transactionType=request&offset=2505600000ms HTTP/2.0" 502 11 "-" "-" 471045785 "kibana-https-apm-edited-master@docker" "http://10.0.9.20:5601" 240015ms
1.2.3.4 - - [21/Mar/2024:15:03:32 +0000] "GET /internal/apm/services/api-edited/transactions/charts/error_rate?environment=ENVIRONMENT_ALL&kuery=&start=2024-02-21T15%3A03%3A30.172Z&end=2024-03-21T15%3A03%3A30.172Z&transactionType=request&offset=2505600000ms HTTP/2.0" 502 11 "-" "-" 471045786 "kibana-https-apm-edited-master@docker" "http://10.0.9.20:5601" 240016ms
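To make the pattern in these lines easier to see, here is a small Python sketch that pulls the status code, the request duration and the queried time range out of one access-log line. The regular expression and the `summarize` and `parse_utc` helpers are my own assumptions, written only against the four lines above; this is an illustration, not a general Traefik log parser.

```python
import re
from datetime import datetime
from urllib.parse import parse_qs, urlsplit

# Assumed layout, based only on the log lines in this post:
# ... "<METHOD> <target> HTTP/x" <status> <size> ... <duration>ms
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<target>\S+) HTTP/[\d.]+" '
    r'(?P<status>\d{3}) \d+ .* (?P<duration_ms>\d+)ms$'
)


def parse_utc(value):
    # The query uses a trailing "Z"; older Python versions need "+00:00".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize(line):
    """Return endpoint, status, duration and queried range for one line."""
    match = LOG_LINE.search(line.strip())
    if match is None:
        return None
    target = urlsplit(match["target"])
    query = parse_qs(target.query)  # also decodes %3A back to ":"
    span = parse_utc(query["end"][0]) - parse_utc(query["start"][0])
    return {
        "endpoint": target.path.rsplit("/", 1)[-1],
        "status": int(match["status"]),
        "duration_s": int(match["duration_ms"]) / 1000,
        "range_days": span.total_seconds() / 86400,
    }


# First log line from this post, copied verbatim.
SAMPLE = (
    '1.2.3.4 - - [21/Mar/2024:15:03:30 +0000] '
    '"GET /internal/apm/services/api-edited/dependencies'
    '?start=2024-02-21T15%3A03%3A30.186Z&end=2024-03-21T15%3A03%3A30.186Z'
    '&environment=ENVIRONMENT_ALL&numBuckets=20&offset=2505600000ms HTTP/2.0" '
    '502 11 "-" "-" 471045555 "kibana-https-apm-edited-master@docker" '
    '"http://10.0.9.20:5601" 240004ms'
)

print(summarize(SAMPLE))
```

For the sample line this reports status 502, a duration of about 240 s and a 29-day range, which equals the offset value (2505600000 ms is 29 days). The other three lines end with 240007ms, 240015ms and 240016ms, so all four requests fail at almost exactly 4 minutes.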

How can I get all the data without the 502 error while running this stack in a low-performance environment? Which configuration can I change to allow a longer wait before the 502 is returned?

Regarding Elasticsearch, what do you think about running it with more data nodes? Would I get better performance with more data nodes (with the shards spread across them, obviously)?

Thank you.