Hello guys.
I have Elastic APM deployed in my environment. When I try to view metrics for a service over a long time range, such as one week, I get a 502 error in Kibana after waiting for a while.
This error doesn't happen with short time ranges like 15 minutes or 1 hour.
I'm trying to understand why this error happens and where it is coming from. My storage is slow (low IOPS), and I believe this may be part of the problem.
I'm running the whole stack in containers on one server, and I have Traefik as my load balancer on another server.
I made some changes to try to fix this error:
- Traefik timeout set to 24 hours
- Kibana search timeout set to 15000000
Even with these changes, the 502 error still occurs. What's strange to me is that the behavior isn't consistent. I mean, sometimes I get this error after 4 minutes, but other times the request succeeds after about 15 minutes of waiting.
I checked all the logs and resource usage and didn't find any errors such as OOM, but I constantly see iowait when I query data over a long time range.
You can see this in the images below:
Here is my Traefik log:
1.2.3.4 - - [21/Mar/2024:15:03:30 +0000] "GET /internal/apm/services/api-edited/dependencies?start=2024-02-21T15%3A03%3A30.186Z&end=2024-03-21T15%3A03%3A30.186Z&environment=ENVIRONMENT_ALL&numBuckets=20&offset=2505600000ms HTTP/2.0" 502 11 "-" "-" 471045555 "kibana-https-apm-edited-master@docker" "http://10.0.9.20:5601" 240004ms
1.2.3.4 - - [21/Mar/2024:15:03:32 +0000] "GET /internal/apm/services/api-edited/transactions/charts/latency?environment=ENVIRONMENT_ALL&kuery=&start=2024-02-21T15%3A03%3A30.153Z&end=2024-03-21T15%3A03%3A30.153Z&transactionType=request&latencyAggregationType=avg&offset=2505600000ms HTTP/2.0" 502 11 "-" "-" 471045782 "kibana-https-apm-edited-master@docker" "http://10.0.9.20:5601" 240007ms
1.2.3.4 - - [21/Mar/2024:15:03:32 +0000] "GET /internal/apm/services/api-edited/service_overview_instances/main_statistics?environment=ENVIRONMENT_ALL&kuery=&latencyAggregationType=avg&start=2024-02-21T15%3A03%3A30.191Z&end=2024-03-21T15%3A03%3A30.192Z&transactionType=request&offset=2505600000ms HTTP/2.0" 502 11 "-" "-" 471045785 "kibana-https-apm-edited-master@docker" "http://10.0.9.20:5601" 240015ms
1.2.3.4 - - [21/Mar/2024:15:03:32 +0000] "GET /internal/apm/services/api-edited/transactions/charts/error_rate?environment=ENVIRONMENT_ALL&kuery=&start=2024-02-21T15%3A03%3A30.172Z&end=2024-03-21T15%3A03%3A30.172Z&transactionType=request&offset=2505600000ms HTTP/2.0" 502 11 "-" "-" 471045786 "kibana-https-apm-edited-master@docker" "http://10.0.9.20:5601" 240016ms
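One thing that stands out in these lines is that every failed request ends at almost exactly 240000 ms, which matches the 4 minutes I mentioned. Below is a minimal Python sketch for pulling the status and duration out of such lines. It assumes the duration is the trailing `<number>ms` field, as in my log format, and the `parse_line` and `failed_durations` names are only illustrative:

```python
import re

# Assumption: the status code follows the quoted request line, and the total
# request duration is the last field of the line, written as "<number>ms".
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<target>\S+) HTTP/[\d.]+" '
    r'(?P<status>\d{3}) .* (?P<duration_ms>\d+)ms$'
)


def parse_line(line):
    """Return (path, status, duration_ms) for one access-log line, or None."""
    match = LINE_RE.search(line.strip())
    if match is None:
        return None
    path = match.group("target").split("?", 1)[0]  # drop the query string
    return path, int(match.group("status")), int(match.group("duration_ms"))


def failed_durations(lines, status=502):
    """Collect durations in seconds for all lines with the given status."""
    parsed = [parse_line(line) for line in lines]
    return [ms / 1000 for (_, code, ms) in (p for p in parsed if p) if code == status]


# Two of the lines above, with the query strings shortened for readability.
sample = [
    '1.2.3.4 - - [21/Mar/2024:15:03:30 +0000] "GET /internal/apm/services/api-edited/dependencies?offset=2505600000ms HTTP/2.0" 502 11 "-" "-" 471045555 "kibana-https-apm-edited-master@docker" "http://10.0.9.20:5601" 240004ms',
    '1.2.3.4 - - [21/Mar/2024:15:03:32 +0000] "GET /internal/apm/services/api-edited/transactions/charts/latency?offset=2505600000ms HTTP/2.0" 502 11 "-" "-" 471045782 "kibana-https-apm-edited-master@docker" "http://10.0.9.20:5601" 240007ms',
]

for seconds in failed_durations(sample):
    print(f"502 after {seconds:.1f} s")
```

If every 502 lands at about the same duration, that would suggest a fixed timeout somewhere in the chain rather than a random failure, although I can't tell from this alone which component enforces it.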
What can I do to get all the data without the 502 error while running this stack in a low-performance environment? What configuration can I change to allow longer waits without getting a 502 error?
Regarding Elasticsearch, what do you think about running it with more data nodes? Would I get better performance with more data nodes (obviously, with the shards distributed across them)?
Thank you.