Why should the timeouts of the agent, load balancer, and APM server be incremental?

Evgeni_Dzhelyov · November 2, 2019, 4:50pm

This is purely an educational question. We hit the well-known I/O timeout problem with the following timeout settings:

Agent | Load Balancer | APM server
10s   | 60s           | 30s

After changing to 10s – 15s – 30s everything works as expected.

I tried to figure out why it didn't work in the first place, but neither my limited networking knowledge nor googling and source reading helped. I would appreciate it if you could explain why this happens.

basepi · November 11, 2019, 11:07pm

I'm actually not on the server team but this piqued my interest and has gone unanswered, so I investigated a little.

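Basically, the ELB timeout is how long the load balancer keeps a connection open for reuse. If the server "hangs up" first, the load balancer still treats that connection as usable, and the next request it sends over that connection fails with an error. With your original settings, the APM server closed connections after 30s while the load balancer held them for 60s; with 15s on the load balancer, it always lets go first. I'm pretty sure it's that simple, though I'm not an expert in this area.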
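
To make the ordering rule concrete, here is a minimal Python sketch. It assumes each value can be read as how long that hop keeps a connection before giving up on it, and that a hop is at risk whenever it holds connections at least as long as the hop behind it. The function name `find_risky_hops` and the hop labels are made up for illustration; they are not part of any Elastic or AWS API.

```python
from typing import List, Tuple


def find_risky_hops(timeouts: List[Tuple[str, float]]) -> List[Tuple[str, str]]:
    """Return (upstream, downstream) pairs where the upstream hop keeps a
    connection at least as long as the downstream hop does.

    `timeouts` is ordered from the client side to the server side.
    """
    risky = []
    # Compare each hop with the next one behind it.
    for (up_name, up_t), (down_name, down_t) in zip(timeouts, timeouts[1:]):
        # Assumption: equal values are flagged too, since both sides may
        # then close at about the same moment.
        if up_t >= down_t:
            risky.append((up_name, down_name))
    return risky


# Values from the question, in seconds.
before = [("agent", 10), ("load balancer", 60), ("apm server", 30)]
after = [("agent", 10), ("load balancer", 15), ("apm server", 30)]

print(find_risky_hops(before))  # [('load balancer', 'apm server')]
print(find_risky_hops(after))   # []
```

The original setup flags the load balancer to APM server hop, which matches where the connection was being closed from underneath the load balancer. The 10s, 15s, 30s setup flags nothing.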
