Why should the timeouts of the agent, load balancer, and APM server be incremental?

This is purely an educational question. We hit the well-known I/O timeout problem with the following timeout setup:

Agent | Load Balancer | APM server
10s   | 60s           | 30s

After changing to 10s – 15s – 30s everything works as expected.

I tried to figure out why it didn't work in the first place, but neither my limited networking knowledge nor googling and reading the source helped. I would appreciate it if you could explain why this happens.

I'm actually not on the server team but this piqued my interest and has gone unanswered, so I investigated a little.

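Basically, the ELB timeout is how long the load balancer keeps an idle connection to the server open for reuse. If the server "hangs up" first, the load balancer still thinks the connection is good, and the next request it sends over that connection fails with an error. In your original setup the server's 30s timeout was shorter than the load balancer's 60s, so that could happen whenever a connection sat idle for more than 30s. With the load balancer at 15s, it always drops the connection before the server does, so it never reuses a dead one. I'm pretty sure it's that simple, though I'm not an expert in this area.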
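
To make the timing concrete, here is a toy model in Python. It is only a sketch of the reasoning above, not code from the ELB or the APM server: the function name `reuse_outcome` and its parameters are made up for illustration, and it assumes both timeouts behave as simple idle timeouts on the connection between the load balancer and the server.

```python
def reuse_outcome(idle_s, lb_timeout_s, server_timeout_s):
    """Classify what happens when a request arrives after the
    load-balancer-to-server connection has been idle for idle_s seconds.

    Hypothetical model: each side closes the connection once it has been
    idle for its own timeout, and the load balancer does not notice a
    close made by the server until it tries to send on the connection.
    """
    if idle_s >= lb_timeout_s:
        # The load balancer already dropped the connection itself,
        # so it opens a fresh one and the request goes through.
        return "new connection"
    if idle_s >= server_timeout_s:
        # The server hung up first; the load balancer reuses a dead
        # connection and the request fails.
        return "error"
    # Both sides still consider the connection alive.
    return "reused"


# Original setup: load balancer 60s, server 30s.
print(reuse_outcome(45, lb_timeout_s=60, server_timeout_s=30))

# Fixed setup: load balancer 15s, server 30s.
print(reuse_outcome(20, lb_timeout_s=15, server_timeout_s=30))
```

In this model the "error" branch is only reachable when the server timeout is shorter than the load balancer timeout, which matches why 10s, 15s, 30s works and 10s, 60s, 30s does not.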
