I have two clusters (ES 7.3.1) that are hosted by an internal team at my org. Each cluster has six data nodes and three master nodes, and one cluster also has two client nodes that run searches against both clusters simultaneously.
We are consistently hitting a 120s timeout within the cluster. We have increased every timeout in ES that we can find, because we want queries to be able to run for up to 10 minutes when necessary. The team hosting these clusters now says that the two-minute timeout is due to a setting (unexposed prior to 7.4) named
server.keepalivetimeout. Does this sound correct? I have read online about other users running queries much longer than 2 minutes without issue.
I'm asking because I don't want to wait for 7.4 to land, upgrade to it, and still hit the 120s timeout because the cause was never an ES setting at all, but rather an unforeseen proxy issue.
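
One way to narrow this down, as a rough sketch only: time the same slow search once directly against a node and once through the usual path, and see which one fails near the 120s mark. The URLs, index name, and query body below are placeholders I made up for illustration, not our real setup, and the helper names are hypothetical.

```python
import time
import urllib.request


def timed_call(fn):
    """Run fn() and return (elapsed_seconds, error_or_None)."""
    start = time.monotonic()
    try:
        fn()
        return time.monotonic() - start, None
    except Exception as exc:  # timeouts, resets, HTTP errors
        return time.monotonic() - start, exc


def looks_like_cutoff(elapsed, error, limit=120.0, tolerance=5.0):
    """True when the call failed close to the suspected 120s limit."""
    return error is not None and abs(elapsed - limit) <= tolerance


def search(url, body, client_timeout=660):
    """POST a search body. client_timeout is set above 10 minutes so
    that the client itself is not the side giving up first."""
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=client_timeout) as resp:
        return resp.read()


if __name__ == "__main__":
    # Placeholder endpoints: one straight to a node, one via the proxy.
    targets = {
        "direct": "http://es-node.example.internal:9200/my-index/_search",
        "proxied": "http://es-proxy.example.internal/my-index/_search",
    }
    # Placeholder body; substitute a query known to run longer than 2 minutes.
    body = b'{"query": {"match_all": {}}}'
    for name, url in targets.items():
        elapsed, error = timed_call(lambda: search(url, body))
        print(name, round(elapsed, 1), "s", "cutoff?", looks_like_cutoff(elapsed, error), error)
```

If only the proxied request is cut off around 120s while the direct one keeps running, that would point at something in front of ES rather than an ES setting.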