120s Timeout on Elasticsearch Queries

I have two clusters (ES 7.3.1) that are hosted by an internal team here at my org. Each cluster has six data nodes and three master nodes, and one cluster also has two client nodes that perform searches against both clusters simultaneously.

We are consistently hitting a 120s timeout within the cluster. We have increased every timeout in ES that we can find, as we want queries to run for up to 10 minutes when necessary. The team hosting these clusters is now saying that the two-minute timeout is due to a setting (unexposed prior to 7.4) named server.keepalivetimeout. Does this sound correct? I'm reading online about other users running queries much longer than 2 minutes without issue.

I'm asking because I don't want to wait for 7.4 to land, upgrade to it, and still hit the 120s timeout because the cause was never an ES setting at all, but rather an unforeseen proxy issue.

I don't recognise this setting and cannot see any evidence that Elasticsearch does either. Does it belong to a third-party plugin?

Elasticsearch supports queries that take much longer than 120 seconds by default, but your HTTP client (or an intermediary proxy) may not and may be closing the connection prematurely.
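One way to check this is to bypass Kibana and any proxy and send the slow query straight to an Elasticsearch node from a client whose own timeout is well above 120 seconds. Below is a minimal sketch using only the Python standard library. The helper names (`build_search_request`, `run_search`, `looks_like_proxy_error`), the node URL, and the index name are my own placeholders, not anything from your setup.

```python
import json
import urllib.error
import urllib.request


def build_search_request(base_url, index, query):
    """Build a POST request for the _search endpoint of one index."""
    url = f"{base_url.rstrip('/')}/{index}/_search"
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def looks_like_proxy_error(status):
    """502/504 are gateway statuses, normally produced by an intermediary."""
    return status in (502, 504)


def run_search(base_url, index, query, client_timeout_s=600):
    """Send the search and return (status, parsed_body_or_None).

    client_timeout_s is the socket timeout on the client side. It is set to
    10 minutes (an assumption matching the goal in this thread) so that the
    client itself is not the component giving up after 120 seconds.
    """
    request = build_search_request(base_url, index, query)
    try:
        with urllib.request.urlopen(request, timeout=client_timeout_s) as response:
            return response.status, json.load(response)
    except urllib.error.HTTPError as err:
        # Non-2xx responses land here; keep the status for inspection.
        return err.code, None
```

Calling something like `run_search("http://localhost:9200", "my-index", {"match_all": {}})` with your real slow query, directly against a data or client node, should tell you a lot. If it completes after more than two minutes, Elasticsearch is not the component enforcing the limit. If you get a status for which `looks_like_proxy_error` returns true, something between you and the node is answering on its behalf.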

Thanks David, it appears that server.keepAliveTimeout is being introduced as a Kibana setting in 7.4.
I probably should have posted this in the Kibana forum rather than Elasticsearch. Please let me know if you find that this timeout is something we need to address in 7.3.1. That said, at this point I also suspect that an intermediary service is causing the timeout.

@DavidTurner The team that owns this cluster is pointing to a Kibana timeout they say they can't modify until 7.4. Does that sound correct? I believe I've run queries longer than 2 minutes on previous Elastic stacks without issue.
Is there someone on your side who is deeply familiar with Kibana and could confirm or deny this as the cause of the timeout we're encountering? Kibana is reporting it as a 502 Bad Gateway, which further makes me believe it's not actually a Kibana timeout.
We're using the bundled OpenJDK 12, if that matters.

I see, yes, it looks like this might be something that has changed in Kibana recently. I think you would be best off asking in the Kibana forum, since I'm not too familiar with this change or with Kibana's timeout behaviour in general. I'm pretty sure that Elasticsearch is not the source of a 502 Bad Gateway response.

Great. Thanks for your help. It turns out there is a 2 minute query timeout hardcoded into Kibana before 7.4; from 7.4 onward it becomes a configurable setting.

Link for anyone else that may come across this issue in the future.
