Search timeout doesn't work

In our testing, we are observing that the following timeout settings aren't honored. This is in Elasticsearch 6.8.23.

escurl -XPUT //_cluster/settings -d '{"persistent": {"search.default_search_timeout": "5m"}}'
escurl -XPUT //_cluster/settings -d '{"persistent": {"search.low_level_cancellation": true}}'

Confirmed that these settings were persisted in our cluster environment.

We also found an earlier thread about the same issue that was never resolved: "Default search timeout doesn't work".

Was this a bug at some point that has since been fixed? Any input is appreciated.

Yes, it’s possible for a query to exceed its configured timeout. This behaviour is explained in the documentation:

It’s important to know that the timeout is still a best-effort operation; it’s possible for the query to surpass the allotted timeout. There are two reasons for this behavior:

  1. Timeout checks are performed on a per-document basis. However, some query types have a significant amount of work that must be performed before documents are evaluated. This "setup" phase does not consult the timeout, and so very long setup times can cause the overall latency to shoot past the timeout.
  2. Because the timeout is checked once per document, a very long query can execute on a single document and it won’t time out until the next document is evaluated. This also means poorly written scripts (e.g. ones with infinite loops) will be allowed to execute forever.
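A minimal sketch for observing this best-effort behaviour from the shell. It assumes a cluster reachable at a placeholder ES_URL and uses plain curl instead of the escurl wrapper from the question; ES_URL, the index name and the function names timed_search and report_timeout are all hypothetical. The search sends a per-request "timeout" in the body, which overrides search.default_search_timeout for that request, and the response's "timed_out" flag is then inspected. When it is true the hits may be partial, and because of the two reasons above, "took" can still be larger than the timeout.

```shell
#!/usr/bin/env bash
# Hypothetical cluster address; replace with your own.
ES_URL="${ES_URL:-http://localhost:9200}"

# Send a match_all search with a per-request "timeout", which overrides
# search.default_search_timeout for this request only.
timed_search() {
  local index="$1" timeout="$2"
  curl -sS -H 'Content-Type: application/json' \
    -XPOST "$ES_URL/$index/_search" \
    -d "{\"timeout\": \"$timeout\", \"query\": {\"match_all\": {}}}"
}

# Look at the "timed_out" flag in a search response. Note that "took"
# may still exceed the timeout, since the check is best-effort.
report_timeout() {
  local response="$1"
  if printf '%s' "$response" | grep -q '"timed_out"[[:space:]]*:[[:space:]]*true'; then
    echo "timed out (partial results)"
  else
    echo "completed within timeout"
  fi
}

# Usage (hypothetical index name):
#   report_timeout "$(timed_search my-index 5s)"
```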

If you need a hard timeout in your application, you have to enforce it on the client side.
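One way to put a hard deadline on the client side, sketched with curl's --max-time option. ES_URL and the function names search_with_deadline and describe_exit are hypothetical. curl exits with status 28 when the deadline passes. This only bounds how long the caller waits: depending on the Elasticsearch version, dropping the connection may not stop the search on the server, so it is best combined with task cancellation.

```shell
#!/usr/bin/env bash
# Hypothetical cluster address; replace with your own.
ES_URL="${ES_URL:-http://localhost:9200}"

# Run a search but stop waiting after max_secs seconds.
# --max-time caps the whole transfer; on expiry curl exits with 28.
search_with_deadline() {
  local index="$1" body="$2" max_secs="$3"
  curl -sS --max-time "$max_secs" \
    -H 'Content-Type: application/json' \
    -XPOST "$ES_URL/$index/_search" -d "$body"
}

# Translate curl's exit status into a short message for the caller.
describe_exit() {
  case "$1" in
    0)  echo "ok" ;;
    28) echo "client-side deadline exceeded" ;;
    *)  echo "curl failed with exit code $1" ;;
  esac
}

# Usage (hypothetical index and body):
#   search_with_deadline my-index '{"query":{"match_all":{}}}' 30
#   describe_exit $?
```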


Thank you. With a timeout of only a few seconds, we saw a query run for several minutes. The query looks OK at first sight; we will keep checking. Appreciate the response.

If the timeout can’t help, what’s the recommended way to keep such queries from hogging the search threads?

Context: composite aggregations over large bucket spaces (100k+ buckets) are choking some data nodes and keeping all of their search threads occupied. The timeouts don’t stop these queries, so today the only remedy is restarting the process, which is disruptive for availability!
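To see which data nodes have their search threads tied up, a sketch using the _cat thread pool API. ES_URL and the function names search_pool_stats and busy_nodes are hypothetical; busy_nodes assumes the column order requested in the h= parameter and node names without spaces.

```shell
#!/usr/bin/env bash
# Hypothetical cluster address; replace with your own.
ES_URL="${ES_URL:-http://localhost:9200}"

# Per-node view of the "search" thread pool: active threads,
# queued requests and rejected requests.
search_pool_stats() {
  curl -sS "$ES_URL/_cat/thread_pool/search?v&h=node_name,name,active,queue,rejected"
}

# Read search_pool_stats output on stdin, skip the header row and print
# only nodes with a non-empty search queue.
# Columns: node_name name active queue rejected (no spaces in node names).
busy_nodes() {
  awk 'NR > 1 && $4 > 0 { print $1 " active=" $3 " queue=" $4 }'
}

# Usage:
#   search_pool_stats | busy_nodes
```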

What @Musab_Dogan said: use a client-side timeout (or a proxy). The search timeout setting doesn't measure what you might expect.

See these docs on task cancellation. Note, however, that you may be hitting a bug if a task is shown as cancelled yet continues to run. In that case, please follow the troubleshooting steps described in the manual and share the results here so we can address it.
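A sketch of the task management calls for finding long-running searches and cancelling one without restarting the node. ES_URL and the function names list_search_tasks and cancel_task are hypothetical. Task IDs have the form node_id:task_number, as shown in the list output. The search.low_level_cancellation setting from the original post is intended to make cancellation get noticed sooner; if a task is reported as cancelled but keeps running, that is the case the troubleshooting above is for.

```shell
#!/usr/bin/env bash
# Hypothetical cluster address; replace with your own.
ES_URL="${ES_URL:-http://localhost:9200}"

# List running search tasks with details (description, running time).
list_search_tasks() {
  curl -sS "$ES_URL/_tasks?actions=*search*&detailed=true&pretty"
}

# Cancel a single task by its "node_id:task_number" identifier.
cancel_task() {
  local task_id="$1"
  if [[ "$task_id" != *:* ]]; then
    echo "expected node_id:task_number, got '$task_id'" >&2
    return 1
  fi
  curl -sS -XPOST "$ES_URL/_tasks/$task_id/_cancel"
}

# Usage (task ID copied from the list output):
#   list_search_tasks
#   cancel_task 'oTUltX4IQMOUUVeiohTt8A:12345'
```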
