Yes, it’s possible for a query to exceed the allocated timeout parameter. This behaviour is explained in the documentation.
It’s important to know that the timeout is still a best-effort operation; it’s possible for the query to surpass the allotted timeout. There are two reasons for this behavior:
Timeout checks are performed on a per-document basis. However, some query types have a significant amount of work that must be performed before documents are evaluated. This "setup" phase does not consult the timeout, and so very long setup times can cause the overall latency to shoot past the timeout.
Because the check happens only once per document, a very expensive evaluation can run against a single document, and the timeout won’t be consulted again until the next document is reached. This also means poorly written scripts (e.g. ones with infinite loops) will be allowed to execute forever.
If you need a hard timeout in your application, you need to enforce it on the client side, along the lines of the sketch below.
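For illustration, here's a minimal sketch of a client-side timeout using the Python client, assuming elasticsearch-py 8.x; the host, index name, and query are placeholders:

```python
# Sketch: enforcing a hard timeout on the client side (elasticsearch-py 8.x
# assumed). Host, index name, and query below are placeholders.
from elasticsearch import Elasticsearch
from elastic_transport import ConnectionTimeout

es = Elasticsearch("http://localhost:9200")

try:
    # request_timeout aborts the HTTP request after 5 seconds,
    # independently of the server-side "timeout" search parameter.
    resp = es.options(request_timeout=5).search(
        index="my-index",
        query={"match_all": {}},
    )
except ConnectionTimeout:
    # The client gave up, but note the server may still be executing
    # the query until it is cancelled or completes on its own.
    print("search exceeded the client-side timeout")
```

One caveat with this approach: a client-side timeout only frees the client. The server may keep running the query, so pair it with task cancellation if you need to reclaim search threads too.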
Thank you. With a timeout of only a few seconds, we saw a query execute for several minutes. The query looks OK at first sight. Will keep checking. Appreciate the response.
If the timeout can’t help, what’s the recommended way to stop runaway queries from hogging search threads?
Context: composite aggregations running over large bucket spaces (100k+) are choking some data nodes, keeping all search threads occupied. Timeouts aren’t killing these queries, and today the only remedy is restarting the process, which is disruptive to availability!
What @Musab_Dogan said: use a client-side timeout (or a proxy). The search timeout setting doesn't measure what you might expect.
See these docs on task cancellation. But also note that you may be encountering a bug if a task is shown as cancelled and yet continues to run. In that case, please do the troubleshooting described in the manual and share the results here so we can address it.
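If it helps, a rough sketch of finding and cancelling long-running search tasks with the Python client (elasticsearch-py 8.x assumed; the task id shown is a placeholder):

```python
# Sketch: listing and cancelling long-running search tasks via the task
# management API. Adjust the actions filter to match your workload.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# List currently running search tasks with their running time.
tasks = es.tasks.list(actions="*search*", detailed=True)
for node_id, node in tasks["nodes"].items():
    for task_id, task in node["tasks"].items():
        print(task_id, task["running_time_in_nanos"], task.get("description", ""))

# Cancel a specific task by id (placeholder; the format is
# "<node_id>:<task_number>"). A cancelled search should stop at its next
# cancellation check; if it keeps running long after, that may be the
# bug mentioned above.
es.tasks.cancel(task_id="node_id:12345")
```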