I recently came across the post "Elasticsearch memory-bound tasks". Based on it, I am trying to understand the nature of the search requests hitting my cluster.
I have the following question:
What signals should I look at to determine whether my search requests are I/O bound or CPU bound?
I am aware of the explain and profile APIs for search requests, but I'm not sure which fields will help me determine the nature of my search requests. Any pointers on the fields/APIs I should pay attention to would be really helpful.
Thanks much. Please let me know if you need more info from my side.
Assuming there are no APIs that'd give me a breakdown of time spent on different stages of search, if I observe the field query_time_in_millis in node stats to be significantly higher than fetch_time_in_millis, is it fair to say my search requests are more CPU bound than disk bound?
There are many ways to debug where your queries may be spending most of their time.
GET /_nodes/hot_threads - This API provides a snapshot of the busiest threads in the system. If your query is causing a lot of CPU usage, these threads can give you a hint.
GET /_nodes/stats/fs - Check for io_stats in the response which will give you information about disk operations.
GET /_nodes/stats/os - Look for the os.cpu.percent metric in the response.
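Use the Dev Tools Search Profiler or GET /index_name/_search?explain=true - Neither directly tells you whether a query is CPU or disk bound. The profiler (the same data you get by setting "profile": true in the search body) gives a timing breakdown of how the query executes on each shard, while explain=true shows how each hit's score was computed.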
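To look at the node-level signals side by side, here is a rough sketch in Python that pulls os.cpu.percent, the fs io_stats read counters, and the average query/fetch phase times out of an already-parsed node stats response. The function names and the sample numbers are made up for illustration; only the field names come from the node stats response. Treat the result as a hint rather than a verdict: the query phase also reads index data from disk, so a high query time on its own does not prove a CPU-bound workload.

```python
from typing import Any, Dict, Optional


def _average(total_millis: int, count: int) -> Optional[float]:
    """Average time per operation, or None when nothing has run yet."""
    return total_millis / count if count else None


def summarize_node(node: Dict[str, Any]) -> Dict[str, Any]:
    """Collect CPU, disk-read, and search-phase signals for one node entry."""
    search = node.get("indices", {}).get("search", {})
    # io_stats is only reported on Linux, so every lookup tolerates missing keys.
    io_total = node.get("fs", {}).get("io_stats", {}).get("total", {})
    return {
        "cpu_percent": node.get("os", {}).get("cpu", {}).get("percent"),
        "disk_read_ops": io_total.get("read_operations"),
        "disk_read_kb": io_total.get("read_kilobytes"),
        "avg_query_ms": _average(
            search.get("query_time_in_millis", 0), search.get("query_total", 0)
        ),
        "avg_fetch_ms": _average(
            search.get("fetch_time_in_millis", 0), search.get("fetch_total", 0)
        ),
    }


def summarize_cluster(stats: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Map each node name (falling back to node id) to its summary."""
    return {
        node.get("name", node_id): summarize_node(node)
        for node_id, node in stats.get("nodes", {}).items()
    }


# Illustrative sample only, shaped like a node stats response; not real data.
SAMPLE_STATS = {
    "nodes": {
        "abc123": {
            "name": "node-1",
            "os": {"cpu": {"percent": 87}},
            "fs": {
                "io_stats": {
                    "total": {"read_operations": 1200, "read_kilobytes": 48000}
                }
            },
            "indices": {
                "search": {
                    "query_total": 400,
                    "query_time_in_millis": 10000,
                    "fetch_total": 400,
                    "fetch_time_in_millis": 800,
                }
            },
        }
    }
}

if __name__ == "__main__":
    for name, summary in summarize_cluster(SAMPLE_STATS).items():
        print(name, summary)
```

You can feed it the parsed JSON from GET /_nodes/stats/os,fs,indices. Note that the search and io_stats counters are cumulative since node start, so to see what is happening right now, take two snapshots a minute or so apart and compare the differences. High CPU with few disk reads during the interval points toward CPU; many reads with modest CPU points toward disk.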