Indices:
- Total number of indices: 75
- Each index has only 1 primary shard and about 600,000 docs (2~3 GB)
- Number of replicas per index: 5 (i.e., every node holds either the primary or a replica shard of every index)
We ran a search traffic test with the scenarios below on the environment above, without any indexing jobs running.
(The search request type was dfs_query_then_fetch, and both the query cache and the request cache were disabled.)
Case 1: search requests sent to only 25 indices -> search latency: 40 ms
Case 2: search requests sent to all 75 indices -> search latency: 190 ms
The amount of search traffic was the same; only the number of target indices differed.
Yet there is a large difference in search latency between Case 1 and Case 2.
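For reference, a search request matching the test settings above might be built as in this minimal sketch. It assumes a cluster at a hypothetical `http://localhost:9200` and hypothetical index names `index-01` to `index-75`; the thread does not say whether each request targeted one index or several, so the helper accepts either. `search_type` and `request_cache` are standard `_search` query parameters; the query cache is controlled separately by the index setting `index.queries.cache.enabled`, which is not shown here.

```python
from urllib.parse import urlencode


def build_search_url(host: str, indices: list[str]) -> str:
    """Build a _search URL for one index or a comma-separated list of indices."""
    path = ",".join(indices)
    params = urlencode({
        # Extra DFS round trip that gathers global term statistics before scoring.
        "search_type": "dfs_query_then_fetch",
        # Bypass the shard request cache for this request.
        "request_cache": "false",
    })
    return f"{host}/{path}/_search?{params}"


# Hypothetical index names; the real ones are not given in the thread.
all_indices = [f"index-{i:02d}" for i in range(1, 76)]
case1_url = build_search_url("http://localhost:9200", all_indices[:25])
case2_url = build_search_url("http://localhost:9200", all_indices)
print(case1_url.count(","), case2_url.count(","))  # prints: 24 74
```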
We found that disk IOPS increased much more in Case 2, and we suspect this is what causes the extra search latency.
Could you explain why disk IOPS increases compared with Case 1?
For each index queried, the on-disk data structures need to be accessed and the matching documents retrieved. Unless all files are cached in the operating system page cache, which is not the case in your scenario, querying more indices is likely to lead to increased disk I/O, as more files need to be read than fit into the page cache.
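A rough back-of-the-envelope sketch of this effect, using the figures from the thread (75 indices of 2~3 GB, every node holding a copy of every index, 32 GB RAM per node). The 16 GB of page cache is a hypothetical figure, since the current heap size is not stated. Real queries read only parts of each index, so the totals are upper bounds; the point is that the data a node may need to read triples from Case 1 to Case 2 while the cache stays the same size.

```python
# Figures from the thread: 75 indices of 2-3 GB each; every node holds a copy of every index.
GB_PER_INDEX_LOW, GB_PER_INDEX_HIGH = 2, 3
PAGE_CACHE_GB = 16  # hypothetical: 32 GB RAM minus an assumed ~16 GB heap


def working_set_gb(num_indices: int, gb_per_index: float) -> float:
    """Upper bound on the on-disk data a node may touch when searches hit num_indices indices."""
    # Each node holds a primary or replica of every index, so the per-node
    # total grows linearly with the number of indices being queried.
    return num_indices * gb_per_index


def cacheable_fraction(working_set: float, page_cache_gb: float) -> float:
    """Best-case share of the working set that fits in the page cache."""
    return min(1.0, page_cache_gb / working_set)


for case, n in (("Case 1", 25), ("Case 2", 75)):
    low = working_set_gb(n, GB_PER_INDEX_LOW)
    high = working_set_gb(n, GB_PER_INDEX_HIGH)
    # Use the smaller size estimate so the fraction is an optimistic upper bound.
    frac = cacheable_fraction(low, PAGE_CACHE_GB)
    print(f"{case}: {low}-{high} GB per node, at most {frac:.0%} cacheable")
```

Under these assumptions it prints `Case 1: 50-75 GB per node, at most 32% cacheable` and `Case 2: 150-225 GB per node, at most 11% cacheable`.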
Given that your shards are small, it may make sense to try reducing the number of indices and then experiment with the number of replicas. With fewer replicas, the total volume of data held shrinks, which means more of it can be cached. If you are not indexing new data, you can also try force-merging the indices down to a single segment to improve search performance further.
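A minimal sketch of the two index operations suggested above, built with the Python standard library but not sent (pass a request to `urllib.request.urlopen` to execute it). The cluster address and index name are hypothetical. `number_of_replicas` is a dynamic index setting and `_forcemerge?max_num_segments=1` is the force merge API; the node count of 6 is an inference from "5 replicas, every node holds a copy", and 2.5 GB is simply the midpoint of 2~3 GB.

```python
import json
from urllib.request import Request

ES = "http://localhost:9200"  # hypothetical cluster address


def set_replicas_request(index: str, replicas: int) -> Request:
    """Build (not send) PUT /<index>/_settings changing number_of_replicas."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return Request(f"{ES}/{index}/_settings", data=body, method="PUT",
                   headers={"Content-Type": "application/json"})


def forcemerge_request(index: str) -> Request:
    """Build (not send) POST /<index>/_forcemerge?max_num_segments=1.

    Only worthwhile for indices that are no longer being written to.
    """
    return Request(f"{ES}/{index}/_forcemerge?max_num_segments=1", method="POST")


def per_node_gb(num_indices: int, gb_per_index: float, replicas: int, nodes: int) -> float:
    """Average data per node: each index has (1 + replicas) copies spread over the nodes."""
    return num_indices * gb_per_index * (1 + replicas) / nodes


# Assumed: 6 nodes (inferred) and 2.5 GB per index (midpoint of 2~3 GB).
for r in (5, 1):
    print(f"replicas={r}: ~{per_node_gb(75, 2.5, r, 6)} GB per node")
```

This prints `replicas=5: ~187.5 GB per node` and `replicas=1: ~62.5 GB per node`. Fewer copies also means fewer nodes can serve each index, so search throughput should be re-measured after such a change.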
It may also be worthwhile to monitor your heap usage and, if you have room to spare, try reducing the heap size. This leaves more memory for the operating system page cache, which can reduce disk I/O.
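A rough sketch of how heap size trades off against page cache on a single node. The heap sizes and the 2 GB allowance for the OS and JVM off-heap memory are hypothetical; the thread only gives 32 GB RAM (with a possible upgrade to 64 GB) and does not state the current heap.

```python
def page_cache_headroom_gb(ram_gb: float, heap_gb: float, other_gb: float = 2.0) -> float:
    """Rough estimate of memory left for the OS page cache on one node."""
    # other_gb is a hypothetical allowance for the OS itself, JVM off-heap memory, etc.
    return max(0.0, ram_gb - heap_gb - other_gb)


# Hypothetical configurations: current 32 GB RAM with two heap sizes, and a 64 GB upgrade.
for ram, heap in ((32, 16), (32, 8), (64, 16)):
    print(f"{ram} GB RAM, {heap} GB heap -> ~{page_cache_headroom_gb(ram, heap):.0f} GB page cache")
```

Under these assumptions, halving the heap from 16 GB to 8 GB raises the page cache from about 14 GB to about 22 GB. Shrinking the heap only helps as long as heap usage after garbage collection stays comfortably below the new limit.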
As I understand it, every search request creates a search context and uses an IndexReader to open and read segment files for the query and fetch phases.
Is the data from the segment files also stored in the page cache?
Your suggestion (reducing the heap size) looks very good.
We will try reducing the current heap size or upgrading the system (e.g., 32 GB -> 64 GB RAM).