Uneven distribution of search requests among nodes

Hi,
I have a single-shard cluster with 5 replicas (6 nodes in total).
Each node has 25 threads in its search thread pool.
I am running a stress test by sending the same query to the cluster at 20 requests per second.
When I monitor search thread usage on each node, I notice that the load is very unevenly distributed. A couple of the nodes are using all 25 threads in the pool and have started to build up a queue, while other nodes still have spare capacity to serve search requests. (The thread pool stats are attached below.)
I have also attached the hot threads output for the node that has built up a queue.
I am on AWS ES, version 6.4.
I tried to set "cluster.routing.use_adaptive_replica_selection" to "true", but I believe AWS does not allow this setting to be modified, and it is turned off by default in version 6.x.
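
For reference, on a self-managed 6.x cluster this is a dynamic setting that is changed through the cluster settings API. The sketch below only builds the request (method, path, JSON body) and does not send it; the helper name `ars_settings_request` is made up for illustration, and whether AWS ES accepts this request is exactly what is in doubt here.

```python
import json


def ars_settings_request(enabled=True, persistent=False):
    """Build (method, path, body) for toggling adaptive replica selection.

    Hypothetical helper for illustration; it only constructs the request.
    """
    # "transient" settings are lost on a full cluster restart,
    # "persistent" settings survive it.
    scope = "persistent" if persistent else "transient"
    body = {scope: {"cluster.routing.use_adaptive_replica_selection": enabled}}
    return "PUT", "/_cluster/settings", json.dumps(body)


if __name__ == "__main__":
    method, path, body = ars_settings_request()
    print(method, path)
    print(body)
```

The printed body can be sent with any HTTP client or pasted into a console that talks to the cluster.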

How else can I get an even distribution of search requests across the available nodes?
Am I missing anything?
Search thread pool activity:

node#  name    active  queue  rejected
1      search  7       0      0
2      search  25      8      0
3      search  9       0      0
4      search  19      0      0
5      search  9       0      0
6      search  21      0      0
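
To put a number on the imbalance, here is a rough sketch that parses the stats above and summarizes them. The function names are invented for this example, the rows are copied from the table in this post, and the pool size of 25 is the figure stated earlier.

```python
# Thread pool stats copied from this post: node, pool name, active, queue, rejected.
STATS = """\
1 search 7 0 0
2 search 25 8 0
3 search 9 0 0
4 search 19 0 0
5 search 9 0 0
6 search 21 0 0
"""

POOL_SIZE = 25  # search threads per node, as stated in the post


def parse_stats(text):
    """Turn whitespace-separated stat rows into a list of dicts."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        node, _name, active, queue, rejected = parts
        rows.append({
            "node": int(node),
            "active": int(active),
            "queue": int(queue),
            "rejected": int(rejected),
        })
    return rows


def summarize(rows, pool_size=POOL_SIZE):
    """Compute simple imbalance indicators from the parsed rows."""
    active = [r["active"] for r in rows]
    mean = sum(active) / len(active)
    return {
        "total_active": sum(active),
        "mean_active": mean,
        "max_over_mean": max(active) / mean,
        "saturated_nodes": [r["node"] for r in rows if r["active"] >= pool_size],
        "queued_nodes": [r["node"] for r in rows if r["queue"] > 0],
    }


if __name__ == "__main__":
    for key, value in summarize(parse_stats(STATS)).items():
        print(key, value)
```

With these numbers there are 90 active threads in total, an average of 15 per node, so the cluster as a whole has headroom even though node 2 is saturated and queueing.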

Hot threads output from node #2:
::: {x.x.x.x}{x.x.x.x:9300}{zone=us-east-1d, distributed_snapshot_deletion_enabled=true}
Hot threads at 2020-02-13T14:21:43.527, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

64.5% (322.6ms out of 500ms) cpu usage by thread 'elasticsearch[KVIahSv][search][T#23]'
4/10 snapshots sharing following 25 elements
org.elasticsearch.common.xcontent.support.AbstractXContentParser.readValue(AbstractXContentParser.java:422)
org.elasticsearch.common.xcontent.support.AbstractXContentParser.readList(AbstractXContentParser.java:407)
org.elasticsearch.common.xcontent.support.AbstractXContentParser.readValue(AbstractXContentParser.java:424)
org.elasticsearch.common.xcontent.support.AbstractXContentParser.readMap(AbstractXContentParser.java:364)
org.elasticsearch.common.xcontent.support.AbstractXContentParser.readOrderedMap(AbstractXContentParser.java:331)
org.elasticsearch.common.xcontent.support.AbstractXContentParser.mapOrdered(AbstractXContentParser.java:287)
org.elasticsearch.common.xcontent.XContentHelper.convertToMap(XContentHelper.java:142)
org.elasticsearch.common.xcontent.XContentHelper.convertToMap(XContentHelper.java:112)
org.elasticsearch.common.xcontent.XContentHelper.convertToMap(XContentHelper.java:89)
org.elasticsearch.search.fetch.FetchPhase.createNestedSearchHit(FetchPhase.java:288)
org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:147)
org.elasticsearch.search.fetch.subphase.InnerHitsFetchSubPhase.hitsExecute(InnerHitsFetchSubPhase.java:69)
org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:165)
org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:393)
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:368)
org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:333)
org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:329)
org.elasticsearch.search.SearchService$3.doRun(SearchService.java:1019)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:732)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
6/10 snapshots sharing following 14 elements
org.elasticsearch.search.fetch.subphase.InnerHitsFetchSubPhase.hitsExecute(InnerHitsFetchSubPhase.java:69)
org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:165)
org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:393)
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:368)
org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:333)
org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:329)
org.elasticsearch.search.SearchService$3.doRun(SearchService.java:1019)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:732)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

61.4% (307.1ms out of 500ms) cpu usage by thread 'elasticsearch[KVIahSv][search][T#9]'
4/10 snapshots sharing following 15 elements
org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:147)
org.elasticsearch.search.fetch.subphase.InnerHitsFetchSubPhase.hitsExecute(InnerHitsFetchSubPhase.java:69)
org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:165)
org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:393)
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:368)
org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:333)
org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:329)
org.elasticsearch.search.SearchService$3.doRun(SearchService.java:1019)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:732)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
2/10 snapshots sharing following 30 elements
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
org.elasticsearch.common.util.concurrent.ReleasableLock.acquire(ReleasableLock.java:55)
org.elasticsearch.common.cache.Cache.promote(Cache.java:730)
org.elasticsearch.common.cache.Cache.get(Cache.java:369)
org.elasticsearch.common.cache.Cache.computeIfAbsent(Cache.java:391)
org.elasticsearch.index.cache.bitset.BitsetFilterCache.getAndLoadIfNotPresent(BitsetFilterCache.java:128)
org.elasticsearch.index.cache.bitset.BitsetFilterCache.access$000(BitsetFilterCache.java:73)
org.elasticsearch.index.cache.bitset.BitsetFilterCache$QueryWrapperBitSetProducer.getBitSet(BitsetFilterCache.java:191)
org.elasticsearch.search.fetch.FetchPhase.findRootDocumentIfNested(FetchPhase.java:181)
org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:145)
org.elasticsearch.search.fetch.subphase.InnerHitsFetchSubPhase.hitsExecute(InnerHitsFetchSubPhase.java:69)
org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:165)
org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:393)
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:368)
org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:333)
org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:329)
org.elasticsearch.search.SearchService$3.doRun(SearchService.java:1019)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:732)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)
4/10 snapshots sharing following 14 elements
org.elasticsearch.search.fetch.subphase.InnerHitsFetchSubPhase.hitsExecute(InnerHitsFetchSubPhase.java:69)
org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:165)
org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:393)
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:368)
org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:333)
org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:329)
org.elasticsearch.search.SearchService$3.doRun(SearchService.java:1019)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:732)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)

57.2% (285.9ms out of 500ms) cpu usage by thread 'elasticsearch[KVIahSv][search][T#4]'
7/10 snapshots sharing following 18 elements
org.elasticsearch.common.xcontent.XContentHelper.convertToMap(XContentHelper.java:112)
org.elasticsearch.common.xcontent.XContentHelper.convertToMap(XContentHelper.java:89)
org.elasticsearch.search.fetch.FetchPhase.createNestedSearchHit(FetchPhase.java:288)
org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:147)
org.elasticsearch.search.fetch.subphase.InnerHitsFetchSubPhase.hitsExecute(InnerHitsFetchSubPhase.java:69)
org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:165)
org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:393)
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:368)
org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:333)
org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:329)
org.elasticsearch.search.SearchService$3.doRun(SearchService.java:1019)

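
For anyone who wants to compare hot threads dumps across nodes, a small sketch like the following pulls out the per-thread CPU lines. The regular expression is written against the header lines shown in this dump only, so treat it as an assumption that other versions format them the same way; `busiest_threads` is a made-up name.

```python
import re

# Matches header lines such as:
# 64.5% (322.6ms out of 500ms) cpu usage by thread 'elasticsearch[KVIahSv][search][T#23]'
HEADER = re.compile(
    r"(?P<pct>\d+(?:\.\d+)?)% "
    r"\((?P<busy>[\d.]+)ms out of (?P<interval>[\d.]+)ms\) "
    r"cpu usage by thread '(?P<thread>[^']+)'"
)

# Header lines copied from the hot threads output in this post.
SAMPLE = """\
64.5% (322.6ms out of 500ms) cpu usage by thread 'elasticsearch[KVIahSv][search][T#23]'
61.4% (307.1ms out of 500ms) cpu usage by thread 'elasticsearch[KVIahSv][search][T#9]'
57.2% (285.9ms out of 500ms) cpu usage by thread 'elasticsearch[KVIahSv][search][T#4]'
"""


def busiest_threads(text):
    """Return (thread name, cpu percent) pairs in the order they appear."""
    return [
        (m.group("thread"), float(m.group("pct")))
        for m in HEADER.finditer(text)
    ]


if __name__ == "__main__":
    for name, pct in busiest_threads(SAMPLE):
        print(f"{pct:5.1f}%  {name}")
```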

Am I asking this question in the wrong forum? If so, can someone please point me to the right forum for this type of question?

It's indeed possible that this isn't the right forum. This sounds like the kind of problem that adaptive replica selection is designed to solve, but I don't think there are any AWS folk here to comment on how to get that working on AWS Elasticsearch.
