I have a problem with a node (a different one each time) that stops responding. When I check _cat/thread_pool while it is in this "hanging mode", the GET queue is full and has started rejecting requests.
The log file, though, indicates that it is the bulk queue that is full:
"bulk, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@46a91dcd[Running, pool size = 12, active threads = 12, queued tasks = 200, completed tasks = 13766271]"
There are no errors in the log file before these ones, but I found plenty of entries similar to this one (the last one before the error):
[DATEandTIME][DEBUG][o.e.c.u.c.QueueResizingEsThreadPoolExecutor] [NODENAME/search]: there were [2000] tasks in [14.8s], avg task time [10.6ms], EWMA task execution [4ms], [93.99 tasks/s], optimal queue is [93], current capacity [1000]
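The figures in that DEBUG line look consistent with Little's Law (L = λ·W): 93.99 tasks/s multiplied by a 1s target response time gives 93.99, which truncates to the reported optimal queue of 93. The sketch below reproduces that arithmetic. It assumes a 1s target (which I believe is the 6.x default for `thread_pool.search.target_response_time`) and truncation to an integer; `optimal_queue_size` is a hypothetical helper, not Elasticsearch code. Note that this line is about the `search` pool ([NODENAME/search]), not the get or bulk pools.

```python
import math


def optimal_queue_size(tasks_per_second: float, target_response_time_s: float = 1.0) -> int:
    """Little's Law: queue length L = lambda (tasks/s) * W (target response time).

    Assumption: the result is truncated to an integer, as the log line suggests.
    Illustrative sketch only, not the executor's actual implementation.
    """
    if target_response_time_s <= 0:
        raise ValueError("target response time must be positive")
    return math.floor(tasks_per_second * target_response_time_s)


# Values taken from the DEBUG line: [93.99 tasks/s], optimal queue is [93].
print(optimal_queue_size(93.99))
```

With those inputs the result matches the logged optimum, while the current capacity of the search queue was still 1000.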
Output of /_cat/thread_pool (columns: node, pool, active, queue, rejected):
nodename bulk 0 0 0
nodename get 12 1000 28304
(Other thread pools removed; they were all zeros.)
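To capture the hanging state as it happens instead of checking by hand, the same columns can be requested explicitly and checked in a loop. A minimal sketch, assuming the cluster answers at http://localhost:9200 without authentication (a hypothetical address) and that the `_cat/thread_pool` columns `node_name`, `name`, `active`, `queue`, `queue_size` and `rejected` are available; with `format=json` the values come back as strings. `fetch_thread_pools` and `saturated_pools` are hypothetical helpers.

```python
import json
import urllib.request

# Hypothetical address; adjust host, port and authentication for the real cluster.
CAT_URL = (
    "http://localhost:9200/_cat/thread_pool"
    "?format=json&h=node_name,name,active,queue,queue_size,rejected"
)


def fetch_thread_pools(url: str = CAT_URL) -> list:
    """Fetch one row per node and thread pool; _cat JSON values are strings."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def saturated_pools(rows: list) -> list:
    """Return (node, pool, queue, queue_size, rejected) for pools that are full or have rejected."""
    flagged = []
    for row in rows:
        queue = int(row["queue"])
        queue_size = int(row["queue_size"])  # -1 means unbounded, so skip the "full" check
        rejected = int(row["rejected"])
        if rejected > 0 or (queue_size > 0 and queue >= queue_size):
            flagged.append((row["node_name"], row["name"], queue, queue_size, rejected))
    return flagged
```

Calling `saturated_pools(fetch_thread_pools())` every few seconds and logging the result with a timestamp would show which pool fills up first. Keep in mind that `rejected` accumulates from node start, so a non-zero value alone does not mean the pool is rejecting right now.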
Q: Why this discrepancy? Is there somewhere else I can look? And why is there no mention in the log file (at DEBUG level) of the GET thread pool rejecting requests?
I am using 6.2.4. Yes, I am aware that this is fixed in later releases, but I still want to understand it and be able to point somewhere and say, "Yes, this makes sense."
The cluster has 5 nodes and 5 primary shards with 1 replica each; the heap size is 26GB, with more than twice as much RAM.
Top lines from the error message:
[DATEandTIME][DEBUG][r.suppressed ] path: /index-listresults/worklistresults/a49670d4-1c31-ec11-8f2f-005056a48c06, params: {index=index-listresults, id=a49670d4-1c31-ec11-8f2f-005056a48c06, type=listresults}
org.elasticsearch.transport.RemoteTransportException: [NODENAME][IP:9300][indices:data/write/bulk[s][p]]
Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.transport.TransportService$7@21b7ccd0 on EsThreadPoolExecutor[name = NODENAME/bulk, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@46a91dcd[Running, pool size = 12, active threads = 12, queued tasks = 200, completed tasks = 13766271]]
at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:48) ~[elasticsearch-6.2.4.jar:6.2.4]
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830) ~[?:1.8.0_265]
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379) ~[?:1.8.0_265]
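The last frames of that trace are the JDK `ThreadPoolExecutor` rejection path: with all 12 threads busy and the 200-slot queue full, `execute` hands the task to the rejection handler (`EsAbortPolicy`), which throws. The sketch below is a simplified Python model of that admission check for a fixed-size pool, meant only to show why `active threads = 12, queued tasks = 200` means the next task is rejected. `FixedPoolModel` and `RejectedExecution` are hypothetical names; the model only counts tasks and never runs them.

```python
from collections import deque


class RejectedExecution(Exception):
    """Stand-in for EsRejectedExecutionException in this model."""


class FixedPoolModel:
    """Simplified admission model of a fixed-size thread pool with a bounded queue."""

    def __init__(self, pool_size: int, queue_capacity: int):
        self.pool_size = pool_size
        self.queue_capacity = queue_capacity
        self.active = 0
        self.queue = deque()
        self.rejected = 0

    def submit(self, task) -> str:
        if self.active < self.pool_size:
            self.active += 1  # a free worker picks the task up immediately
            return "running"
        if len(self.queue) < self.queue_capacity:
            self.queue.append(task)  # all workers busy: wait in the bounded queue
            return "queued"
        self.rejected += 1  # pool and queue both full: abort policy rejects
        raise RejectedExecution(
            f"queue capacity = {self.queue_capacity}, pool size = {self.pool_size}, "
            f"active threads = {self.active}, queued tasks = {len(self.queue)}"
        )

    def finish_one(self) -> None:
        """A worker completes a task and takes the next queued one, if any."""
        if self.queue:
            self.queue.popleft()  # the worker stays active on the dequeued task
        elif self.active > 0:
            self.active -= 1


# Reproduce the state from the log: 12 active threads and 200 queued tasks.
bulk = FixedPoolModel(pool_size=12, queue_capacity=200)
for i in range(12 + 200):
    bulk.submit(i)
try:
    bulk.submit("one more")
except RejectedExecution as exc:
    print("rejected:", exc)
```

Once a worker finishes, a slot opens in the queue and the next submission is accepted again, which is why rejections come in bursts while the pool stays saturated.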
Thank you in advance.