Hello.
Posting again after a long hiatus because I recently came across a behavior that I didn't think was possible and I'd like some of the elastic.co community to validate my thoughts.
ES version: 2.0, before the upgrade to 2.4.4 (yes, I know...)
OS: CentOS 6.9 (yes, I know...)
Cluster: 70 nodes in AWS, r3.2xlarge (yes, I know...)
The situation: I collect a variety of metrics via DataDog. One of them, active search threads, maxed out at 7 under load, as expected. The search queue size never grew, not even spiking above 0, while the rejected count hit 800. All of this happened at the same time. It was a momentary spike, and in our environment we can accept the rejections because clients can (and will) retry. But if my understanding is right, the rejections should not have occurred.
So what could cause the search queue to never grow and never accept any queries when it's sized at 1000 entries? Going straight from maxed-out active threads to search rejections, when 1000 queries should have been queued first, means I'm either missing something, something is broken, or my mental model is wrong. Which is it?
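To make the mental model concrete, here is a minimal Python sketch of how I expect a fixed pool with a bounded queue to behave. It is a toy model only, not Elasticsearch's actual implementation: `FixedPoolModel` and `burst` are names I made up, the 7 threads and 1000 queue slots are the values from this post, and it assumes nothing completes during the burst.

```python
from dataclasses import dataclass


@dataclass
class FixedPoolModel:
    """Toy model of a fixed-size thread pool with a bounded queue (hypothetical)."""
    max_threads: int = 7     # active search threads observed at the ceiling
    queue_size: int = 1000   # configured search queue size from this post
    active: int = 0
    queued: int = 0
    rejected: int = 0

    def submit(self) -> str:
        """Offer one request and report where it ended up."""
        if self.active < self.max_threads:
            self.active += 1
            return "active"
        if self.queued < self.queue_size:
            self.queued += 1
            return "queued"
        # Only reached once every thread is busy AND the queue is full.
        self.rejected += 1
        return "rejected"


def burst(n: int) -> FixedPoolModel:
    """Submit n requests back to back, assuming none finish in the meantime."""
    pool = FixedPoolModel()
    for _ in range(n):
        pool.submit()
    return pool


pool = burst(1807)
print(pool.active, pool.queued, pool.rejected)  # prints: 7 1000 800
```

Under this model, 800 rejections can only happen with the queue sitting at 1000, which is exactly what the graphs do not show.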
Cheers!