I have a 5-node cluster and an index with 5 primary shards and 1 replica. I have a multi-process system for inserting documents into my index. Each instance of the inserter randomly selects a node from the ES cluster to send its bulk insert requests to, in order to balance the load across all nodes in the cluster. Despite that, one ES node seems to get stuck with the majority of the work: when I look at _cat/thread_pool/bulk, one of the nodes has all of its bulk threads active with a large backlog of requests in the queue, while the other four nodes only have 2 or 3 active bulk threads and no backlog.
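To make the selection scheme described above concrete, here is a minimal sketch of it. This is not the actual inserter code: the node addresses and the names `NODES`, `pick_node`, and `simulate` are hypothetical, and it only simulates which node each bulk request would be sent to rather than talking to a cluster.

```python
import random
from collections import Counter

# Hypothetical node addresses; substitute the real cluster hosts.
NODES = [f"http://es-node-{i}:9200" for i in range(1, 6)]


def pick_node(rng):
    """Return one node URL chosen uniformly at random."""
    return rng.choice(NODES)


def simulate(num_requests, seed=0):
    """Count how many bulk requests each node would receive from the clients."""
    rng = random.Random(seed)
    return Counter(pick_node(rng) for _ in range(num_requests))


if __name__ == "__main__":
    # Print the number of simulated requests sent to each node.
    for node, count in sorted(simulate(10_000).items()):
        print(node, count)
```

With a uniform random choice like this, each node should receive roughly a fifth of the requests from the inserters, so the selection on the client side by itself should not favor any one node.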
Why would this happen?