We have a 16-node cluster, and each index is spread across 16 shards. Up until last week, bulk indexing seemed to be distributed evenly across all the nodes/shards. For the last couple of days, however, one node at a time ends up with most of the bulk indexing threads (which node it is changes over time), while the other nodes have just a few.
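(For context, shard placement per node can be listed with something like the line below; "my-index" is just a placeholder for one of the real index names:)

curl -s "localhost:9200/_cat/shards/my-index?v&h=index,shard,prirep,state,node"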
Running this:
curl -s "localhost:9200/_cat/thread_pool?v&h=host,bulk.max,bulk.size,bulk.active,bulk.queueSize,bulk.queue,bulk.rejected"
with a little bit of bash-fu yields:
host                   bulk.max  bulk.size  bulk.active  bulk.queueSize  bulk.queue  bulk.rejected
es1-prod.aerserv.com   100       100        5            500             0           0
es2-prod.aerserv.com   100       100        7            500             0           0
es3-prod.aerserv.com   100       100        6            500             0           0
es4-prod.aerserv.com   100       100        18           500             0           0
es5-prod.aerserv.com   100       100        4            500             0           0
es6-prod.aerserv.com   100       100        100          500             43          0
es7-prod.aerserv.com   100       100        6            500             0           0
es8-prod.aerserv.com   100       100        7            500             0           0
es9-prod.aerserv.com   100       100        6            500             0           0
es10-prod.aerserv.com  100       100        5            500             0           0
es11-prod.aerserv.com  100       100        7            500             0           0
es12-prod.aerserv.com  100       100        6            500             0           0
es13-prod.aerserv.com  100       100        5            500             0           0
es14-prod.aerserv.com  100       100        6            500             0           0
es15-prod.aerserv.com  100       100        21           500             0           0
es16-prod.aerserv.com  100       100        7            500             0           0
Note es6-prod: it's pegged at 100 bulk.active and has 43 requests queued. All the other nodes fluctuate, but none of them get anywhere near that high.
Any ideas would be appreciated!
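FWIW, the "bash-fu" above is nothing fancy; it's roughly the following sketch (sort -V is GNU sort, and the column alignment step is just for readability):

# Keep the header row, version-sort the remaining rows so es1..es16 come out in order,
# then line up the columns so a pegged node like es6-prod is easy to spot.
curl -s "localhost:9200/_cat/thread_pool?v&h=host,bulk.max,bulk.size,bulk.active,bulk.queueSize,bulk.queue,bulk.rejected" \
  | awk 'NR==1; NR>1 {print | "sort -V"}' \
  | column -t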