Hi,
I'm noticing something peculiar when running our bulk indexing job against our Elasticsearch cluster (version 7.6.1).
Our indexing client uses 48 cores in total to bulk index into Elasticsearch. Each bulk request carries a batch of 2,000 documents (about 5-7 MB per batch).
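For context, the client is roughly structured like the simplified sketch below (not our exact code; the endpoint, index name, thread count, and the document generator are placeholders, and the elasticsearch-py parallel_bulk helper stands in for our actual client):

# Simplified sketch, not the exact code. Endpoint, index name, and thread
# count are placeholders; generate_docs() stands in for our real data source.
from elasticsearch import Elasticsearch
from elasticsearch.helpers import parallel_bulk

es = Elasticsearch(["http://elasticsearch:9200"])  # placeholder endpoint

def generate_docs():
    # stand-in for our real document source
    for i in range(1_000_000):
        yield {"field": "value-%d" % i}

def generate_actions():
    for doc in generate_docs():
        yield {"_index": "my-index", "_source": doc}

# Each bulk request carries 2,000 documents (roughly 5-7 MB per batch).
for ok, item in parallel_bulk(es, generate_actions(), thread_count=8, chunk_size=2000):
    if not ok:
        print("indexing failure:", item)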
The destination index has 20 shards distributed evenly across all 8 of our nodes (each node holds 2-3 shards).
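(The shard layout can be verified with a request along these lines; the index name here is a placeholder:)

GET /_cat/shards/my-index?v&h=index,shard,prirep,state,node&s=node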
I've noticed that most of the write requests/tasks seem to get queued on a single node, and I'm worried this might be hurting indexing performance.
During indexing, when running:
GET /_cat/thread_pool/write?v&h=node_name,name,queue,active,size,rejected,completed&s=node_name
I usually get something like this:
node_name                           name  queue active size rejected completed
elastic-search-cluster-es-default-0 write     0      3    7   315652  10710410
elastic-search-cluster-es-default-1 write     0      2    7   163080  11730980
elastic-search-cluster-es-default-2 write     0      7    7  1372060  11197689
elastic-search-cluster-es-default-3 write     1      7    7   156475  11863509
elastic-search-cluster-es-default-4 write     0      5    7   201970  11951185
elastic-search-cluster-es-default-5 write    87      7    7   978110  10859521
elastic-search-cluster-es-default-6 write     0      3    7        6   3356353
elastic-search-cluster-es-default-7 write     0      3    7        8   3369501
As you can see, most of the requests/tasks are being queued on node elastic-search-cluster-es-default-5, and the difference between nodes is quite significant. Please note that all of my nodes are eligible for all node roles.
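(For reference, the node roles can be confirmed with:)

GET /_cat/nodes?v&h=name,node.role,master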
Is this behavior normal/expected? Is there anything I can do or check to ensure that the tasks/requests are queued more evenly across all 8 nodes?
Thank you in advance.