Bulk indexing requests are mostly queued on one node in the cluster


I'm noticing something peculiar when running our bulk indexing job against our Elasticsearch cluster (version 7.6.1).

Our client uses 48 cores in total to bulk index into Elasticsearch. Each request contains a batch of 2,000 documents (about 5-7 MB per batch).
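For context, each request is a standard _bulk call along the lines of the sketch below (the index name my-index and the document fields are placeholders, not our real schema):

POST /_bulk
{ "index": { "_index": "my-index" } }
{ "field_a": "some value", "field_b": 1 }
{ "index": { "_index": "my-index" } }
{ "field_a": "another value", "field_b": 2 }

with the action/source pair repeated 2,000 times and the body terminated by a final newline, as the bulk API requires.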

The destination index consists of 20 shards distributed evenly across all 8 of our nodes (each node holds 2-3 shards).
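The shard placement can be double-checked with something like the following (my-index again being a placeholder for our index name), which lists each shard copy together with the node it lives on and its document count:

GET /_cat/shards/my-index?v&h=index,shard,prirep,docs,node&s=node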

I've noticed that most of the requests/tasks seem to be getting queued on a single node for some reason, and I'm worried this might be hurting indexing performance.

During indexing, when running:

GET /_cat/thread_pool/write?v&h=node_name,name,queue,active,size,rejected,completed&s=node_name

I usually get something like this:

node_name                           name  queue active size rejected completed
elastic-search-cluster-es-default-0 write     0      3    7   315652  10710410
elastic-search-cluster-es-default-1 write     0      2    7   163080  11730980
elastic-search-cluster-es-default-2 write     0      7    7  1372060  11197689
elastic-search-cluster-es-default-3 write     1      7    7   156475  11863509
elastic-search-cluster-es-default-4 write     0      5    7   201970  11951185
elastic-search-cluster-es-default-5 write    87      7    7   978110  10859521
elastic-search-cluster-es-default-6 write     0      3    7        6   3356353
elastic-search-cluster-es-default-7 write     0      3    7        8   3369501

As you can see, most of the requests/tasks are queued on the node elastic-search-cluster-es-default-5 (87 queued versus 0-1 on every other node), so the difference is quite significant. Please note that all my nodes are eligible for all node roles.

Is this behavior normal or to be expected? Is there anything I can do or check to make the tasks/requests queue more evenly across all 8 of my nodes?

Thank you in advance.

Is your client talking to all nodes in the cluster? What client are you using?

@warkolm My client is a Databricks cluster. We are running Elastic Cloud on Kubernetes (ECK), and we communicate with our Elasticsearch cluster through the designated Kubernetes service, which then load-balances the requests across the different Elasticsearch nodes. We are sending raw REST requests because of some restrictions on our side, so we can't use the Elasticsearch Python client. I also noticed that the indexing load is balanced much better when indexing different data into an index with a different schema.
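One thing we're considering as an experiment, sketched below (the host names, index name, and helper function are placeholders, not our actual setup): addressing the nodes directly and rotating bulk requests across them instead of going through the single Kubernetes service, to rule out uneven balancing at that layer.

import itertools
import json

import requests

# Placeholder addresses; in reality these would be the pod DNS names or a
# headless service entry per Elasticsearch node.
NODES = itertools.cycle([
    "http://es-node-0:9200",
    "http://es-node-1:9200",
    # ... one entry per node
])

def bulk_index(docs, index="my-index"):
    """Send one _bulk request for `docs`, rotating across NODES."""
    # Build the NDJSON body: one action line plus one source line per
    # document; the bulk API requires a trailing newline.
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    body = "\n".join(lines) + "\n"

    resp = requests.post(
        f"{next(NODES)}/_bulk",
        data=body,
        headers={"Content-Type": "application/x-ndjson"},
    )
    resp.raise_for_status()
    return resp.json()

Since the node that receives a bulk request acts as the coordinating node and fans the operations out to the shards, rotating the requests should at least spread the coordination work evenly.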

Could this behavior be caused by something about the nature of the data, or by a mistake in our schema/mappings somehow?
