We have a small cluster of 6 machines with heterogeneous core counts: 1×16, 3×24, and 2×48 cores including hyperthreading (184 in total). Furthermore, other jobs run on some of the machines from time to time, so ES doesn't get 100% of a machine's power during those periods.
On the node side, we pass `-E processors=<nproc> -E thread_pool.bulk.size=<nproc>` as command-line parameters (where `nproc` is that machine's CPU count) to avoid ES's 32-core cap on the big machines. We also set `thread_pool.bulk.queue_size` to 300, roughly 1.5× the number of cores in the cluster, which in theory can hold all the concurrent bulk requests we configured.
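Spelled out, the launch command we mean looks roughly like this (the `thread_pool.bulk.*` setting names match ES 5.x; later releases renamed the bulk thread pool to `write`):

```shell
# Hypothetical launch script per node; CORES is this machine's CPU count.
CORES=$(nproc)   # e.g. 48 on the big machines
./bin/elasticsearch \
  -Eprocessors="$CORES" \
  -Ethread_pool.bulk.size="$CORES" \
  -Ethread_pool.bulk.queue_size=300
```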
Client: we use the Java transport client with bulk indexing and a bulk size of 1.5 MB (which results in ~20-35 docs per bulk). A single document is hard for ES to process (several seconds), since each one carries a big geo_shape field with a linestring of up to 2000 GPS points to index. We configured 291 concurrent requests (~1.5× the cluster's core count). The idea is that, on average, every machine has all of its cores busy processing bulk requests plus half a core count's worth waiting in the queue; this runs fine in a single-shard index scenario.
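The sizing rule we applied can be sketched as plain arithmetic (the class and method names are ours, not an Elasticsearch API): sum the per-machine core counts (16 + 3·24 + 2·48 = 184) and multiply by 1.5, which gives 276 concurrent requests.

```java
// Sketch of our client-side sizing rule: total concurrent bulk requests
// = factor x the summed core count of the cluster. Illustration only.
public class BulkSizing {
    static int totalConcurrentRequests(int[] coresPerNode, double factor) {
        int total = 0;
        for (int c : coresPerNode) total += c;   // cluster-wide core count
        return (int) Math.round(total * factor); // e.g. 1.5 x cores
    }

    public static void main(String[] args) {
        int[] cluster = {16, 24, 24, 24, 48, 48}; // our 6 machines
        System.out.println(totalConcurrentRequests(cluster, 1.5)); // 276
    }
}
```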
We tried this with all machines connected to the client, with sniffing both on and off, and also with only a single node connected, hoping that this node would do the load balancing. No big difference.
Now we have the situation that the bulk queues on the machines with fewer cores become very full (over 250 items), while the queues on the big machines are empty and those machines are idle most of the time. It looks as if all requests are simply distributed equally across the nodes (round robin), regardless of whether a node can keep up. Since the insert threads on the client side block until their bulks finish, they end up waiting on the slow machines' queues before they can hand a new bulk to a fast machine. If we added more client threads, the queue size on the slow machines would be exceeded, leading to a TransportException.
In our scenario, the load-balancing solution seems simple: give the next bulk to the node with the smallest current queue. Is that possible somehow, e.g. in the Java client? Routing to the machine with the lowest load would also be fine.
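Lacking a built-in option, a client-side workaround could be to track in-flight bulks per node and always hand the next bulk to the least-loaded one. A minimal sketch, assuming one transport client per node (the wiring to the actual clients is omitted; node ids here are just strings; you would call `acquire()` before sending a bulk and `release()` in the response/failure listener):

```java
import java.util.*;

// Minimal least-loaded dispatcher sketch: always pick the node with the
// fewest in-flight bulk requests. Hypothetical helper, not an ES API.
public class LeastLoadedBalancer {
    private final Map<String, Integer> inFlight = new LinkedHashMap<>();

    public LeastLoadedBalancer(List<String> nodes) {
        for (String n : nodes) inFlight.put(n, 0);
    }

    /** Reserve the node with the fewest in-flight bulks and return its id. */
    public synchronized String acquire() {
        String best = null;
        for (Map.Entry<String, Integer> e : inFlight.entrySet())
            if (best == null || e.getValue() < inFlight.get(best)) best = e.getKey();
        inFlight.merge(best, 1, Integer::sum);
        return best;
    }

    /** Call from the bulk response (or failure) listener. */
    public synchronized void release(String node) {
        inFlight.merge(node, -1, Integer::sum);
    }
}
```

A refinement would be to weight each node by its core count, so a 48-core machine is allowed three times the in-flight bulks of the 16-core one.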
Nevertheless, we are surprised to run into this, because load balancing should be more or less a solved problem in ES, shouldn't it? Heterogeneous machines in a cluster should also be common, so there is probably a solution we are simply not aware of.
thanks for reading - any solution?