Load balancing issues on cluster while indexing


(Glyton Camilleri) #1

Hi,

I am trying to index data on a 20-node cluster using the Bulk API through
the Java client in my application. After running my application for some
time, BigDesk shows that the load is taken mostly by a single node on the
cluster. I tried indexing my data in the two following ways:

  1. Adding all the hosts and ports to the client using the
    addTransportAddresses method on the TransportClient instance (this
    should distribute the requests in a round-robin fashion)
  2. Setting the master's node.data configuration setting to false and
    sending all bulk requests to the master node.

In both cases, I could see that the number of IndexRequests reported by
BigDesk is extremely high on one node, and 0 (or close to 0) on the
remainder of the cluster. Note that my client does not use sniffing, and
that the cluster is unable to use multicast (all hosts and ports are
configured for unicast discovery).

Has anyone ever experienced this issue? And is there a way to ensure a
uniform distribution of load on the cluster?

Thanks in advance!

--


(Glyton Camilleri) #2

As an update: this issue seems to be related to

http://elasticsearch-users.115913.n3.nabble.com/Document-ALWAYS-Routes-to-ONE-Shard-when-BULK-Loading-HELP-td4022020.html

I have tried using a different ID and I still get the same result using
version 0.19.8.

Any suggestions?

--


(Glyton Camilleri) #3

Fixed this and managed to index all the data. The solution to my problem
was to set a Base64-encoded, randomly generated UUID as the ID, and then
simply route each document by that value. With that change, the load was
distributed pretty much uniformly across all shards.
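
For anyone landing here later, a minimal sketch of how such an ID could be
generated. It uses only the JDK and assumes Java 8 or newer for
java.util.Base64. The class and method names (RandomDocIds, randomBase64Id)
are made up for illustration and are not part of the Elasticsearch client.
The resulting string would then be set both as the document ID and as the
routing value on each index request, for example via setRouting on the
index request builder; that wiring is left out here.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

// Hypothetical helper for illustration; not part of any Elasticsearch API.
class RandomDocIds {

    // Packs a random (version 4) UUID into 16 bytes and encodes them as
    // URL-safe Base64 without padding, which yields a 22-character string.
    static String randomBase64Id() {
        UUID uuid = UUID.randomUUID();
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(uuid.getMostSignificantBits());
        buf.putLong(uuid.getLeastSignificantBits());
        return Base64.getUrlEncoder().withoutPadding().encodeToString(buf.array());
    }

    public static void main(String[] args) {
        // Each call produces a fresh ID; use the same value as document ID
        // and routing value so documents spread across shards.
        for (int i = 0; i < 3; i++) {
            System.out.println(randomBase64Id());
        }
    }
}
```

The URL-safe alphabet is an assumption on my part; it just avoids '/' and
'+' characters in IDs that may end up in REST URLs.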

--

