I have a 5-node cluster, and my index has 5 primary shards and 1 replica. I have a multi-process system for inserting documents into my index. Each instance of the inserter randomly selects a node from the ES cluster to send its bulk insert requests to, in order to balance the load across all nodes in the cluster. Despite that, one ES node seems to get stuck with the majority of the work: when I look at _cat/thread_pool/bulk, one of the nodes has all of its bulk threads active with a large backlog of requests in the queue, while the other four nodes have only 2 or 3 active bulk threads and no backlog.
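For anyone wanting to track this imbalance over time, here is a rough sketch of how the thread pool output could be summarized. It assumes the text comes from a request such as `_cat/thread_pool/bulk?h=node_name,active,queue,rejected`, so each line holds exactly those four columns; the function names and the sample data in the usage note are made up for illustration.

```python
def summarize_bulk_pool(cat_output):
    """Parse text assumed to come from
    _cat/thread_pool/bulk?h=node_name,active,queue,rejected
    into a dict keyed by node name."""
    stats = {}
    for line in cat_output.strip().splitlines():
        # Split from the right so a node name containing spaces stays intact.
        parts = line.rsplit(None, 3)
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        node, active, queue, rejected = parts
        stats[node] = {
            "active": int(active),
            "queue": int(queue),
            "rejected": int(rejected),
        }
    return stats


def busiest_node(stats):
    """Return the node with the largest queue (ties broken by active threads)."""
    if not stats:
        return None
    return max(stats, key=lambda n: (stats[n]["queue"], stats[n]["active"]))
```

Calling `busiest_node(summarize_bulk_pool(text))` on a snapshot taken every few seconds should show whether it is always the same node that carries the backlog.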
Are your shards and documents evenly distributed across the cluster? Does each node have two shards, and do those shards contain the same number of documents?
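One way to check this is to tally the shard listing per node. The sketch below assumes the text comes from `_cat/shards?h=index,shard,prirep,state,docs,node`, so the columns appear in that order; the function name and the sample rows in the usage note are hypothetical.

```python
from collections import defaultdict


def shards_per_node(cat_output):
    """Count started primaries, replicas, and documents per node from text
    assumed to come from _cat/shards?h=index,shard,prirep,state,docs,node."""
    totals = defaultdict(lambda: {"primaries": 0, "replicas": 0, "docs": 0})
    for line in cat_output.strip().splitlines():
        parts = line.split(None, 5)
        # Unassigned shards have no docs/node columns; only count STARTED ones.
        if len(parts) != 6 or parts[3] != "STARTED":
            continue
        _index, _shard, prirep, _state, docs, node = parts
        node = node.strip()
        key = "primaries" if prirep == "p" else "replicas"
        totals[node][key] += 1
        # Docs are summed over primaries and replicas held by the node.
        totals[node]["docs"] += int(docs)
    return dict(totals)
```

If the cluster is balanced, every node should report roughly the same counts; a node with noticeably more primaries or documents than the others would be worth a closer look.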
Yes, the shards are balanced. After creating the index, my application uses the reroute API to make sure that each node gets one primary shard. I have a five-node cluster, and my indexes are created with 5 primary shards.