In a previous discussion there's a claim that we shouldn't worry about balancing primary shards across all nodes because primaries and replicas do almost the same amount of work during indexing.
I'm having a problem with my bulk index thread queue filling up and tasks being dropped, and the nodes with the most primaries have the most dropped tasks:
The fact that some nodes have a non-empty queue while other nodes show no active jobs leads me to think that such an unbalanced cluster must affect indexing performance. Jobs are waiting in the queue on one node when they could be processed on another if the primaries were more balanced. Am I wrong here?
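For anyone who wants to quantify how unbalanced the primaries are, here is a minimal sketch that counts started primaries per node. It assumes the text comes from `GET /_cat/shards?h=index,shard,prirep,state,node` so that the columns are fixed; the function name and the sample data are made up for illustration.

```python
from collections import Counter


def primaries_per_node(cat_shards_text):
    """Count STARTED primary shards per node.

    Assumes the input is the plain-text output of
    GET /_cat/shards?h=index,shard,prirep,state,node
    (five whitespace-separated columns, node name last).
    """
    counts = Counter()
    for line in cat_shards_text.splitlines():
        parts = line.split(None, 4)
        if len(parts) < 5:
            continue  # blank line, or an unassigned shard with no node column
        _index, _shard, prirep, state, node = parts
        if prirep == "p" and state == "STARTED":
            counts[node.strip()] += 1
    return counts


# Hypothetical sample output, not taken from the cluster in this thread.
sample = """\
logs 0 p STARTED node-1
logs 0 r STARTED node-2
logs 1 p STARTED node-1
logs 1 r STARTED node-3
logs 2 p STARTED node-2
logs 2 r UNASSIGNED
"""

for node, count in sorted(primaries_per_node(sample).items()):
    print(node, count)
```

Comparing these counts with the per-node queue and rejection numbers would show whether the dropped tasks really track the primary count.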
Sorry for the delay, I didn't get a notification about your response.
On this current job, yes, I am updating documents. I use the elasticsearch-py bulk API with index commands. However, the _ids in the bulk payload already exist, so each index command overwrites an existing document and is effectively an update.
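To make the setup concrete, here is a rough sketch of what such a bulk payload looks like when built for the elasticsearch-py `helpers.bulk` function. The index name, ids and fields are invented, and older Elasticsearch versions would also need a `_type` key in each action. Because the op type is `index`, a document whose `_id` already exists is replaced as a whole rather than partially updated.

```python
def index_actions(docs, index_name):
    """Yield bulk actions that use the `index` op type.

    `docs` is an iterable of (doc_id, source_dict) pairs. With `index`,
    an existing _id is overwritten by the new source in full.
    """
    for doc_id, source in docs:
        yield {
            "_op_type": "index",   # as opposed to "create", "update" or "delete"
            "_index": index_name,  # hypothetical index name supplied by the caller
            "_id": doc_id,
            "_source": source,
        }


# Hypothetical documents, for illustration only.
docs = [("1", {"title": "first"}), ("2", {"title": "second"})]
actions = list(index_actions(docs, "my-index"))
print(actions[0])

# Sending them requires a reachable cluster, for example:
#   from elasticsearch import Elasticsearch, helpers
#   helpers.bulk(Elasticsearch("http://localhost:9200"), actions)
```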
If you index them with the index command there should be no difference between primary and replica loads. Could you run hot threads on the busy node and on a non-busy node and send both here?
That's strange. Are you using custom routing, search preferences or scan/scroll searches by any chance? Which version of Elasticsearch is this? Is the elasticsearch.yml file on the overloaded node different from the one on the other nodes?
There is an index on this cluster with parent/child documents, though it was only doing incremental indexing (~10 docs/minute). There was a much heavier indexing job going on that didn't have any custom routing.
elasticsearch.yml is the same across data nodes.
Also, I didn't know it at the time, but we've since discovered that our cloud provider was having network problems when I grabbed these hot_threads. Once the network issues were resolved, the queues drained to zero rather quickly.
Could you rerun hot_threads, and this time run it a few times? Let's say 5 times, with a 40-second interval between runs, just to make sure it wasn't a fluke.
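If it helps, the repeated sampling could be scripted along these lines. This is only a sketch: it assumes the cluster is reachable over plain HTTP without authentication, and the function names, base URL and node names are placeholders. The only Elasticsearch endpoint it relies on is the nodes hot threads API (`/_nodes/hot_threads`, or `/_nodes/{node}/hot_threads` for one node).

```python
import time
import urllib.request


def fetch_hot_threads(base_url, node=None):
    """Return the plain-text hot threads report for one node or all nodes."""
    if node is None:
        path = "/_nodes/hot_threads"
    else:
        path = f"/_nodes/{node}/hot_threads"
    with urllib.request.urlopen(base_url + path, timeout=30) as resp:
        return resp.read().decode("utf-8")


def sample_hot_threads(fetch, runs=5, interval=40, sleep=time.sleep):
    """Call `fetch` `runs` times, pausing `interval` seconds between calls."""
    samples = []
    for i in range(runs):
        samples.append(fetch())
        if i < runs - 1:
            sleep(interval)  # no pause needed after the last run
    return samples


# Example usage (needs a live cluster; URL and node name are placeholders):
#   busy = sample_hot_threads(
#       lambda: fetch_hot_threads("http://localhost:9200", "busy-node"))
#   for n, report in enumerate(busy, 1):
#       print(f"--- run {n} ---")
#       print(report)
```

Running it once against the busy node and once against a quiet node gives two sets of five reports to compare.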