We have a cluster of 10 machines running 10 data nodes, 3 master nodes and 10 replicated shards.
Sometimes we run many percolations and searches, and CPU usage rises on all nodes, but not evenly: one node climbs to 100% CPU usage while all the others rise to 50-80%. This makes most searches really slow (~20-25 s).
How can I determine why this one node hits 100% CPU while the others don't?
Can I do something to distribute the load more evenly?
Do you use custom routing for your documents? That could account for the unevenness.
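To illustrate why custom routing can concentrate load: Elasticsearch picks a document's shard roughly as `hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document `_id`. The sketch below is a toy model, not Elasticsearch's code; it assumes 10 primary shards (mirroring the question) and uses MD5 purely as a stand-in hash. Hypothetical routing keys like `tenant-a` show how one dominant key piles most documents onto a single shard, and therefore onto whichever node holds it.

```python
import hashlib
from collections import Counter

# Assumption: 10 primary shards, mirroring the cluster described above.
NUM_PRIMARY_SHARDS = 10


def shard_for(routing: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    """Map a routing value to a shard number via hash(routing) % num_shards.

    MD5 is only a stand-in; Elasticsearch uses its own hash function.
    """
    digest = hashlib.md5(routing.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


def shard_histogram(routing_values):
    """Count how many documents land on each shard."""
    return Counter(shard_for(r) for r in routing_values)


if __name__ == "__main__":
    # Default-style routing: every document has a distinct _id.
    by_id = shard_histogram(f"doc-{i}" for i in range(1000))
    # Hypothetical skewed custom routing: 90% of docs share one key.
    skewed = shard_histogram(["tenant-a"] * 900 + [f"tenant-{i}" for i in range(100)])
    print("by _id:  ", sorted(by_id.items()))
    print("skewed:  ", sorted(skewed.items()))
```

With distinct IDs the counts come out roughly even across shards; with the skewed keys one shard receives at least 900 of the 1000 documents.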
The percolations stand out as a possible culprit. A single particularly expensive percolator query could cause the imbalance.
The best way to start is to use the hot threads API to see what's going on while under CPU pressure:
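A minimal sketch of calling the hot threads API from Python, assuming a node reachable at `http://localhost:9200` and a version that exposes `GET /_nodes/hot_threads` (optionally `/_nodes/{node}/hot_threads` for one node). The `threads` (default 3), `interval` (default `500ms`) and `type` (`cpu`, `wait`, `block`) query parameters are documented for this API; the helper names and the node name `node-1` are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def hot_threads_url(host="http://localhost:9200", node=None,
                    threads=3, interval="500ms", kind="cpu"):
    """Build a hot threads URL; `node` restricts the call to one node."""
    path = f"/_nodes/{node}/hot_threads" if node else "/_nodes/hot_threads"
    query = urlencode({"threads": threads, "interval": interval, "type": kind})
    return f"{host}{path}?{query}"


def fetch_hot_threads(url, timeout=30):
    """Return the plain-text hot threads report (needs a running cluster)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Ask the busy node (hypothetical name) for its top 8 CPU threads.
    url = hot_threads_url(node="node-1", threads=8)
    print(url)
    # print(fetch_hot_threads(url))  # uncomment when pointing at a live node
```

Because `threads` defaults to 3, a plain request lists at most three hot threads per node regardless of how many cores are busy; raising it (and sampling several times under load) gives a fuller picture of what the 100% node is doing.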
The output fluctuates a lot. Most of the time it shows only 3 hot threads (on hardware with 4 hyper-threaded CPUs, with ES taking 800% CPU time).
Shouldn't it show me 8 hot threads?