In my cluster, there are two kinds of nodes:
Type A) 32 CPUs, 60 GB memory, 700 GB SSD, 10G network
Type B) 36 CPUs, 70 GB memory, 900 GB SSD, 10G network
I found that when I send queries to the cluster at a higher rate:
- Type B runs 50 processes, and the number of TLB shootdown interrupts rises to 14/s.
- Type A runs 40 processes, but the number of TLB shootdown interrupts rises to 80/s and Hypervisor callback interrupts rise to 100/s. Type A is busier.
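For anyone who wants to compare numbers, here is a minimal sketch of one way such rates can be sampled on Linux. It assumes the counters come from the `TLB` and `HYP` rows of `/proc/interrupts`; the function names are hypothetical, and this is not necessarily how the figures above were collected.

```python
import time


def sum_interrupts(text, label):
    """Sum the per-CPU counts on one /proc/interrupts row, e.g. 'TLB' or 'HYP'."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == label + ":":
            total = 0
            for field in fields[1:]:
                # Per-CPU counters come first; stop at the text description.
                if not field.isdigit():
                    break
                total += int(field)
            return total
    return 0  # row not present on this machine


def rate_per_second(before, after, seconds):
    """Convert two cumulative counter readings into an average rate."""
    return (after - before) / seconds


def sample_rate(label, seconds=1.0, path="/proc/interrupts"):
    """Read the counter twice, `seconds` apart, and return interrupts per second."""
    with open(path) as f:
        before = sum_interrupts(f.read(), label)
    time.sleep(seconds)
    with open(path) as f:
        after = sum_interrupts(f.read(), label)
    return rate_per_second(before, after, seconds)
```

Calling `sample_rate("TLB")` and `sample_rate("HYP")` on one node of each type, under the same query load, should give rates comparable to the ones listed above.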
Based on my understanding of ES query scheduling, query load is distributed across the shards on different nodes in a round-robin fashion. In my case, all nodes hold a similar number of shards. So even though the nodes differ in compute capacity, do they still receive roughly the same amount of work?
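To make that assumption concrete, here is a toy model of pure round-robin dispatch over two nodes with the CPU counts from my cluster. It is only a sketch of the behavior I am assuming, not Elasticsearch's actual routing code, and `round_robin_load` is a hypothetical helper.

```python
from itertools import cycle


def round_robin_load(nodes, num_queries):
    """Dispatch queries to nodes in strict rotation, ignoring node capacity.

    nodes maps a node name to its CPU count.
    Returns a dict: name -> (queries received, queries per CPU).
    """
    counts = {name: 0 for name in nodes}
    order = cycle(nodes)  # A, B, A, B, ...
    for _ in range(num_queries):
        counts[next(order)] += 1
    return {
        name: (counts[name], counts[name] / cpus)
        for name, cpus in nodes.items()
    }


# CPU counts taken from the node specs above.
load = round_robin_load({"A": 32, "B": 36}, 1000)
for name, (queries, per_cpu) in load.items():
    print(f"type {name}: {queries} queries, {per_cpu:.2f} per CPU")
```

Under this model both node types receive 500 queries each, so every type A CPU handles about 12.5% more queries than a type B CPU (36/32 = 1.125).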
I also assume queries are sent to nodes concurrently: new queries do not wait for earlier ones to finish. Given this, if the cluster overestimates a type A node's capacity, it could send that node more tasks than it can handle.
Is that what happened?