Hot node


(Eugene Strokin) #1

Hello, I have a very strange situation.
One node uses nearly 100% CPU all the time. The cluster is evenly balanced: the index has 5 shards with one replica each, and the cluster has 10 nodes, so each node holds one shard. No matter which shard I put on the node, it still uses 100% CPU. All the other nodes use about 40-50% CPU.
It could be that something is wrong with the machine the node runs on, but before I replace it and find out nothing was wrong with it, maybe someone could suggest something to check, or some settings to play with?
I've checked all the apps connecting to the cluster; they all have every node listed and sniffing switched on. But the hot node receives about twice as many requests per second (Search time per second, Get time per second, Indexing time per second) as the node which runs the replica of the same shard. There is nothing interesting in the logs.
Thanks in advance for any advice or help,
Eugene


(Mark Walkom) #2

Check hot_threads for starters, then look at your logs for GC.

How are you monitoring things?
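
For reference, the hot threads check is a plain HTTP GET against the `_nodes/hot_threads` endpoint. Below is a minimal Python sketch, assuming the cluster answers on `http://localhost:9200` without authentication. The address, the helper names, and the default parameter values are illustrative assumptions, not something taken from this thread.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def hot_threads_url(base_url, node=None, threads=3, interval="500ms"):
    """Build the hot threads URL for the whole cluster or for one node."""
    path = "/_nodes/hot_threads" if node is None else f"/_nodes/{node}/hot_threads"
    query = urlencode({"threads": threads, "interval": interval})
    return f"{base_url.rstrip('/')}{path}?{query}"


def fetch_hot_threads(base_url, node=None, timeout=10):
    """Return the plain-text hot threads report (this performs a network call)."""
    with urlopen(hot_threads_url(base_url, node), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Usage, assuming a node is reachable at this address:
#   print(fetch_hot_threads("http://localhost:9200", node="node-1"))
```

Passing a node name or id restricts the report to the suspect node, which keeps the output short enough to compare against a healthy node.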


(Eugene Strokin) #3

Here are the hot_threads: https://gist.github.com/strokine/654dab22c88929e8f77c
I couldn't see anything there that could help.
Also, I noticed that if I move things around, the load on the hot node can go down, but then the load on another node goes up. Mostly, though, there is one node which is hot. From the stats I see that "Search time per second" grows on that node when it becomes hot. So the problem is that the load on the cluster is not even: some nodes get more requests than others.
I only measure CPU load, which is about 100%, using the top command or the BigDesk plugin, which uses top anyway, as far as I know.
I couldn't find any pattern in which changes make the load go down. It looks like when I move some other nodes around, the hot node gets more requests.
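
One way to put a number on the uneven load described above is to compare the per-node search counters returned by the nodes stats API (`GET /_nodes/stats/indices`). The sketch below is only an illustration: the function names, the `factor` threshold, and the sample numbers are made up, and because `query_total` is cumulative since node start, a real check would diff two snapshots taken some time apart.

```python
def search_share(nodes_stats):
    """Map node name -> fraction of the cluster's search queries it served."""
    totals = {
        node["name"]: node["indices"]["search"]["query_total"]
        for node in nodes_stats["nodes"].values()
    }
    cluster_total = sum(totals.values())
    if cluster_total == 0:
        return {name: 0.0 for name in totals}
    return {name: count / cluster_total for name, count in totals.items()}


def overloaded_nodes(nodes_stats, factor=1.5):
    """Names of nodes whose share exceeds `factor` times an even share."""
    shares = search_share(nodes_stats)
    if not shares:
        return []
    even_share = 1.0 / len(shares)
    return sorted(name for name, share in shares.items() if share > factor * even_share)


# Made-up sample, shaped like a trimmed nodes stats response.
sample = {
    "nodes": {
        "a": {"name": "node-1", "indices": {"search": {"query_total": 3000}}},
        "b": {"name": "node-2", "indices": {"search": {"query_total": 1000}}},
        "c": {"name": "node-3", "indices": {"search": {"query_total": 1000}}},
    }
}
print(overloaded_nodes(sample))
```

If the same node keeps showing up here while holding the same amount of data as its peers, the imbalance is in how requests are routed rather than in the machine itself.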


(Mark Walkom) #4

What do you mean by moving a node?

How are you indexing into ES?


(Eugene Strokin) #5

For example, node-1 is hot, with CPU at 100%. I shut down one other node. The shard from that node moves to a third node, and suddenly the load on the hot node goes down. But since there are now 2 shards on the third node, it becomes hot.
I can move the shard from the shut-down node around to different nodes, and I see that it affects the load of the first node as well.
I couldn't see any pattern.
Indexing into ES happens all the time, but the number of indexing requests is relatively low.

