Hi,
I'm seeing an unusual load pattern on one of our logging clusters and am struggling to deal with it...
Basically, I'm seeing a load average of 20+ on one of the data nodes. The other nodes are in their normal range. If I restart the node, the load moves to another node. I have tried shutting down all of the data nodes and masters, and restarting, with no change.
Looking at hot_threads, I see search requests at the top of the list. However, I don't know whether these are the actual cause of the problem or whether they are suffering as a result of the high load. Also, looking at the tasks on this node, I see they run for only 1-3 seconds, so there's no time for me to intervene.
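Since the tasks are too short-lived to catch by hand, one thing I'm considering is polling the task management API and tallying what keeps showing up. This is only a rough sketch: the URL and node name are placeholders, not our real setup, and the `actions=*search*` filter is my assumption about how to narrow it to search tasks.

```python
import json
import time
import urllib.request
from collections import Counter

ES_URL = "http://localhost:9200"  # placeholder, not the real cluster address
NODE = "data-node-1"              # hypothetical name of the loaded node


def fetch_search_tasks(es_url, node):
    """Return the parsed _tasks response for search actions on one node."""
    url = f"{es_url}/_tasks?nodes={node}&actions=*search*&detailed=true"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def tally_tasks(response, sightings):
    """Count how often each (action, description) pair appears in a response."""
    for node in response.get("nodes", {}).values():
        for task in node.get("tasks", {}).values():
            key = (task.get("action", "?"), task.get("description", ""))
            sightings[key] += 1
    return sightings


def sample(es_url, node, rounds=60, interval_s=0.5):
    """Poll repeatedly so that short-lived tasks still get counted."""
    sightings = Counter()
    for _ in range(rounds):
        tally_tasks(fetch_search_tasks(es_url, node), sightings)
        time.sleep(interval_s)
    return sightings
```

Calling `sample(ES_URL, NODE).most_common(10)` should list the search descriptions seen most often during the sampling window. That would at least show whether one query or index pattern dominates, though it can't tell me whether it is the cause or just a victim of the load.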
What I'd like to be able to do is find whatever is causing this and kill it. The impact on this node is causing ingestion to bog down more generally...
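For the "kill it" part, the closest thing I'm aware of is the task cancel endpoint (`POST _tasks/<task_id>/_cancel`). A minimal sketch of what I have in mind is below; the threshold and the helper names are made up for illustration, and with tasks lasting only 1-3 seconds a cancel may well arrive too late to matter.

```python
import json
import urllib.request


def long_running_cancellable(response, min_seconds):
    """Pick task ids from a _tasks response that are cancellable and slow."""
    ids = []
    for node in response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            running_s = task.get("running_time_in_nanos", 0) / 1e9
            if task.get("cancellable") and running_s >= min_seconds:
                ids.append(task_id)
    return ids


def cancel_task(es_url, task_id):
    """Ask the cluster to cancel one task; returns the parsed response."""
    req = urllib.request.Request(
        f"{es_url}/_tasks/{task_id}/_cancel", method="POST"
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

The idea would be to feed a `_tasks?detailed=true` response into `long_running_cancellable(response, 2.0)` and call `cancel_task` on each id it returns.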
Thx
D