We have a 19-node hot/warm/master cluster, with Logstash pointed only at the hot nodes.
Recently we have been seeing that, no matter which hot node receives a bulk request, only a single node ends up handling it, and it is usually not the node that received the request. If we restart that node, the behavior simply moves to a different node, which then becomes the only node handling the bulk requests.
We have verified that shards (primaries and replicas) are distributed properly across all nodes, with no hot spots. We have also verified, by reviewing the network traffic, that Logstash is in fact load balancing across multiple hot nodes.
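For reference, here is roughly the kind of check we ran to confirm the shard spread, sketched in Python with the requests library (the http://localhost:9200 endpoint is just a placeholder for one of our nodes):

```python
from collections import Counter
import requests

ES = "http://localhost:9200"  # placeholder for one of our hot nodes

# _cat/shards with format=json returns one dict per shard copy.
shards = requests.get(
    f"{ES}/_cat/shards",
    params={"format": "json", "h": "index,shard,prirep,state,node"},
).json()

per_node = Counter()
for s in shards:
    if s["state"] == "STARTED":
        kind = "primary" if s["prirep"] == "p" else "replica"
        per_node[(s["node"], kind)] += 1

for (node, kind), count in sorted(per_node.items()):
    print(f"{node}: {count} {kind} shards")
```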
Our indices have allocation routing settings to ensure they are only placed on nodes tagged as hot, and we've verified that all the hot nodes do in fact still have their hot tag.
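And this is how we verified the hot tagging, again just a sketch. The attribute name box_type is our own convention for the hot/warm tag, not anything built in, so substitute whatever attribute your cluster uses:

```python
import requests

ES = "http://localhost:9200"  # placeholder

# 1) Every hot node should still report the hot attribute.
#    "box_type" is our own attribute name; adjust as needed.
attrs = requests.get(
    f"{ES}/_cat/nodeattrs",
    params={"format": "json", "h": "node,attr,value"},
).json()
for a in attrs:
    if a["attr"] == "box_type":
        print(f"{a['node']}: box_type={a['value']}")

# 2) Every active index should carry an allocation rule pinning it to hot.
settings = requests.get(
    f"{ES}/_all/_settings",
    params={"filter_path": "*.settings.index.routing.allocation"},
).json()
for index, body in settings.items():
    print(index, body["settings"]["index"]["routing"]["allocation"])
```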
But we cannot determine why all the requests are still being handled by only a single node.
How are you monitoring that one node handles all the requests? How many hot nodes do you have? How many indices are you actively indexing into? How many primary shards do these indices have?
We are monitoring the bulk thread pool queue along with the active thread count. Both are increasing, or above normal levels, only on that one node.
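For reference, a minimal sketch of the monitoring loop we use (Python with requests, placeholder endpoint):

```python
import time
import requests

ES = "http://localhost:9200"  # placeholder

while True:
    # The thread pool is named "bulk" on our version; on newer releases it is "write".
    rows = requests.get(
        f"{ES}/_cat/thread_pool/bulk",
        params={"format": "json", "h": "node_name,name,active,queue,rejected"},
    ).json()
    for r in rows:
        print(f"{r['node_name']:<24} active={r['active']:>3} "
              f"queue={r['queue']:>5} rejected={r['rejected']}")
    print("-" * 60)
    time.sleep(10)
```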
6 hot nodes
We are indexing into ~250 daily indices. They vary in the number of primary shards (we push to keep any single shard under 40GB), and all have 1 replica.
It sounds to me like you have far too many indices. 250 daily indices sounds excessive. I would recommend that you read this blog post if you have not already.
It sounds like your indices vary quite a lot in size. What is the chance you have a hot index? Have you looked at per-index indexing statistics and mapped them to nodes?
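Something like the sketch below could help surface a hot index or hot shard: it sums shard-level indexing stats per (node, index) pair, so a primary taking a disproportionate share of the writes should stand out. This is just an illustration (Python with requests, placeholder endpoint), not a ready-made tool:

```python
from collections import Counter
import requests

ES = "http://localhost:9200"  # placeholder

# Map node IDs to names so the output is readable.
nodes = requests.get(f"{ES}/_nodes", params={"filter_path": "nodes.*.name"}).json()
names = {node_id: info["name"] for node_id, info in nodes["nodes"].items()}

# Shard-level indexing stats; bulk writes land on the primaries first.
stats = requests.get(
    f"{ES}/_stats/indexing",
    params={"level": "shards", "filter_path": "indices.*.shards"},
).json()

docs = Counter()
for index, data in stats.get("indices", {}).items():
    for shard_copies in data["shards"].values():
        for copy in shard_copies:
            if copy["routing"]["primary"]:
                node = names.get(copy["routing"]["node"], copy["routing"]["node"])
                docs[(node, index)] += copy["indexing"]["index_total"]

print("Top (node, index) pairs by documents indexed on primaries:")
for (node, index), total in docs.most_common(20):
    print(f"{node:<24} {index:<40} {total}")
```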
We only keep around 4000 indices open in total across hot and warm. In hot, we keep only about 750 indices across the 6 nodes, and the hot nodes are very large systems (16 CPUs, 64GB memory, 30GB heap, and NVMe disks). We've done a lot of tuning as well. The rest of the indices are closed to keep the overhead down. We are well aware of the limits on the number of indices in a cluster.
This is a new issue we've never seen before, and we've been running for a couple years at this scale and larger.
Do you use dynamic mappings for any of the indices? Is it possible that some index updates its mappings frequently (which could take a while for a cluster of that size, with that many indices and shards), causing the bulk queue to build up just there?
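One quick way to check for that would be to look at the pending cluster state tasks; lots of put-mapping entries would point at dynamic mapping churn. A rough sketch (Python with requests, placeholder endpoint):

```python
from collections import Counter
import requests

ES = "http://localhost:9200"  # placeholder

tasks = requests.get(f"{ES}/_cluster/pending_tasks").json().get("tasks", [])

# Group pending cluster state tasks by the first word of their source,
# e.g. "put-mapping", "create-index", "shard-started".
by_source = Counter(t["source"].split(" ", 1)[0] for t in tasks)

print(f"{len(tasks)} pending cluster state tasks")
for source, count in by_source.most_common():
    print(f"  {source}: {count}")
```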
We are using the AWS EC2 zen discovery, and it appears that the node handling all the bulk requests is always in Availability Zone A or B. If the node in A is handling all requests, we can kill it, and all the requests are then handled by the node in B. If we then kill the node in B, they all go back to the node in A.
Is it possible the EC2 zen discovery has some control/policy on how the bulk requests are handled?
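To try to rule that in or out, this is roughly what we are checking: any allocation-awareness or EC2-related cluster settings, plus the availability-zone attribute each node reports. As far as I know the EC2 plugin adds aws_availability_zone when auto attributes are enabled, but treat that name as an assumption (Python with requests, placeholder endpoint):

```python
import requests

ES = "http://localhost:9200"  # placeholder

# Any awareness- or EC2-related settings, including defaults.
settings = requests.get(
    f"{ES}/_cluster/settings",
    params={"include_defaults": "true", "flat_settings": "true"},
).json()
for section in ("persistent", "transient", "defaults"):
    for key, value in settings.get(section, {}).items():
        if "awareness" in key or "ec2" in key:
            print(f"{section}: {key} = {value}")

# Which availability zone each node reports. The attribute name
# aws_availability_zone is an assumption on my part; adjust if yours differs.
attrs = requests.get(
    f"{ES}/_cat/nodeattrs",
    params={"format": "json", "h": "node,attr,value"},
).json()
for a in attrs:
    if "zone" in a["attr"]:
        print(f"{a['node']}: {a['attr']}={a['value']}")
```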