Before query nodes: 4 hours of downtime
Earlier, my applications pointed directly at the data nodes. Bad queries would pile up and trigger garbage collection, and during GC the HTTP interface on that host stopped responding to the rest of the cluster. I had to kill the Elasticsearch process by its PID and start it again. When I did, the cluster began reallocating that node's shards to the other nodes, and it took about 4 hours for allocation to finish and for the cluster to respond again.
After query nodes: 15 minutes of downtime
After adding query nodes and pointing the applications at them, I still see GC, but now I can set shard allocation to 'none', restart ES, and then set allocation back to 'all'. This way only a minimal number of shards have to be allocated or initialized, and recovery completes in about 15 minutes.
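For reference, the allocation toggle described above can be sketched as two cluster-settings updates wrapped around the node restart. This is a minimal sketch, assuming Elasticsearch 1.x, where the `cluster.routing.allocation.enable` setting accepts "all", "primaries", "new_primaries" or "none", and assuming a query node reachable at http://localhost:9200. The helper names `allocation_request` and `restart_with_allocation_paused`, and the `send`/`restart_node` callbacks, are hypothetical and only illustrate the order of operations.

```python
import json
import urllib.request

# Assumption: HTTP endpoint of a query (client) node; adjust for your cluster.
ES_URL = "http://localhost:9200"

# Values accepted by cluster.routing.allocation.enable (assumed ES 1.x).
ALLOCATION_MODES = ("all", "primaries", "new_primaries", "none")


def allocation_request(mode: str, base_url: str = ES_URL) -> urllib.request.Request:
    """Build (but do not send) a PUT _cluster/settings request that sets
    cluster.routing.allocation.enable to `mode` as a transient setting."""
    if mode not in ALLOCATION_MODES:
        raise ValueError(f"unsupported allocation mode: {mode}")
    body = json.dumps(
        {"transient": {"cluster.routing.allocation.enable": mode}}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_cluster/settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def restart_with_allocation_paused(send, restart_node, base_url: str = ES_URL) -> None:
    """Hypothetical restart procedure: disable allocation, restart the data
    node, then re-enable allocation so shards are not shuffled in between.

    `send` takes a Request (e.g. urllib.request.urlopen);
    `restart_node` is a caller-supplied function that restarts ES on the node.
    """
    send(allocation_request("none", base_url))
    restart_node()
    send(allocation_request("all", base_url))
```

In practice `send` could be `urllib.request.urlopen`, and `restart_node` whatever service command restarts Elasticsearch on that host. The requests go to the query node's HTTP port, which works because the query node is not the one being restarted.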
I have two questions:
a) Do the query nodes and data nodes communicate internally over the Java transport layer, and is that why I can still send HTTP requests to turn shard allocation off and on?
b) How does a query node dispatch work to the data nodes' thread pools? Does it send a single request, wait for its response, and then send the next one? How does thread pooling work between query nodes and data nodes for reads and writes?
Regards,
Mannoj Kumar