What resource requirements should one consider when setting up tribe nodes?
How much memory should a tribe node require?
How many clusters can a tribe node reasonably connect to before its resources are likely to be strained?
Closest reference I've found ... "How much Java Heap Space for Client Nodes" suggests that tribe nodes (being client nodes) should get up to 32GB of heap. How many clusters could such a tribe node support? Would a tribe node with a larger heap suffer performance for the same compressed-OOPs reasons?
Thanks for any help you can offer!
Think about what a tribe node does when handling a search request: it fuses results from multiple clusters. Ordinarily a client node fuses results from the N shards of just one cluster.
A tribe node is a client that fuses results from the shards of multiple clusters. Imagine there are 2 clusters with 10 shards each. The tribe node fuses 20 responses from the shards directly; it does not fuse 2 responses that have already been pre-fused by client nodes in each cluster.
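To make the fan-in concrete, here is a minimal Python sketch of the 2-cluster, 10-shard example. It is a toy model, not Elasticsearch code: `shard_top_hits` is a hypothetical stand-in for one shard's query-phase response (its local top-k hits as `(score, doc_id)` pairs), and `tribe_fuse` plays the tribe node, pulling every shard response from every cluster into a single global top-k.

```python
import heapq
from typing import List, Tuple

# A hit is (score, doc_id); a hypothetical shape used only for illustration.
Hit = Tuple[float, str]


def shard_top_hits(cluster: str, shard: int, k: int) -> List[Hit]:
    """Hypothetical stand-in for one shard's response: its local top-k hits,
    sorted by descending score. Scores are synthetic."""
    hits = [(((shard * 7 + i * 13) % 100) / 100.0, f"{cluster}-s{shard}-d{i}")
            for i in range(k)]
    return sorted(hits, reverse=True)


def tribe_fuse(clusters: List[str], shards_per_cluster: int, k: int):
    # The tribe node collects every shard response from every cluster itself;
    # there is no per-cluster pre-merge by a client node inside each cluster.
    responses = [shard_top_hits(c, s, k)
                 for c in clusters
                 for s in range(shards_per_cluster)]
    # Global top-k across all shard responses (ties broken by doc_id).
    top = heapq.nlargest(k, (hit for resp in responses for hit in resp))
    return len(responses), top


n_responses, top = tribe_fuse(["cluster_a", "cluster_b"], shards_per_cluster=10, k=10)
print(n_responses)  # 20 shard responses fused, not 2
print(len(top))     # 10
```

Adding a third 10-shard cluster in this model raises the fan-in to 30 responses per request, which is the sense in which a tribe node's work grows with the total shard count across all connected clusters.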
So the question becomes: what is the overhead of fusing results? That depends on your requests. If you are running simple top-10 matching-docs searches, fusing the results may not amount to much state or computation, but if you are running a large, complex aggregation with scripted map-reduce functions, it could be a different story.
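The difference between those two cases can be sketched in Python with a deliberately simplified model; this is an assumption-laden illustration, not how Elasticsearch actually implements its reduce phase. Top-k fusion keeps only k entries no matter how many shards reply, while a terms-aggregation-style merge (modelled here with a `Counter`) must hold every distinct bucket key from every shard. The bucket data is synthetic and represents the worst case in which shards share no terms; `fuse_top_k` and `fuse_terms_buckets` are hypothetical names.

```python
import heapq
from collections import Counter
from typing import Dict, List, Tuple


def fuse_top_k(shard_hits: List[List[Tuple[float, str]]], k: int) -> List[Tuple[float, str]]:
    # Top-k fusion: only k results survive, however many shards replied.
    return heapq.nlargest(k, (h for hits in shard_hits for h in hits))


def fuse_terms_buckets(shard_buckets: List[Dict[str, int]]) -> Counter:
    # Terms-aggregation-style fusion: every distinct bucket key from every
    # shard is held and its counts summed, so state grows with distinct terms.
    merged: Counter = Counter()
    for buckets in shard_buckets:
        merged.update(buckets)
    return merged


# 20 shards (2 clusters x 10 shards), each returning 10 hits and 1,000 buckets.
shards = 20
hits = [[(float(s * 10 + i), f"doc-{s}-{i}") for i in range(10)] for s in range(shards)]
buckets = [{f"term-{s}-{t}": 1 for t in range(1000)} for s in range(shards)]

print(len(fuse_top_k(hits, 10)))         # 10 entries kept
print(len(fuse_terms_buckets(buckets)))  # 20000 distinct buckets kept
```

A scripted map-reduce aggregation can be worse still, since the per-shard state it returns is whatever the scripts produce.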
We have a use case with 20+TB of data, and we are planning to split the main ES cluster into several smaller clusters. To query data across those clusters we need tribe nodes, but from the above it seems tribe nodes might become a bottleneck with this much data. Is there a way for the tribe nodes to talk directly to the client nodes in the federated clusters?