Elasticsearch distributed computing


After reading a lot of articles about Elasticsearch, I still don't understand how requests are distributed among the different nodes.

I would like to know how Elasticsearch distributes requests. Are the requests distributed equally? Does the master node perform more computation than the data nodes?

Let's take an example: a cluster of 5 Elasticsearch nodes with 5 indices, each with one primary shard and one replica.
My first thought would be that requests are distributed equally: a master node sends 4 requests to the 4 other nodes (scatter phase), and each of these requests relates to a different index.
Of course, the results are then sent back to the master node (gather phase), which sends the final result to the application.

Unfortunately, I don't know whether that's true or not.

The reason for my question is that I have already set up a cluster with 5 nodes. They don't all have the same hardware configuration (though each has at least 8 GB RAM and 2 CPUs), and I was wondering whether performance will mainly depend on the weakest machine hosting a node.

Thanks for your explanations :slight_smile:

You might find this useful: https://www.elastic.co/blog/found-elasticsearch-top-down

Any node that receives a request becomes the coordinator for that request. It parses the query and determines the set of shards that need to be searched. For each of those shards, it can route the request to either the primary or a replica copy. Multiple requests to the same node result in a roughly round-robin choice of which shard copy to use.
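
To make that concrete, here is a toy Python sketch of the scatter/gather flow using your example of 5 nodes and 5 single-shard indices. It is only a model and not Elasticsearch code: the `Coordinator` class, the node names, and the shard placement in `SHARD_COPIES` are all made up for illustration, and the strict round-robin is a simplification of what a real cluster does.

```python
from collections import Counter
from itertools import cycle

# Hypothetical cluster layout (an assumption for this sketch):
# 5 indices, each with one primary and one replica, placed so that
# the two copies of a shard never share a node.
NODES = ["node1", "node2", "node3", "node4", "node5"]
SHARD_COPIES = {
    f"index{i}": [NODES[i % 5], NODES[(i + 1) % 5]]  # [primary, replica]
    for i in range(5)
}


class Coordinator:
    """Toy model of the node that happened to receive the client request."""

    def __init__(self, name, shard_copies):
        self.name = name
        # One round-robin iterator per shard, cycling over its copies.
        self._pickers = {
            index: cycle(copies) for index, copies in shard_copies.items()
        }

    def search(self, indices):
        # Scatter phase: pick one copy (primary or replica) per shard.
        targets = {index: next(self._pickers[index]) for index in indices}
        # Gather phase: collect the per-shard results (faked as strings here).
        partial = [
            f"hits from {index} on {node}" for index, node in targets.items()
        ]
        return targets, partial


# Any node can play the coordinator role, not only the master.
coordinator = Coordinator("node3", SHARD_COPIES)
load = Counter()
for _ in range(2):
    targets, _results = coordinator.search(list(SHARD_COPIES))
    load.update(targets.values())

for node in NODES:
    print(node, load[node])
```

In this model, two consecutive searches over all five indices touch every node the same number of times, because the second search lands on the other copy of each shard. The point is that the work is spread by shard placement and copy selection, not by which node is the master.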

Great article, thanks a lot!

It answers my question :slight_smile:
