Hey folks,
I know this is a very basic question, but I want to clear up a few of my concepts about how ES handles a request. So here it goes:
1.) INDEX/UPDATE REQUEST:
When I use the Java client (or a client in any other language, for that matter), I list in its configuration all the data nodes (and client nodes, if any) on which my data is allocated. Say I currently have three data nodes. When I send multiple requests (a single request multiple times, or a bulk request), the client distributes them in a round-robin fashion across all the configured nodes. Each receiving node decides which shard, and therefore which node, the document should reside on and forwards the request to that node. That node indexes the document into the shard it holds, and once the write succeeds the document is sent to the other node on which the replica exists. I am not sure whose responsibility it is to send the request to the node holding the replica and carry out the replication.
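To make my mental model of the indexing path concrete, here is a rough Java sketch of how I picture it. It is not ES code: the `RoundRobin` class, the `shardFor` helper, and the node names are all hypothetical, and I am assuming the shard is picked as a hash of the document id modulo the number of primary shards, with `String.hashCode()` only standing in for whatever hash ES actually uses.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RoutingSketch {

    // Hypothetical picker: cycles through the nodes configured in the client.
    static final class RoundRobin {
        private final List<String> nodes;
        private final AtomicInteger next = new AtomicInteger();

        RoundRobin(List<String> nodes) {
            this.nodes = nodes;
        }

        String pick() {
            // floorMod keeps the index non-negative even if the counter overflows.
            return nodes.get(Math.floorMod(next.getAndIncrement(), nodes.size()));
        }
    }

    // Assumed rule: shard = hash(document id) mod number of primary shards.
    // String.hashCode() is a stand-in, not the hash ES really uses.
    static int shardFor(String docId, int primaryShards) {
        return Math.floorMod(docId.hashCode(), primaryShards);
    }

    public static void main(String[] args) {
        RoundRobin client = new RoundRobin(List.of("node1", "node2", "node3"));
        int primaryShards = 5; // made-up shard count for illustration
        for (String id : List.of("doc-1", "doc-2", "doc-3", "doc-4")) {
            String receivingNode = client.pick();
            int shard = shardFor(id, primaryShards);
            System.out.println(id + ": received by " + receivingNode
                    + ", belongs to shard " + shard);
        }
    }
}
```

In this picture the node that receives a request and the node that holds the target shard are chosen independently, which is why the receiving node would have to forward the document.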
2.) SEARCH REQUEST:
Again, with the client configured with all the nodes, multiple search requests are sent to different data nodes in a round-robin fashion. Each node parses the query it receives and acts as the coordinator: it distributes the request to all the shards, gathers and combines the results, and sends the data back to the client.
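Here is a similar rough Java sketch of the gather step as I imagine it. Again this is only my assumption of the flow, not ES code: the `Hit` class, the `merge` helper, and the scores are made up, and I am assuming each shard returns its own best hits and the receiving node keeps the overall best ones by score.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ScatterGatherSketch {

    // Hypothetical search hit: a document id plus its relevance score.
    static final class Hit {
        final String id;
        final double score;

        Hit(String id, double score) {
            this.id = id;
            this.score = score;
        }
    }

    // Gather step: combine the hits returned by every shard and keep
    // the `size` highest-scoring ones overall.
    static List<Hit> merge(List<List<Hit>> perShardHits, int size) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perShardHits) {
            all.addAll(hits);
        }
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(size, all.size())));
    }

    public static void main(String[] args) {
        // Pretend three shards each answered the same query.
        List<List<Hit>> perShard = List.of(
                List.of(new Hit("a", 2.5), new Hit("b", 1.0)),
                List.of(new Hit("c", 3.0)),
                List.of(new Hit("d", 0.5), new Hit("e", 2.0)));
        for (Hit hit : merge(perShard, 3)) {
            System.out.println(hit.id + " " + hit.score);
        }
    }
}
```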
Do I have the concept right?
Other questions:
1.) If I have a cluster with, say, 10 data nodes, do I have to list all of them in my client's configuration? That would mean changing the client configuration whenever I add a node to the cluster. What if I put a load balancer on top of my ES nodes and let it manage the distribution of requests across the cluster?
2.) Does this mean that the more nodes (client or data) I have, the better the performance of inserts/updates and reads will be?
3.) Does my Java client talk to individual nodes and, if yes, is that a synchronous or an asynchronous operation?
Thanks. These are basic questions, but it's always better to clear up the basic concepts.
Piyush