I have 2 master nodes, 2 data nodes and 1 client node.
I have some questions regarding the node distribution.
- Does a client node help in indexing data into the data nodes?
- Is it good practice to send data to a master node for indexing? (Currently I am sending my data from Logstash to a dedicated master node.) If not, should I send my data directly to a data node?
As per the docs at https://www.elastic.co/guide/en/elasticsearch/reference/1.4/modules-node.html:
"By reducing the amount of resource intensive work that these nodes do (in other words, do not send index or search requests to these dedicated master nodes), we greatly reduce the chance of cluster instability"
So it is not good practice to send indexing data to a master node.
I don't think a client node helps in indexing data.
1 - Yes
2 - No, send it to the client node.
Also, having two master nodes is kind of pointless; you should have 3 if you want dedicated ones. However, given the size of your cluster, I wouldn't really bother with them at all.
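To make the point about two master nodes concrete: the usual guidance for Elasticsearch 1.x is to set `discovery.zen.minimum_master_nodes` to a majority of the master-eligible nodes, i.e. (N / 2) + 1. Here is a rough sketch of that arithmetic; the function names are made up for illustration and are not part of any Elasticsearch API.

```python
def quorum(master_eligible: int) -> int:
    """Majority of master-eligible nodes: floor(N / 2) + 1."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """How many master-eligible nodes can be lost while a majority remains."""
    return master_eligible - quorum(master_eligible)


# Compare 1, 2 and 3 master-eligible nodes.
for n in (1, 2, 3):
    print(n, quorum(n), tolerated_failures(n))
```

With 2 master-eligible nodes the quorum is 2, so losing either one leaves the cluster unable to elect a master. 3 is the smallest count that tolerates one failure.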
How does a client node help in indexing the data? Could you please elaborate on this? It would help me understand things better. As per the docs I have read, it assists in search and aggregation operations.
If you send indexing requests to it, it will just reroute each request to whichever node holds the shard for the document.
You do run the risk of OOMing your master nodes by sending indexing requests through them, so if you have a client node you might as well use it!
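Roughly, the rerouting works like this: the receiving node computes `shard = hash(routing) % number_of_primary_shards` (the routing value defaults to the document ID) and forwards the request to the node holding that primary shard. The sketch below only illustrates the idea. The `crc32` hash is a stand-in and not the hash Elasticsearch actually uses, and the shard count and shard-to-node table are invented for the example.

```python
import zlib

NUM_PRIMARY_SHARDS = 5  # assumed shard count for the example

# Hypothetical allocation of primary shards to data nodes.
SHARD_TO_NODE = {0: "data-1", 1: "data-2", 2: "data-1", 3: "data-2", 4: "data-1"}


def shard_for(doc_id: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    """Pick a shard from the routing value (here the document ID).

    crc32 is a stand-in; Elasticsearch uses its own hash internally.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def route(doc_id: str) -> str:
    """Node that a client (coordinating) node would forward the request to."""
    return SHARD_TO_NODE[shard_for(doc_id)]


print(route("doc-1"))
```

The same document ID always maps to the same shard, so any node that knows the cluster state can do this forwarding. The client node simply does it without holding data or doing master duties.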
Another question somewhat related to this:
As each node knows which node to route the request to, would it still be beneficial to have a load balancer in front of a cluster of, say, 3 nodes, routing the requests to each node in round-robin fashion?
Or would it be a good design to have one client node with 2 data nodes in the cluster?
Or would there be no need for a balancer at all in a 3-node cluster?
An LB can still be useful if a node goes down (restart, whatever).
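As a minimal sketch of what that buys you, here is a round-robin picker that skips nodes marked as down. The class, its method names and the host names are hypothetical, and a real load balancer would detect failures through health checks instead of manual `mark_down` calls.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out nodes in round-robin order, skipping any marked as down."""

    def __init__(self, nodes):
        self._nodes = list(nodes)
        self._ring = cycle(self._nodes)
        self._down = set()

    def mark_down(self, node):
        self._down.add(node)

    def mark_up(self, node):
        self._down.discard(node)

    def next_node(self):
        # Look at each node at most once per call so we never loop forever.
        for _ in range(len(self._nodes)):
            node = next(self._ring)
            if node not in self._down:
                return node
        raise RuntimeError("no healthy nodes available")


lb = RoundRobinBalancer(["es-1:9200", "es-2:9200", "es-3:9200"])
lb.mark_down("es-2:9200")  # e.g. the node is restarting
print([lb.next_node() for _ in range(4)])
```

While a node is marked down, requests only go to the remaining nodes. Without something like this in front of the cluster, a sender pointed at a single fixed node fails whenever that node restarts.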