Hi Jorge,
In terms of architecture, there is no single right answer. Your proposal and the intention behind it are sound, so you can use it.
Your proposal is:
2 * data/master + 1 * coordinating only / master
When I suggested adding a third data/master node, it was only to make the architecture more homogeneous, not to stop the coordinating only node from being a master. That proposal would be:
3 * data/master + 1 * coordinating only (no master)
Other possibilities:
3 * master/coordinating only + 2 * data
3 * master/data and nothing else (3 nodes in total, the cheapest cluster with HA)
Or even (fully dedicated roles architecture):
3 * master + 2 * data + 1 * coordinating only
(note: with only 1 coordinating only node you might lose part of your service if that node goes down).
For full HA and dedicated roles per node, the minimum would be:
3 * master + 2 * data + 2 * coordinating only
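To compare the layouts above, here is a minimal Python sketch that checks whether each one can lose any single node per role. It assumes a simple majority rule for master election and treats a node with no roles as coordinating only (the same idea as an empty `node.roles` list). The function names and topology labels are made up for this example, and real data HA also requires at least one replica per shard, which the sketch does not model.

```python
MASTER, DATA = "master", "data"


def expand(*groups):
    """Turn (count, roles) pairs into a flat list of per-node role sets."""
    return [frozenset(roles) for count, roles in groups for _ in range(count)]


def survives_one_failure(nodes):
    """Report, per role, whether the cluster tolerates losing any single node."""
    masters = sum(1 for n in nodes if MASTER in n)
    data = sum(1 for n in nodes if DATA in n)
    coord_only = sum(1 for n in nodes if not n)  # empty role set
    # Assumption: electing a master needs a majority of master-eligible nodes.
    quorum = masters // 2 + 1
    return {
        "master": masters - 1 >= quorum,
        "data": data >= 2,
        # None means the topology has no coordinating only tier at all.
        "coordinating_only": None if coord_only == 0 else coord_only >= 2,
    }


# Labels are illustrative; the node counts come from the layouts listed above.
TOPOLOGIES = {
    "3 master/data": expand((3, {MASTER, DATA})),
    "3 data/master + 1 coordinating only": expand((3, {MASTER, DATA}), (1, set())),
    "3 master + 2 data + 1 coordinating only": expand(
        (3, {MASTER}), (2, {DATA}), (1, set())
    ),
    "3 master + 2 data + 2 coordinating only": expand(
        (3, {MASTER}), (2, {DATA}), (2, set())
    ),
}

for name, nodes in TOPOLOGIES.items():
    print(name, survives_one_failure(nodes))
```

Under these assumptions, only the last layout passes all three checks, and a cluster with just 2 master-eligible nodes fails the master check because a single failure leaves it without a majority.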
Please take a look at the description of the different node roles:
https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html
I would say the priorities are these:
- If you want HA, then 3 master-eligible nodes is the way to go, so that a majority is still available to elect a master if one of them fails.
- Then you have to decide if you want dedicated roles in the nodes (dedicated masters / dedicated data nodes) or mixed (all nodes master + data).
- Then you have to decide if you want dedicated coordinating only nodes (for example, to take load off the data nodes when running heavy queries).
- If you use dedicated coordinating only nodes then consider having 2 of them for HA purposes.
All these decisions will depend on your indexing and performance needs as well as your budget.
If your cluster is small, with no special needs or huge queries, I would go for a plain 3-node cluster without dedicated coordinating only nodes. That gives you a basic cluster with HA.
About load balancers: yes, you can rely on external load balancers. If you use coordinating only nodes, you can point your external load balancer at them and round-robin between them.
But in general, if a client accepts a list of Elasticsearch hosts in its configuration and can balance the connections itself (Logstash, for example), that is preferred.
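As a rough illustration of what "balancing the connections directly" means on the client side, here is a small Python sketch of a round-robin host pool that skips hosts marked as down. It is not how Logstash or any official client implements it; the class name and the host addresses are hypothetical.

```python
import itertools


class RoundRobinHosts:
    """Hand out hosts in turn, skipping any host currently marked as down."""

    def __init__(self, hosts):
        if not hosts:
            raise ValueError("at least one host is required")
        self._hosts = list(hosts)
        self._down = set()
        self._cycle = itertools.cycle(self._hosts)

    def mark_down(self, host):
        self._down.add(host)

    def mark_up(self, host):
        self._down.discard(host)

    def next_host(self):
        # Try each host at most once per call so the loop always terminates.
        for _ in range(len(self._hosts)):
            host = next(self._cycle)
            if host not in self._down:
                return host
        raise RuntimeError("no Elasticsearch host is available")


# Hypothetical addresses of two coordinating only nodes.
pool = RoundRobinHosts(["http://coord1:9200", "http://coord2:9200"])
print(pool.next_host())
print(pool.next_host())
pool.mark_down("http://coord1:9200")
print(pool.next_host())
```

With a list of hosts the client keeps working when one node goes down, which is the same reason for having 2 coordinating only nodes instead of 1.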
The last decision is where to point your clients when you have dedicated roles per node: directly at the data nodes or at the coordinating only nodes. I believe there is no right or wrong answer here either. There are different ways to implement this, such as:
- All clients (producers / consumers) towards coordinating only nodes.
- Producers towards data nodes, consumers towards coordinating only nodes.
- All clients towards data nodes (not a good choice if you already have coordinating only nodes available).
Anyway, this is just a surface-level look at the possible architectures. Hope it helps.
Regards and good luck with your clusters!
Eduardo