Help with Elasticsearch Cluster


(Jorge Salcedo) #1

I'm trying to create a cluster of master nodes connected to a dedicated coordinating node, using Docker containers.

In the two master nodes ("nodeMaster1", "nodeMaster2") I have the following:

cluster.name: "docker-cluster"
network.host: 0.0.0.0
node.master: true
node.data: true
discovery.zen.ping.unicast.hosts: ["CoordinatedNode", "nodeMaster1", "nodeMaster2"]
discovery.zen.minimum_master_nodes: 1
xpack.license.self_generated.type: basic

And in the CoordinatedNode:
cluster.name: "docker-cluster"
network.host: 0.0.0.0

node.master: true
node.data: false
discovery.zen.ping.unicast.hosts: ["CoordinatedNode", "nodeMaster1", "nodeMaster2"]
discovery.zen.minimum_master_nodes: 1
xpack.license.self_generated.type: basic

Does this make sense?

Can anybody help me?


(Eduardo González de la Herrán) #2

Hi!

You are really using 3 masters, not 2, since nodeMaster1, nodeMaster2 and CoordinatedNode all have node.master: true.

Considering you have 3 masters, you should set discovery.zen.minimum_master_nodes to 2.

3 masters and a minimum of 2 for quorum (minimum_master_nodes) is the recommendation and the only way to have real HA in the cluster and avoid split-brain issues, so I would say your configuration is correct (if you change the mentioned setting to 2) although the names you are using are a little bit misleading.
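
To illustrate where the 2 comes from: the usual quorum rule is (number of master-eligible nodes / 2) + 1, rounded down. Below is a minimal Python sketch of that arithmetic. The function name is made up for this example and is not part of any Elasticsearch API.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Return the quorum for a number of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible // 2 + 1


# The three nodes in this thread that have node.master: true.
nodes = ["nodeMaster1", "nodeMaster2", "CoordinatedNode"]
print(minimum_master_nodes(len(nodes)))
```

With only 2 master-eligible nodes the same rule also gives 2, so losing either node would leave the cluster without a quorum. That is why 3 is the smallest number of masters that tolerates a failure.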

My only concern would be to ensure that the containers can resolve CoordinatedNode, nodeMaster1 and nodeMaster2 to valid IP addresses when running.

If you have all this in a Docker Compose file it will probably be all right.

If you don't want the coordinating only node to be a master, then I would suggest adding a third master (master + data, actually), so that you end up with 3 master/data + 1 coordinating only.

Regards!
Eduardo


(Jorge Salcedo) #3

Hi Eduardo,

Thanks so much for your response, I really appreciate it. Regarding your concern about name resolution among my containers, that is OK: my containers can resolve each other.

Regarding your recommendation to avoid having the coordinating only node as a master: when you say to add a new master + data node, it is not clear to me. Can you give me more information about the configuration, please?

Is there a better way to configure the cluster, for example connecting it to a load balancer or something like that?

My problem is, I have two elasticsearches, and I want to connect them to a third elasticsearch, but I don't want to keep the information in the third one. For this reason, I think the coordinating only node is the best option for my problem. Is there a better way?


(Eduardo González de la Herrán) #4

Hi Jorge,

In terms of architecture, there's not a single answer, and your proposal and intention are correct. So, you can use it.

Your proposal is:
2 * data/master + 1 * coordinating only / master

When I suggested adding a third data/master, I did it just to have a more homogeneous architecture, not specifically to stop the coordinating only node from being a master. That proposal would be:
3 * data/master + 1 * coordinating only (no master).

Other possibilities:
3 * master/coordinating only + 2 * data.
3 * master/data and nothing else (3 nodes in total with HA, the cheapest HA cluster)

Or even (fully dedicated roles architecture):
3 * master + 2 * data + 1 * coordinating only (note: with 1 coord.node only you might lose part of your service if that node goes down).

For full HA and dedicated roles per node, the minimum would be:
3 * master + 2 * data + 2 * coordinating only

Please take a look at the description of the different roles for the nodes:
https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html
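
As a rough sketch of how the role combinations above map to settings, the Python snippet below builds the node.master / node.data lines for each kind of node. It also sets node.ingest, which is not mentioned in this thread: it is included on the assumption that you are on a version that uses these boolean role settings, where a node only counts as "coordinating only" when master, data and ingest are all false. The helper names are hypothetical.

```python
def role_settings(master: bool, data: bool, ingest: bool = True) -> dict:
    """Build the role-related elasticsearch.yml settings for one node."""
    return {"node.master": master, "node.data": data, "node.ingest": ingest}


# Role combinations discussed in this thread.
ROLES = {
    "master/data": role_settings(master=True, data=True),
    "dedicated master": role_settings(master=True, data=False, ingest=False),
    "dedicated data": role_settings(master=False, data=True),
    "coordinating only": role_settings(master=False, data=False, ingest=False),
}


def to_yaml(settings: dict) -> str:
    """Render the settings as lines that could be pasted into elasticsearch.yml."""
    return "\n".join(f"{key}: {str(value).lower()}" for key, value in settings.items())


print(to_yaml(ROLES["coordinating only"]))
```

Note that the CoordinatedNode in the original configuration has node.master: true, so in these terms it is a master-eligible node without data rather than a strictly coordinating only node.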

I would say the priorities are these:

  • If you want HA, then 3 masters is the way to go.
  • Then you have to decide if you want dedicated roles in the nodes (dedicated masters / dedicated data nodes) or mixed (all nodes master + data).
  • Then you have to decide if you want dedicated coordinating only nodes (for example to reduce the load when doing queries).
    • If you use dedicated coordinating only nodes then consider having 2 of them for HA purposes.

All these decisions will depend on your indexing and performance needs as well as your budget :)
If your cluster is small, with no special needs or huge queries, I would go for a plain 3-node cluster without dedicated coordinating only nodes. Then you will have a basic cluster with HA.

About load balancers: yes, you can rely on external load balancers. If you use coordinating only nodes, you can point your external load balancer at them and have it round-robin between them.
But in general, if a client accepts a list of elasticsearch hosts in the configuration, and is capable of balancing the connections directly, that would be preferred (for example logstash).
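
Just to picture what round-robin balancing over a list of hosts means, here is a small conceptual Python sketch. It is not how Logstash or any particular load balancer is implemented, and the host names are invented for the example.

```python
from itertools import cycle


def round_robin(hosts, n_requests):
    """Return the host chosen for each of n_requests, cycling through hosts in order."""
    picker = cycle(hosts)
    return [next(picker) for _ in range(n_requests)]


# Hypothetical coordinating only nodes sitting behind the balancer.
hosts = ["coord1:9200", "coord2:9200"]
print(round_robin(hosts, 4))
```

A real balancer or client would also skip hosts that are down, which is the reason to have at least 2 coordinating only nodes.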

And the last decision is where to point your clients when you have dedicated roles per node: directly to the data nodes or to the coordinating only nodes. I believe there's no right or wrong answer here either. There are different ways to implement this, like:

  • All clients (producers / consumers) towards coordinating only nodes.
  • Producers towards data nodes while consumers towards coordinating only nodes.
  • All clients towards data nodes (not a good choice if you already have coordinating only nodes available).

Anyway, this is just a surface analysis of the architecture possibilities. Hope it helps.

Regards and good luck with your clusters!
Eduardo


(Eduardo González de la Herrán) #5

Hi again,

I think I have given too much information in my previous post and I wasn't focused on your specific question :)

My problem is, I have two elasticsearches, and I want to connect them to a third elasticsearch, but I don't want to keep the information in the third one. For this reason, I think the coordinating only node is the best option for my problem. Is there a better way?

Before I can answer: what's the purpose of that third node, and why don't you want the data in that one? With that we will see if there's a better way or if that's the right way to go.

Regards!
Eduardo


(Jorge Salcedo) #6

Hi again Eduardo,

Regarding your question, let me explain what I need.

I need to find a suitable architecture for the following:

I'm using LOGSTASH to send information to different ELASTICSEARCH nodes, where each ES node belongs to different clients that we are monitoring. The idea of this is to avoid data merge among the different clients. I want to get a last ES node to handle the information from the cluster of my clients with the purpose to display it in KIBANA. I mean to get the different indexes in only one kibana instance to perform queries and create dashboards. Also, I would prefer not to keep the clients' information in that last ES node. I hope this makes sense to you. If you can help me with some recommendations on how to implement it, I will be really grateful.

Please, let me know, if my above explanation was clear.

I'm so excited to do this with the ELK suite. The suite is very interesting, but I'm new to the technology, so there are a lot of things I don't know. I would also like to know if there is another way to contact you, in order to avoid long posts in the forum.

Regards


(Eduardo González de la Herrán) #7

Hi,

First of all take a quick look at this, and read about cluster, node and index concepts.
https://www.elastic.co/guide/en/elasticsearch/reference/current/_basic_concepts.html

I'm using LOGSTASH to send information to different ELASTICSEARCH nodes, where each ES node belongs to different clients that we are monitoring. The idea of this is to avoid data merge among the different clients.

Specific nodes per client/logstash instance are not needed. When you have 2 (or N) nodes in a cluster, the nodes work together and store the information in a distributed way, providing high availability. If you really want to have completely separate data stores, maybe you should create 2 different clusters (but I don't think you need or want that).
When you have 1 cluster with 2 or 20 nodes, it doesn't matter which node logstash sends the data to. The information will be stored wherever the cluster decides (you can administer that, of course, but that's an advanced topic), regardless of which node logstash is connected to.

I'm assuming now you will have 1 cluster with 2 data nodes.

If you want to avoid data merging you have 2 options:

  • From each logstash/client send information to a different index, so you will have a different index per logstash, and the information won't be merged.

  • From all logstashes send the information to the same index, but in each logstash add a differentiator field, for example client_id: "client1" in logstash1 and client_id: "client2" in your second logstash. If the type of the information is the same in all logstashes this would be my preferred option. With this option, your information from all clients will be merged in the same index, but it will be very easy to create visualisations for different clients or even do calculations from the totals.
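
As an illustration of the second option, the Python sketch below builds a search request body that keeps only one client's documents from the shared index. It assumes the client_id field is mapped as a keyword (with default dynamic mapping you would probably filter on client_id.keyword instead). The function name is hypothetical, and the snippet only builds the JSON body without sending it anywhere.

```python
import json


def client_filter_query(client_id: str) -> dict:
    """Build a search body that matches only documents tagged with this client_id."""
    return {
        "query": {
            "bool": {
                # A filter clause selects documents without affecting relevance scoring.
                "filter": [{"term": {"client_id": client_id}}]
            }
        }
    }


print(json.dumps(client_filter_query("client1"), indent=2))
```

In Kibana the equivalent would be a filter on client_id in the search bar or on the dashboard, so each client can get its own visualisations from the same index.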

I want to get a last ES node to handle the information from the cluster of my clients with the purpose to display it in KIBANA. I mean to get the different indexes in only one kibana instance to perform queries and create dashboards.

That makes perfect sense. A node in the cluster dedicated for searches by kibana.

So, my conclusions here are:

  • Maybe you don't need 2 data nodes, as there's no need to point each logstash to a different data node. You would want 2 data nodes in the cluster if you want high availability or if your indexing load is high, and maybe you need even more data nodes. But NOT because each one is going to handle data from a different logstash.
  • If you want some basic HA, then your original proposal was GOOD and enough. 1 cluster with 3 nodes, 2 of them data nodes and 1 of them to be used by Kibana (and without data role). But as you have a cluster with 3 nodes, then configure the 3 of them also as master nodes.
  • As a last comment, if you follow your original approach, then configure each logstash to send data to BOTH elasticsearch data nodes. There's no benefit in configuring only 1 data node in each logstash if you have 2 available. When you configure multiple hosts in the elasticsearch output, logstash will balance the data between all available nodes.
  • If you realize that you don't want HA, then you can create a cluster with only one node (data/master), point both logstashes to it and point kibana to it also. For proof of concepts or tests, this should be all right too.

Good luck and feel free to raise any other question!
Eduardo

