Relevance of having a Coordinating Node and Data Nodes

Hi Team,

I am trying to implement ELK on 6 servers in a cluster. I have the following configuration:

  1. Two master Nodes
  2. One Coordinating Node
  3. Three Data Nodes

I have inserted around 100 records from Logstash into Elasticsearch and have tried a few scenarios.

  1. If I point Kibana at the coordinating node in kibana.yml (elasticsearch.hosts set to the coordinating node URL), I get all the records.

  2. If I point Kibana at any data node in kibana.yml (elasticsearch.hosts set to a data node URL), I get all the records.

  3. If two out of the three data nodes are down, I still get the same results.

  4. Even if the coordinating node is down, I get the same results.

So I have the following questions:

  1. Are the 100 records inserted into all three data nodes?

  2. If I also get the data when elasticsearch.hosts in kibana.yml points to a data node, what is the use of the coordinating node?

Elasticsearch works as a cluster, so the data is shared across all (data) nodes that are in that cluster. If the node you talk to doesn't have the data you want, then it will retrieve it from the node that does and then return that to the client.

Coordinating nodes are used to reduce the load on data nodes. They are usually only needed for high volume clusters.
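As a rough illustration of the two points above, here is a minimal Python sketch of how documents end up on shards and how the node that receives a search gathers results from all of them. It is a simplification under stated assumptions: the index is assumed to have 3 primary shards (7.x creates 1 by default), `zlib.crc32` stands in for the Murmur3 hash Elasticsearch applies to the routing value, and the function names are made up for the example.

```python
import zlib

# Assumption for illustration only; a 7.x index has 1 primary shard unless configured otherwise.
NUM_PRIMARY_SHARDS = 3


def shard_for(doc_id, num_primary_shards=NUM_PRIMARY_SHARDS):
    """Pick the primary shard for a document ID.

    Elasticsearch hashes the routing value (the _id by default) with Murmur3;
    crc32 is only a dependency-free stand-in for this sketch.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def index_documents(doc_ids, num_primary_shards=NUM_PRIMARY_SHARDS):
    """Store each document on exactly one primary shard."""
    shards = {number: [] for number in range(num_primary_shards)}
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_primary_shards)].append(doc_id)
    return shards


def search_all(shards):
    """Mimic the coordinating step: ask every shard, then merge the hits."""
    hits = []
    for shard_docs in shards.values():
        hits.extend(shard_docs)
    return sorted(hits)


doc_ids = [f"doc-{i}" for i in range(100)]
shards = index_documents(doc_ids)

for number, docs in shards.items():
    print(f"shard {number}: {len(docs)} documents")
print("total returned by search:", len(search_all(shards)))
```

Each document lands on one primary shard only, yet a search returns all 100, whichever node receives the request. Every node can perform this coordinating step, which is why pointing Kibana at a data node also works; a dedicated coordinating node only moves that merge work off the data nodes.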

What version are you on?

@warkolm Thanks for your reply. I am using the latest version, 7.7.

Also, you mentioned that it will retrieve the data from the node that has it. How will it retrieve the data if that data node is down?

I am just concerned about whether there is any case where we would not be able to display data.

Should we use a coordinating node in that case?

The only case to really worry about is if nodes drop off the cluster and you don't have replicas.

@warkolm: I am not creating any replicas myself, so I believe ELK must be creating them.

I have tested with a light load (100 records). While inserting the data there were three active data nodes in the cluster; after inserting, I kept one active and shut down the other two. Even with two of the three down, I was able to display correct and complete data.

Can you tell me in which cases we don't have any replicas?

The only time you won't have a replica is if you have a single-node cluster, or you set the replica count to 0 for the index.
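To make the replica point concrete, here is a small Python sketch that checks whether an index stays searchable for a given set of surviving nodes. The node names and shard placement are hypothetical; one primary shard with one replica matches the 7.x index defaults. The sketch also ignores that Elasticsearch may rebuild a missing replica on a remaining node when nodes leave one at a time.

```python
# Hypothetical layout: node names and placement are assumptions for illustration.
# One primary shard plus one replica matches the 7.x index defaults.
shard_copies = {
    0: {"data-1", "data-2"},  # shard 0: primary on one node, replica on the other
}

# Same index with the replica count set to 0: a single copy of the shard.
no_replicas = {
    0: {"data-1"},
}


def is_searchable(copies_by_shard, live_nodes):
    """Return True if every shard still has at least one copy on a live node."""
    live = set(live_nodes)
    return all(copies & live for copies in copies_by_shard.values())


print(is_searchable(shard_copies, {"data-1"}))           # True: a copy survived
print(is_searchable(shard_copies, {"data-3"}))           # False: both copies are gone
print(is_searchable(no_replicas, {"data-2", "data-3"}))  # False: the only copy is gone
```

So whether losing two of three data nodes loses data depends on which nodes hold the copies. A test that still returns everything may simply have kept a node that holds one.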

Having only two master nodes is not good. You should always aim to have at least 3 master-eligible nodes in any cluster. Two nodes do not give any high availability, as Elasticsearch uses consensus algorithms and requires a strict majority of master-eligible nodes to be available to function fully.
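The strict-majority requirement is easy to check with a little arithmetic. The Python sketch below is a simplified model only: the function names are made up for the example, and it does not cover how Elasticsearch manages its voting configuration internally.

```python
def majority(master_eligible):
    """Smallest number of nodes that forms a strict majority."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible):
    """How many master-eligible nodes can be lost while a majority remains."""
    return master_eligible - majority(master_eligible)


for count in (1, 2, 3, 5):
    print(
        f"{count} master-eligible: majority {majority(count)}, "
        f"can lose {tolerated_failures(count)}"
    )
```

With two master-eligible nodes the majority is 2, so no node can be lost; with three the majority is still 2, so one node can fail.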

Just because you can have dedicated node types does not mean that you should. The easiest way to get started is to have 3 nodes that hold data and are master-eligible. Unless you expect to significantly expand the cluster, this is a configuration that suits a large number of users.