3 nodes in the cluster, 2 data and 1 master - why does the whole cluster go down when one fails?

Hello,

I have 3 nodes: 2 data nodes and 1 master node. When one of the data nodes fails, the whole cluster stops processing data and nothing is shown in Kibana anymore, even though the other data node is still working and receiving data.

Is this normal?


If you have a single master and it goes offline, writes will no longer be allowed. If you make your two data nodes master-eligible as well, so you have 3 master-eligible nodes, the cluster should be able to handle one of the nodes going down without blocking writes.
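For illustration, a minimal elasticsearch.yml sketch of that layout, assuming Elasticsearch 7.9 or later where node.roles is available (cluster and node names here are placeholders):

# elasticsearch.yml on each node (repeat with that node's own name)
cluster.name: my-cluster
node.name: node-1
node.roles: [ master, data ]   # every node is both master-eligible and a data node
discovery.seed_hosts: ["node-1", "node-2", "node-3"]
# only needed when bootstrapping a brand-new cluster:
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]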

In my question, I said that when a data node fails while the master node and the other data node are still working, I still lose everything - nothing gets written to Elasticsearch and I cannot view anything in Kibana.

3 days ago a data node failed. Yesterday I checked Kibana and noticed there was no data for 2 days. I checked all the nodes: the Master node was still running, Data node 1 was still running, but Elasticsearch on Data node 2 had crashed.

In theory, I should still be able to see the data in Kibana coming from Data node 1, but not the data from Data node 2.

Why was this not so?

Are your indices configured to have a replica shard or do you have indices in red state?
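You can check both from any live node with something like this (localhost:9200 is just an assumption for where a node is reachable):

# per-index health plus primary/replica shard counts
curl -s 'http://localhost:9200/_cat/indices?v&h=index,health,pri,rep'
# overall cluster health
curl -s 'http://localhost:9200/_cluster/health?pretty'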

All green (but it was yellow yesterday due to the data node failure). I haven't configured anything special. It works as follows:

  • 1 data node for client A syslogs going into client-A-index-xxxx.xx.xx
  • 1 data node for client B syslogs going into client-B-index-xxxx.xx.xx
  • 1 Master node

I visualise everything in Kibana with index pattern client-*

Data node for Client B crashed. In theory I should still see stuff coming from Client A's index.

If you want a resilient three-node cluster you should follow the advice in the reference manual under Designing for resilience -> Three-node clusters, noting in particular:

If you have three nodes, we recommend they all be data nodes ... You should also configure each node to be master-eligible ... You should avoid sending client requests to just one of your nodes ... You can do this by specifying the address of multiple nodes when configuring your client to connect to your cluster.
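As a concrete example of that last point, your clients (Kibana included) can be given more than one node to talk to. A sketch with hypothetical hostnames, assuming a 7.x kibana.yml:

# kibana.yml
elasticsearch.hosts: ["http://node-1:9200", "http://node-2:9200", "http://node-3:9200"]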

Still doesn't make much sense to me.

  • Master node is totally fine.

  • Data node 1 totally fine and receiving event logs.

  • Data node 2 down.

But I cannot see anything in Kibana (which points to the Master node). I should be seeing whatever goes into Data node 1.

The advice in the reference manual is not relevant, since my Master node is fine and one of the Data nodes is fine too.

What is in the logs? What is the output of the cluster stats API?
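For reference, something like this against any reachable node would show them (localhost:9200 is assumed):

curl -s 'http://localhost:9200/_cluster/stats?human&pretty'
# and a quick view of which nodes the cluster can currently see
curl -s 'http://localhost:9200/_cat/nodes?v'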

I don't think you should dismiss useful information as irrelevant so readily - I even quoted some fragments from the manual to save you from clicking through and reading the whole thing to work out which bits are the most important. A fragile master node is going to bite you in future, and the manual (and my post) highlight a bunch of other concerns that seem very related to your present problems.

Which node is your Kibana pointing at? Is it only pointing at the down node?

Also, the advice from both @DavidTurner and @Christian_Dahlqvist is very important for a healthy / resilient cluster.

It's pointing to the Master node - which is healthy.

Data node 1 is receiving data from Client A no problem.

But when Data node 2's Elasticsearch crashes, I don't get anything from any node anymore.

But my Master node is not fragile; it is fine. Data Node 2 is the one that's busy and crashes sometimes, but that should not prevent me from seeing the index that Data Node 1 produces.

Even if I followed all the recommended steps, it doesn't answer my question: the Master is good, Data node 1 is good and produces its own indices, Data node 2 is down. Why can't I see Data node 1's data in Kibana (which connects to the Master node)?

Here's my topology (Logstash 1 -> Data Node 1, Logstash 2 -> Data Node 2, plus the Master node):

If Elasticsearch on Data Node 2 fails, please confirm this: should I still see my Client A data? Or is there something the Master Node does that depends on Node 2 being up as well? You know, like RAID and parity? Does the Master node move data around in some kind of parity scheme?
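For what it's worth, where each client index's shards actually sit can be checked with the cat shards API (hostname is a placeholder). There is no RAID-style parity: a primary shard lives on one node, with replica copies, if configured, on other nodes:

curl -s 'http://localhost:9200/_cat/shards/client-*?v&h=index,shard,prirep,state,node'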

What do your Logstash pipelines and output configurations look like?

Depending on how your pipeline and outputs are configured, Logstash could be blocking the output.

This blocking would happen if you have something like this in your pipeline:

output {
  # two separate elasticsearch outputs in one pipeline: if either host is
  # unreachable, the whole pipeline backs up
  elasticsearch {
    hosts => ["http://datanode1:9200"]
  }
  elasticsearch {
    hosts => ["http://datanode2:9200"]
  }
}

If you have a config similar to this one, a single pipeline feeds both outputs, so when one of the data nodes goes down the whole pipeline blocks and nothing gets indexed on the other node either.
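For contrast, a single output listing both nodes (hostnames hypothetical) would let the plugin fail over between them rather than blocking the pipeline, although it would no longer keep each client's data on its own node:

output {
  elasticsearch {
    hosts => ["http://datanode1:9200", "http://datanode2:9200"]
  }
}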

Can you share your pipelines?

My topology above shows that each logstash sends to its own Elasticsearch. Logstash 1 sends to Data Node 1, Logstash 2 sends to Data Node 2.

When visualising the data in Kibana, I should still see the data from Data Node 1 even if Data Node 2 is down. But for some reason, if one Data Node is down, I don't see anything.

You are right, since the logstash pipelines are completely independent, you should still see data from one data node if the other data node was down.

Which version are you using? Share your elasticsearch.yml for your nodes to make it easy to try to reproduce your issue.

Also, did you look at the elasticsearch logs to see if there is some error there?
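If you installed from the deb/rpm packages, the logs normally live under /var/log/elasticsearch/ and the main file is named after the cluster (the path and cluster name below are assumptions, adjust them to your setup):

# show recent log entries from the crashed node
tail -n 200 /var/log/elasticsearch/my-cluster.log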

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.