Cluster High Availability

Michele_Johl · January 31, 2020, 1:58pm

We had an incident with Elasticsearch today: high availability failed completely for 4 minutes when 1 of the nodes died. This caused rather large issues on our end, as all search queries failed for any index with its primary shard on the node that failed.

The cluster has 4 data nodes and 2 query nodes.
discovery.zen.minimum_master_nodes is set to 2, and all nodes have node.master: true.

All shards have 1 replica, so when 1 data node died, why did all search queries fail for any indexes where the primary shard was on the affected node?

DavidTurner · January 31, 2020, 2:17pm

Do you mean you have 6 master-eligible nodes? If so, you must set discovery.zen.minimum_master_nodes: 4. Your cluster is at serious risk of data loss and all sorts of other weird issues if you set it too low.

The difference between primary and replica is not relevant to searches; moreover one replica will immediately be promoted to primary when the node holding the current primary leaves the cluster.

Apart from that, there's not much to go on here. Can you share more details?

Michele_Johl · February 3, 2020, 11:15am

Thank you David.

I can share more details, just let me know what you require.

Also, I just spoke to Max Bashlawi, and we are now looking at getting someone in to consult and review our infrastructure / config, as we feel it needs to be fine-tuned.

Just did an RTFM and found what you are referring to: https://www.elastic.co/guide/en/elasticsearch/reference/6.8/discovery-settings.html#minimum_master_nodes

(master_eligible_nodes / 2) + 1

I will revise our clusters immediately.
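
For anyone checking their own cluster, here is a minimal Python sketch of the formula quoted above. It assumes the division is integer division, and the function names are made up for illustration; they are not part of any Elasticsearch API. The second helper shows why a value that is too low is risky: if two disjoint groups of master-eligible nodes can each reach the configured minimum, a network partition could let both elect a master.

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Quorum from the formula (master_eligible_nodes / 2) + 1.

    Assumes integer division. Hypothetical helper, not an Elasticsearch API.
    """
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


def split_brain_possible(master_eligible_nodes: int, configured_minimum: int) -> bool:
    """True if two disjoint groups of nodes could each meet the configured minimum."""
    return 2 * configured_minimum <= master_eligible_nodes


if __name__ == "__main__":
    nodes = 6  # 4 data nodes + 2 query nodes, all master-eligible
    print(minimum_master_nodes(nodes))      # required setting for this cluster
    print(split_brain_possible(nodes, 2))   # the setting described in this thread
    print(split_brain_possible(nodes, 4))   # the recommended setting
```

With 6 master-eligible nodes this gives 4, which matches the value David recommended, and it flags the original setting of 2 as one that allows two independent masters.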

DavidTurner · February 3, 2020, 11:41am

Sorry, I forgot to add a link to the relevant manual page, but yes, the one you linked is the right one.

I'm sure the consulting team will do a fine job, and it's probably best for me to leave further investigation to them. Their job will be a good deal easier if you can share with them all the relevant logs and any other evidence you can gather from the time of your outage.

