Elasticsearch Zen discovery

dudidu · May 21, 2018, 9:29am

We are having an issue with the Ubuntu kernel that causes the machine running Elasticsearch to hang.
While the machine is in this state (it is only resolved by a restart), no API call to Elasticsearch (or even SSH to the server) returns a result, yet Elasticsearch continues to treat the server as part of the cluster (because of the way Zen discovery works). In other words, the server is not responding, but the master does not remove it from the cluster.

Any ideas?

JKhondhu · May 21, 2018, 9:39am

This sounds about right.

@dudidu
How many nodes in this cluster?
Can you share the elasticsearch.yml?
It would be interesting to see what the discovery.zen.minimum_master_nodes is set to.

dudidu · May 21, 2018, 9:54am

Total of 78 elasticsearch nodes:

6 client nodes
3 master nodes
39 data nodes (hot)
30 data nodes (cold)

elasticsearch.yml

cluster.name: tier-1

node.name: prod-elasticsearch-data-038
node.master: false
node.data: true

node.attr.box_type: L8

http.cors.enabled: true

http.cors.allow-origin: "*"

path.data: /mnt
path.logs: /var/log/elasticsearch

bootstrap.memory_lock: true

network.bind_host: ["_site_", "_local_"]

network.publish_host: _eth0:ipv4_

discovery.zen.ping.unicast.hosts: ["prod-elasticsearch-master-001", "prod-elasticsearch-master-002", "prod-elasticsearch-master-003"]
discovery.zen.minimum_master_nodes: 2

action.destructive_requires_name: false

thread_pool.bulk.queue_size: 3000

discovery.zen.minimum_master_nodes: 2
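
For reference, the usual rule of thumb for discovery.zen.minimum_master_nodes is a strict majority of the master-eligible nodes, i.e. (master_eligible_nodes / 2) + 1 with integer division. A minimal sketch of that arithmetic follows; the function name is made up for illustration and is not part of any Elasticsearch API:

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Quorum size (strict majority) for a given number of master-eligible nodes.

    Hypothetical helper for illustration only; Elasticsearch does not
    compute this for you in 6.x, the value is set by hand in elasticsearch.yml.
    """
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


# The cluster in this thread has 3 dedicated master nodes.
print(minimum_master_nodes(3))
```

With the 3 master nodes listed above this gives 2, which matches the value in the posted config.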

warkolm · May 21, 2018, 9:09pm

What version are you on?
What kernel?
What JVM?

That thread_pool.bulk.queue_size: 3000 setting is likely to cause more problems than it is worth.

dudidu · May 22, 2018, 1:43am

Elasticsearch version: 6.2.4

JVM version: openjdk 1.8.0_171

OS version: Linux prod-elasticsearch-data-010 4.13.0-1011-azure #14-Ubuntu SMP Thu Feb 15 16:15:39 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

DavidTurner · May 22, 2018, 7:26am

This sounds like it's a known kernel bug: https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1772264

DavidTurner · May 22, 2018, 7:38am

There is no simple solution to this. Nodes that are completely working or completely broken are easy to identify, but in this case AIUI the faulty node is neither: it just repeatedly hangs for a while and then starts working again. Each time this happens it looks to the master node like a transient network issue. There is no "correct" way to deal with this, only heuristics, and Elasticsearch's heuristics are quite conservative in their detection of faulty nodes because removing a node and reallocating all its shards can be quite expensive.
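
To make the "conservative heuristic" point concrete, here is a rough sketch of retry-based fault detection. It is a deliberately simplified model, not Elasticsearch's actual implementation. The parameter names mirror the discovery.zen.fd.* settings, and the defaults used (1s interval, 30s timeout, 3 retries) are what I believe the 6.x documentation states, so treat them as assumptions. The function and the `is_responsive` callback are hypothetical.

```python
from typing import Callable, Optional


def time_until_removed(
    is_responsive: Callable[[float], bool],
    horizon: float,
    ping_interval: float = 1.0,   # assumed default of discovery.zen.fd.ping_interval
    ping_timeout: float = 30.0,   # assumed default of discovery.zen.fd.ping_timeout
    ping_retries: int = 3,        # assumed default of discovery.zen.fd.ping_retries
) -> Optional[float]:
    """Simulated time (seconds) at which the master gives up on a node.

    Returns None if the node is never removed before `horizon`.
    A ping sent at time t succeeds if is_responsive(t) is True; otherwise
    it costs a full ping_timeout and counts as one consecutive failure.
    Any successful ping resets the failure counter.
    """
    t = 0.0
    failures = 0
    while t < horizon:
        if is_responsive(t):
            failures = 0
            t += ping_interval
        else:
            failures += 1
            t += ping_timeout
            if failures >= ping_retries:
                return t
    return None


# A node that is hung for good is removed after 3 timed-out pings.
dead = time_until_removed(lambda t: False, horizon=600)

# A node that hangs for the first 45s of every 120s keeps recovering
# before the third consecutive failure, so it is never removed.
flaky = time_until_removed(lambda t: (t % 120) >= 45, horizon=600)

print(dead, flaky)
```

In this model the permanently hung node is dropped after 90 seconds, while the node that hangs and recovers stays in the cluster indefinitely, because each recovery resets the failure count.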

dudidu · May 22, 2018, 7:45am

Is there any workaround we can use?

It seems that there is a known issue with the Ubuntu kernel that causes the machine to hang, and it does not currently have a fix (a kernel fix should be released in the next couple of weeks).

dudidu · May 22, 2018, 7:51am

Can you please elaborate on why, and on what value it should be changed to?
Without this value, our requests get rejected.

DavidTurner · May 22, 2018, 7:58am

Rolling back to a non-buggy kernel version is all I can suggest. I've had a quick look, but can't find details on when this bug was introduced, sorry. The discussion about this kernel issue all seems to be in the last month or so.

DavidTurner · May 22, 2018, 8:00am

These people rolled back to 4.11 which fixed it for them, if that helps.

system · June 19, 2018, 8:00am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
