ElasticSearch : observer: timeout notification from cluster service

Deb · October 24, 2015, 7:31am

I have an Elasticsearch cluster with 3 master-data nodes and one dedicated client node, plus a Logstash instance sending events to the cluster via the client node.

The client node is not able to connect to the cluster, and I see the following errors in its log:

[2015-10-24 00:18:29,657][DEBUG][action.admin.indices.create] [ESClient] observer: timeout notification from cluster service. timeout setting [1m], time since start [1m]
[2015-10-24 00:18:30,743][DEBUG][action.admin.indices.create] [ESClient] no known master node, scheduling a retry

I have gone through this Stack Overflow answer, but it did not work for me. The Elasticsearch config of my master-data nodes looks like this:

cluster.name: elasticsearch
node.name: "ESMasterData1"
node.master: true
node.data: true
index.number_of_shards: 7
index.number_of_replicas: 1
bootstrap.mlockall: true
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["es-master3:9300", "es-client:9300", "es-master2:9300", "es-master1:9300"]
cloud.aws.access_key: AK
cloud.aws.secret_key: J0

My client node config looks like this:

cluster.name: elasticsearch
node.name: "ESClient"
node.master: false
node.data: false
index.number_of_shards: 7
index.number_of_replicas: 1
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["es-master1:9300", "es-master2:9300", "es-master3:9300", "kibana:9300"]
bootstrap.mlockall: true
cloud.aws.access_key: AK
cloud.aws.secret_key: J0

The Logstash output looks like this:

elasticsearch {
  index => "j-%{env}-%{app}-%{iver}-%{[@metadata][app_log_time]}"
  cluster => "elasticsearch"
  host => "es-client"
  port => "9300"
  protocol => "transport"
}

I have tried or verified the following without luck:

JVM heap has been set to 30 GB on all the ES nodes.
mlockall is set to true on all the nodes.
Telnet works fine from the ES client node to the ES master-data nodes on port 9300.
I have also verified with iperf that TCP and UDP traffic is allowed between the client and the master-data machines.
The three ES master-data nodes are able to talk to each other, and the cluster status is reported as green when queried via one of them, but the same query fails with MasterNotFoundException when sent via the ES client machine.
None of the machines are in AWS.

Environment:-

ElasticSearch 1.7.1
OS - Debian 7

Can someone tell me what is going wrong, or how I can debug this?

lwintergerst · October 26, 2015, 4:17pm

I don't know the solution, but there are a few things you could try:

What is kibana:9300 in the client's unicast host list? Is this a node? It seems wrong.

Add the client node itself to its own unicast host config.
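Both checks can be sketched in a few lines of Python. This is only an illustration, not an Elasticsearch tool: `audit_unicast_hosts` is a hypothetical helper, and the sketch assumes the cluster consists of exactly the four hosts listed in the master-data config (es-master1 to es-master3 plus es-client).

```python
def audit_unicast_hosts(configured, cluster_hosts):
    """Compare a node's unicast host list with the hosts assumed to be in the cluster.

    Returns (unknown, missing): configured entries that are not cluster
    members, and cluster members that the configured list leaves out.
    """
    configured_set = set(configured)
    cluster_set = set(cluster_hosts)
    unknown = sorted(configured_set - cluster_set)
    missing = sorted(cluster_set - configured_set)
    return unknown, missing


# Assumption: these four hosts make up the whole cluster.
CLUSTER_HOSTS = ["es-master1:9300", "es-master2:9300", "es-master3:9300", "es-client:9300"]

# Unicast list copied from the client config in the original post.
CLIENT_UNICAST = ["es-master1:9300", "es-master2:9300", "es-master3:9300", "kibana:9300"]

unknown, missing = audit_unicast_hosts(CLIENT_UNICAST, CLUSTER_HOSTS)
print("listed but not a cluster member:", unknown)
print("cluster member but not listed:", missing)
```

With the client config from the first post, this flags kibana:9300 as an unknown entry and es-client:9300 as the node missing from its own list.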

Christian_Dahlqvist · October 26, 2015, 4:47pm

What does the cluster health look like? Do you by any chance have a very large number of shards? Your defaults give every index 7 shards and 1 replica, and your Logstash configuration builds index names from a significant number of parameters, so the number of indices could grow quickly.
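The arithmetic behind this question is simple enough to sketch. The function names are hypothetical, and the index counts in the example (4 index-name combinations per day, kept for 30 days) are made-up numbers for illustration, not values from this cluster.

```python
def shards_per_index(primaries, replicas):
    """Total shard copies for one index: each primary plus its replica copies."""
    return primaries * (1 + replicas)


def total_shards(index_count, primaries, replicas):
    """Total shard copies in the cluster if every index uses the same settings."""
    return index_count * shards_per_index(primaries, replicas)


# Settings from the posted config: 7 primary shards, 1 replica.
print(shards_per_index(7, 1))

# Hypothetical load: 4 index-name combinations per day, retained for 30 days.
print(total_shards(4 * 30, 7, 1))
```

Every new combination of the fields in the index name adds another full set of primaries and replicas, which is why the shard count is worth checking first.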

Deb · October 26, 2015, 4:51pm

{
  "cluster_name": "elasticsearch",
  "status": "green",
  "timed_out": false,
  "number_of_nodes": 4,
  "number_of_data_nodes": 3,
  "active_primary_shards": 43,
  "active_shards": 86,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 0,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0
}

Stopping all nodes in the cluster and then starting them one by one solves the problem for a while, but after some time it comes back.

Christian_Dahlqvist · October 26, 2015, 4:58pm

That is a very manageable number of shards and all looks good. What have you got minimum master nodes set to? Are there any error messages in the logs apart from what you listed?

Deb · October 26, 2015, 5:04pm

I see the following error:

[2015-10-26 01:17:23,846][INFO ][discovery.zen            ] [ESMasterData1] failed to send join request to master [[ESMasterData2][HqgkEYtdTwS4Q6SnxGFh4g][es-master2][inet[/172.16.84.218:9300]]{master=true}], reason [ElasticsearchTimeoutException[Timeout waiting for task.]]

and sometimes I see this long GC warning:

[2015-10-26 04:14:32,355][WARN ][monitor.jvm              ] [ESMasterData1] [gc][old][8430][4] duration [53.8s], collections [1]/[54.3s], total [53.8s]/[54.3s], memory [24.2gb]->[23.8gb]/[29.9gb], all_pools {[young] [12.8mb]->[17.4mb]/[665.6mb]}{[survivor] [83.1mb]->[0b]/[83.1mb]}{[old] [24.1gb]->[23.8gb]/[29.1gb]}
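To see how often such pauses occur, the `monitor.jvm` lines can be scanned for their pause duration. The following is a rough sketch: the regular expression is written against the single log line above and may not match every variant of the message, and the 30-second threshold is an assumed placeholder, not a verified Elasticsearch default.

```python
import re

# Matches the "[gc][<generation>][..][..] duration [<value><unit>]" part of a monitor.jvm line.
GC_PATTERN = re.compile(
    r"\[gc\]\[(?P<gen>\w+)\]\[\d+\]\[\d+\] duration \[(?P<value>[\d.]+)(?P<unit>ms|s|m|h)\]"
)

UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}


def gc_pause_seconds(line):
    """Return the GC pause in seconds, or None if the line is not a GC message."""
    match = GC_PATTERN.search(line)
    if match is None:
        return None
    return float(match.group("value")) * UNIT_SECONDS[match.group("unit")]


def long_pauses(lines, threshold_seconds):
    """Collect all GC pauses longer than the given threshold."""
    pauses = []
    for line in lines:
        seconds = gc_pause_seconds(line)
        if seconds is not None and seconds > threshold_seconds:
            pauses.append(seconds)
    return pauses


# Log line copied from the post (shortened after the duration fields).
SAMPLE = (
    "[2015-10-26 04:14:32,355][WARN ][monitor.jvm              ] [ESMasterData1] "
    "[gc][old][8430][4] duration [53.8s], collections [1]/[54.3s]"
)

ASSUMED_TIMEOUT_SECONDS = 30  # assumption for illustration only
print(long_pauses([SAMPLE], ASSUMED_TIMEOUT_SECONDS))
```

Running `long_pauses` over the whole log file with the timestamps of the join failures at hand would show whether the two actually coincide.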

Minimum master nodes is set to 2.

Will moving to dedicated master nodes (rather than combined master-data nodes) help me?
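For reference, a value of 2 matches the usual quorum rule for `discovery.zen.minimum_master_nodes`, a majority of the master-eligible nodes. A minimal sketch, with a hypothetical function name:

```python
def minimum_master_nodes(master_eligible):
    """Quorum of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


# Three master-eligible nodes, as in this cluster (the client node has node.master: false).
print(minimum_master_nodes(3))
```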

lwintergerst · October 26, 2015, 5:13pm

There is your problem!

Your data nodes are busy collecting garbage and therefore can't answer the join request in time (the default timeout is 30s, I think).

Dedicated master nodes will solve this.

You don't need extra servers for this. You can run multiple instances on one server.

Deb · October 26, 2015, 5:25pm

But the long gc warnings are very intermittent. How often are the join requests sent?

Is running multiple instances on a single server good practice? Also, I currently have a combined master-data node configuration, and that is where I sometimes see the long GC warning. How will running a dedicated master instance and a dedicated data instance on the same server solve the issue?

lwintergerst · October 26, 2015, 5:45pm

While a GC is running, the node is 'dead'. It can't do anything.

A dedicated master node will have its own JVM and therefore won't be affected by the GC of the data node's JVM.

We are running two instances on one server. This is also recommended if you use servers with more than 64GB of RAM.

Deb · October 27, 2015, 5:27am

Thanks, Luca.
