ELK restarts fail with exception: Status changed from red to red

yeruvass · June 11, 2018, 11:43am

Hi Team,

We are using ELK 2.4.1, deployed in HA mode with 3 instances. Elasticsearch restarts every two hours, and we observe the exception below.
Elasticsearch API output:
{
"name" : "node-10.10.0.12",
"cluster_name" : "ad5be3b4-5f80-5589-b0b2-50fd38592089",
"cluster_uuid" : "_sr-yB7ISF2vE2_DpjhUuA",
"version" : {
"number" : "2.4.1",
"build_hash" : "c67dc32e24162035d18d6fe1e952c4cbcbe79d16",
"build_timestamp" : "2016-09-27T18:57:55Z",
"build_snapshot" : false,
"lucene_version" : "5.5.2"
},
"tagline" : "You Know, for Search"
}

Logs:
May 8 06:52:09 cbnd18-414-1-server kibana: {"type":"log","@timestamp":"2018-05-08T06:52:09Z","tags":["status","plugin:elasticsearch@1.0.0","error"],"pid":618,"state":"red","message":"Status changed from red to red - [master_not_discovered_exception] null","prevState":"red","prevMsg":"Service Unavailable"}
failed ({node-10.10.0.12}{6G60lvenQYKPucRh380elA}{10.10.0.12}{10.10.0.12:9300})
May 8 06:52:20 cbnd18-414-1-server kibana: {"type":"response","@timestamp":"2018-05-08T06:52:20Z","tags":[],"pid":618,"method":"get","statusCode":200,"req":{"url":"/","method":"get","headers":{},"remoteAddress":"10.10.0.13","userAgent":"10.10.0.13"},"res":{"statusCode":200,"responseTime":2,"contentLength":9},"message":"GET / 200 2ms - 9.0B"}
May 8 06:52:21 cbnd18-414-1-server kibana: {"type":"response","@timestamp":"2018-05-08T06:52:21Z","tags":[],"pid":618,"method":"get","statusCode":200,"req":{"url":"/","method":"get","headers":{},"remoteAddress":"10.10.0.12","userAgent":"10.10.0.12"},"res":{"statusCode":200,"responseTime":2,"contentLength":9},"message":"GET / 200 2ms - 9.0B"}
May 8 06:52:22 cbnd18-414-1-server elasticsearch: [2018-05-08 06:52:22,062][WARN ][rest.suppressed ] path: /_bulk, params: {}
May 8 06:52:22 cbnd18-414-1-server elasticsearch: ClusterBlockException[blocked by: [SERVICE_UNAVAILABLE/2/no master];]
May 8 06:52:22 cbnd18-414-1-server elasticsearch: at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:158)

On further checking, I found a GitHub issue with a similar exception: https://github.com/elastic/elasticsearch/issues/11202
The solution below works after making the TCP changes. Is this the right solution?

ClusterBlockException[blocked by: [SERVICE_UNAVAILABLE/2/no master];]","status":503}
Solution:
Set tcp_keepalive_time to a suitable value (the default is 7200 seconds). For example, change the value to
tcp_keepalive_time=300

Since this TCP change also affects the TCP connections of other components on the host, these are the values we set:
net.ipv4.tcp_keepalive_intvl = 60
net.ipv4.tcp_keepalive_probes = 20
net.ipv4.tcp_keepalive_time = 600
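
To make the effect of these three values easier to judge, here is a minimal sketch of the arithmetic, assuming the usual Linux behaviour: the first keepalive probe is sent after tcp_keepalive_time seconds of idle time, further probes follow every tcp_keepalive_intvl seconds, and the connection is dropped after tcp_keepalive_probes unanswered probes. The function name is hypothetical and only for illustration. Note that the default idle time of 7200 seconds is exactly two hours, the same interval as the failures described above.

```python
def keepalive_detection_seconds(time_s: int, intvl_s: int, probes: int) -> int:
    """Worst-case seconds before the kernel drops a dead, idle TCP connection.

    Assumes the usual Linux semantics: the first probe is sent after `time_s`
    seconds of idle time, then one probe every `intvl_s` seconds, and the
    connection is closed after `probes` unanswered probes.
    """
    return time_s + intvl_s * probes


# Values from the sysctl settings above.
proposed = keepalive_detection_seconds(time_s=600, intvl_s=60, probes=20)

# Stock Linux defaults (7200 / 75 / 9), for comparison.
defaults = keepalive_detection_seconds(time_s=7200, intvl_s=75, probes=9)

print(f"proposed: first probe after 600 s idle, dead peer dropped after {proposed} s")
print(f"defaults: first probe after 7200 s idle, dead peer dropped after {defaults} s")
```

With the settings above, an idle connection between nodes is probed after 10 minutes instead of 2 hours, which may be enough to keep a firewall or load balancer with a shorter idle timeout from silently dropping it.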

Could you please confirm whether this is the right solution, or whether Elasticsearch provides another one?
Please also let me know whether this is resolved by default in newer Elasticsearch versions.

Thanks in advance.

system · July 9, 2018, 11:43am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
