ES health API returns green on one node but yellow on another

T_Vinod_Gupta · December 5, 2012, 12:18am

What could be the reason? I don't see any specific problems in the logs.

One node says this:
curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "elasticsearch",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 42,
  "active_shards" : 42,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 41
}

On the other node, it says:
curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "elasticsearch",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 42,
  "active_shards" : 84,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0
}

thanks

--

Ivan · December 5, 2012, 12:46am

The number of nodes reported differs between the two nodes, which means
you probably have two clusters. Use the cluster nodes API on each node to
find out which nodes it is connected to.

http://localhost:9200/_cluster/nodes
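
As a rough sketch of that comparison, the listing from each machine can be fetched and the node IDs diffed. The response shape assumed here (a top-level "nodes" object keyed by node ID, each entry carrying a "name") and the host names in the usage note are assumptions, not something taken from this thread.

```python
import json
from urllib.request import urlopen


def fetch_nodes(base_url):
    """Fetch the nodes listing from one node (endpoint as given above)."""
    with urlopen(base_url + "/_cluster/nodes", timeout=5) as resp:
        return json.load(resp)


def node_names(nodes_response):
    """Map node ID -> node name, assuming a top-level "nodes" object."""
    return {node_id: info.get("name", "?")
            for node_id, info in nodes_response.get("nodes", {}).items()}


def diff_views(view_a, view_b):
    """Return (only_in_a, only_in_b) as sorted lists of node names."""
    a, b = node_names(view_a), node_names(view_b)
    only_a = sorted(a[i] for i in a.keys() - b.keys())
    only_b = sorted(b[i] for i in b.keys() - a.keys())
    return only_a, only_b
```

Calling diff_views(fetch_nodes("http://node-a:9200"), fetch_nodes("http://node-b:9200")) with your own hosts (node-a and node-b are placeholders) lists the nodes that only one side can see. Two empty lists mean both machines agree on membership.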

--

T_Vinod_Gupta · December 5, 2012, 12:54am

I only have one cluster. I ran this API on both machines: the green
machine sees all 3 machines, but the yellow node doesn't see the node
that is showing green.
So it is some sort of communication failure.

I do see this message in the log:
[2012-12-03 06:36:03,426][WARN ][discovery.ec2 ] [Mister Machine] master should not receive new cluster state from [[Phimster, Ellie][Y6I9PXqYSeGkPpdZIDurVQ][inet[/10.6.14.94:9300]]]

Mister Machine is the yellow machine and Phimster, Ellie is the green
machine.

thanks

--

mvg · December 5, 2012, 12:15pm

What kind of network connects the nodes to each other? If the network
is slow, you might want to increase discovery.zen.ping.timeout to a
higher value (the default is 3s).

Another possible reason for a node not seeing another node is that a
machine is running low on system resources, such as memory, so that
the node can't respond properly to ping requests. Can you check with
the node stats API (http://localhost:9200/_nodes/stats?jvm) whether
all your nodes have sufficient resources?
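
A minimal sketch of that check, assuming the stats response nests heap figures under nodes -> <id> -> jvm -> mem as heap_used_in_bytes and heap_committed_in_bytes (field names not confirmed in this thread). The 0.9 threshold is an arbitrary illustration, not a recommended value.

```python
def heap_usage(stats_response):
    """Return {node name: heap used / heap committed} for each node."""
    usage = {}
    for node_id, node in stats_response.get("nodes", {}).items():
        mem = node.get("jvm", {}).get("mem", {})
        used = mem.get("heap_used_in_bytes")
        committed = mem.get("heap_committed_in_bytes")
        if used is None or not committed:
            continue  # skip nodes that did not report heap figures
        usage[node.get("name", node_id)] = used / committed
    return usage


def nodes_under_pressure(stats_response, threshold=0.9):
    """Names of nodes whose heap ratio is at or above the threshold."""
    ratios = heap_usage(stats_response)
    return sorted(name for name, ratio in ratios.items() if ratio >= threshold)
```

Pass in the parsed JSON from the stats URL above. A node that stays near the top of its heap is a plausible candidate for missed pings, although heap is only one of the resources worth checking.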

Did you also configure the 'discovery.zen.minimum_master_nodes'
option? It can prevent a cluster from ending up with more than one
master (known as split brain, which can be a very annoying situation).
It should be set to number_of_nodes / 2 + 1 (integer division), so in
your case it should be set to 2.
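
The arithmetic behind that value, as a small sketch. It counts master-eligible nodes, which is an assumption about how to read number_of_nodes here; the function name is made up for illustration.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum size: a strict majority of the master-eligible nodes."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    # Integer division: 3 // 2 + 1 == 2, so a lone node cut off from
    # the other two cannot reach the quorum and elect itself master.
    return master_eligible_nodes // 2 + 1
```

With three nodes the result is 2. With only two nodes it is also 2, which means losing either node leaves the cluster without a master until it returns.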

From the log line you shared with us, it seems that you have more than
one master in your cluster.
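
One way to confirm this, sketched under assumptions: ask each machine for its cluster state and compare the master each one reports. The cluster state API (/_cluster/state) and its "master_node" field are not mentioned in this thread, and the host labels are placeholders.

```python
def masters_by_view(cluster_states):
    """Group hosts by the master node ID each one reports.

    cluster_states maps a host label to its parsed cluster state response.
    """
    views = {}
    for host, state in cluster_states.items():
        views.setdefault(state.get("master_node"), []).append(host)
    return {master: sorted(hosts) for master, hosts in views.items()}


def is_split_brain(cluster_states):
    """True when the hosts do not all agree on a single master."""
    return len(masters_by_view(cluster_states)) > 1
```

If masters_by_view returns more than one key, the machines disagree about who the master is, which matches the warning in the log.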

Martijn

--
Kind regards,

Martijn van Groningen

--