How did we get into a yellow state, and how do we get out of it?

Hi,
we have a cluster of 3 search servers on Amazon EC2 and are feeding it
heavily with documents.
We have a configuration with 3 replicas and 5 shards.
At startup everything was fine, but now when I look at the health of
the cluster I see status yellow and some unassigned shards:
{
"cluster_name" : "warp",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 5,
"number_of_data_nodes" : 3,
"active_primary_shards" : 10,
"active_shards" : 30,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 10
}
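
(For reference, that is the output of the cluster health API, roughly
this call against the default HTTP port:)

curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'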
I cannot find any log messages about problems that may have occurred,
or about any actions taken to get back to cluster state green. The log
level is "info".
Is there anything I need to do to get back to green, or will the
system heal itself? Is there a way to get more information about what
the server is doing, for example whether it is trying to heal itself?
I guess I could switch the log level to debug, but maybe there is
something more specific.
Or is there maybe something wrong with my configuration? Do I need to
configure more shards?
We currently have almost a million documents (which is not really a
lot), taking up about 10GB, but we have only imported about 10% of the
expected data.
Any help appreciated.
Thanks,
Tom

Hi


Note: 3 replicas means 4 copies of each shard (1 master + 3 replicas),
which needs 4 data nodes. You are only running 3 data nodes, so one
replica of every shard stays unassigned; that is where your 10
unassigned shards come from.

Green = all shards are assigned
Yellow = all master shards are live, but not all replicas are assigned
Red = all master shards are not live

So yellow is the normal situation in your setup.
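
If you want to see exactly which shards are unassigned, the health API
also takes a level parameter (if I remember correctly), e.g.:

curl -XGET 'http://localhost:9200/_cluster/health?level=shards&pretty=true'

That lists every shard per index, so the unassigned replicas show up
individually.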

If you decrease the number of replicas to 2, then your cluster will
become green.

http://www.elasticsearch.com/docs/elasticsearch/rest_api/admin/indices/update_settings/
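
Something along these lines should do it (replace 'your_index' with
the name of your index):

curl -XPUT 'http://localhost:9200/your_index/_settings' -d '
{
  "index" : { "number_of_replicas" : 2 }
}'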

clint

Hey, if you specify 3 replicas, it basically means you need 4 data
nodes to get to a green state. This is for the simple reason that a
shard and its replicas are never allocated on the same node, and the
total number of copies, the shard itself (1) plus its replicas (3),
is 4.

-shay.banon


Bad English on my part. Instead of

Red = all master shards are not live

that should read:

Red = not all master shards are live

clint

That explains everything.
I thought the total number of replicas would include the master copy.
I will change my configuration. 2 replicas will be sufficient in my
setup.
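
For the defaults of newly created indices that should be something
like this in our elasticsearch.yml (just a sketch; the index that
already exists I will update through the settings API you linked):

index.number_of_shards: 5
index.number_of_replicas: 2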
Thanks a lot.
Tom

You can change that at runtime (the number of replicas), check this API:
http://www.elasticsearch.com/docs/elasticsearch/rest_api/admin/indices/update_settings/
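
Afterwards you can have the health API wait until the cluster actually
goes green, something like this (if I remember the parameter
correctly):

curl -XGET 'http://localhost:9200/_cluster/health?wait_for_status=green&timeout=30s&pretty=true'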


That's exactly what I did to reconfigure it, and it worked well.
Everything is up and running.
The cluster now also recovers without any problems from the manual
server crashes in our tests.

Thanks again,
Tom