Elasticsearch health turns red frequently

Hi,

For a while now I have been dealing with an issue where Elasticsearch's cluster health turns red and it stops processing data. I've tried to troubleshoot it myself, but I don't think I've made much progress in determining the exact cause.

{
  "cluster_name" : "elasticsearch",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 54,
  "active_shards" : 54,
  "relocating_shards" : 0,
  "initializing_shards" : 4,
  "unassigned_shards" : 184,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 22.31404958677686
}
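(Side note on reading these numbers: active_shards_percent_as_number appears to be the active shard count divided by the total of active, initializing, and unassigned shards. A quick sanity check against the figures above:)

```shell
# active / (active + initializing + unassigned) * 100,
# using the values from the health output above
python3 -c 'print(54 / (54 + 4 + 184) * 100)'
# prints 22.31404958677686, matching active_shards_percent_as_number
```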

The first thing I noticed was the large number of "unassigned_shards". To be honest, I don't really know what these are, but I'm pretty sure they shouldn't be unassigned?

I took a look at my indices by listing them with curl 'localhost:9200/_cat/indices?v' and noticed that two of them were red and everything else was yellow. Since the data isn't that important, I went ahead and deleted the red indices.

After deleting the red indices, my Elasticsearch went yellow and started processing data once again.

{
  "cluster_name" : "elasticsearch",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 131,
  "active_shards" : 131,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 131,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 50.0
}

But the next day it went back to red, and once again I had two red indices.

What could be causing this? Does it have something to do with the unassigned shards? I already enabled shard allocation, and the number went down slightly.

Maybe you defined a number of replicas > 1, or you started multiple nodes with no replicas and then stopped them?

Hard to tell without more details, such as the output of _cat/shards?v.

Hi, thanks for your reply.

I ran the command and this is the output:
http://pastebin.com/k2UeJYRa

I haven't manually defined a number of replicas or started multiple nodes with no replicas. Would adding a second node help, since I only have one now? We deal with quite a high amount of data.

If there are any more details you need, let me know.

Yes, that's it. You need to run that when your cluster is RED so you can know which primary shard is actually missing.

If you don't plan to add more nodes (and you don't need replication), you could set the number of replicas to 0 so you won't have unassigned shards.

Note that you have too many shards on a single node. Some of them are pretty much empty. Given the sizes here, I'd probably set the number of shards to 1 instead of 5 (the default value).
It will probably help reduce the pressure on your node.


Alright that makes sense.

I've set the number of replicas to 0 as I don't plan to add more nodes and won't need replication.

curl -XPUT 'localhost:9200/logstash-events*/_settings' -d '
{
    "index" : {
        "number_of_replicas" : 0
    }
}'

How can I set the number of shards on my node? I read about being able to set it during the creation of an index, but can't seem to figure it out for nodes.

Edit: Since setting the number of replicas to 0, I've noticed my unassigned_shards count went down from 131 to 1.

You can't change the number of shards after an index has been created. You need to reindex for that.
My advice is to define an index template that sets the number of shards to 1 for the next indices you are going to create.
After a while (I guess that you are removing old indices at some point), you will have fewer shards as the older indices are removed.
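For reference, on the 2.x series such a template might look something like the sketch below (the template name and index pattern are just examples here; adjust them to match your actual index names). It only affects indices created after it is added; existing indices keep their current shard count.

```shell
# Sketch of an index template for future logstash-events-* indices.
# The template name "logstash-events" and the pattern are assumptions;
# number_of_replicas is set to 0 to match the earlier single-node setup.
curl -XPUT 'localhost:9200/_template/logstash-events' -d '
{
  "template" : "logstash-events-*",
  "settings" : {
    "number_of_shards" : 1,
    "number_of_replicas" : 0
  }
}'
```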

Hi,

My cluster has turned red once again, so I listed all my indices and noticed that two of them are red and everything else is green. Exactly the same as last time: for some reason, the two most recent indices fail every day.

Indices
http://pastebin.com/FaAv2E7h

As you suggested, I ran the command to list my shards, and this is the output (I left out the ones that were fine):
http://pastebin.com/MLrSFdRQ

So here we go:

logstash-events-2016.07.13         2 p INITIALIZING                 127.0.0.1 Mister Sensitive
logstash-events-2016.07.13         1 p UNASSIGNED
logstash-events-2016.07.13         3 p UNASSIGNED
logstash-events-2016.07.13         4 p INITIALIZING                 127.0.0.1 Mister Sensitive
logstash-events-2016.07.13         0 p INITIALIZING                 127.0.0.1 Mister Sensitive

For a reason I don't know, you have unassigned shards here.
And some are still initializing. Is that still the case, BTW?

You should see some logs on your node telling what is happening.

BTW, you did not change the template to define only one shard per index.