Always green before shutdown, always red after startup

Hi there

We use Graylog in front of Elasticsearch 2.4.6 and I've noticed an oddity. When "the system" is working well and I run

curl http://ip.add.ress:9200/_cluster/health

I get back '"status":"green",..."unassigned_shards":0'. If I shut down graylog-server (ps shows it's not running) - so I know there's no data flowing into Elasticsearch - curl still shows "green...0". So far, so good.

If I then restart Elasticsearch, it always comes up '"status":"red",..."unassigned_shards":3000' - or a similar number in the thousands. If I keep polling with curl, I can watch the unassigned-shard count come down until it hits zero, at which point the status turns back to "green".
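For reference, the polling is nothing fancier than a loop along these lines (the host is a placeholder):

while true; do curl -s 'http://ip.add.ress:9200/_cluster/health?pretty'; sleep 5; done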

My question is: is that expected behaviour? I am uncomfortable seeing "green" change to "red" on a mere restart - it implies there's some form of inconsistency that can only be recognized and resolved through a restart.

Am I worrying about a non-issue?

Thanks

Jason

Elasticsearch recovers those shards on startup, one by one. That means it runs some checks on each shard before making it available for search again. 3000 shards on a single system is way too many, and checking them all simply takes time - hence the delay, but this is OK.
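If you want to watch this happen shard by shard rather than through the health summary, something like the cat shards API will show each shard's state (UNASSIGNED, INITIALIZING, STARTED) while recovery runs:

curl 'http://ip.add.ress:9200/_cat/shards?v'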

Hope this helps.

OK, so does that mean that internally ES has some kind of queue of unprocessed shards, that it just happened to be running behind on what was coming in, and that it's only the formal shutdown/restart that exposes this queue to us via the /_cluster/health command? Is there an API call that would expose this backlog? I currently monitor /_cluster/health (which obviously isn't showing this), but I'd rather have a better option that actually told me a backlog was forming. We are currently on one node and will expand into a cluster - but I'd still like to know when this backlog starts building, as it tells you something is going to go wrong soon if it only increases over time.
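For what it's worth, my current check is essentially just pulling a few fields out of that endpoint, something along these lines (filter_path only trims the response):

curl -s 'http://ip.add.ress:9200/_cluster/health?filter_path=status,initializing_shards,unassigned_shards&pretty'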

Thanks

Jason

There is a queue only at startup (based on the age of the index and a couple of other factors) in which the shards get checked. This process is called recovery; you should check out the docs here: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-recovery.html
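That API can, for example, be limited to the shards currently being recovered, roughly like this:

curl 'http://ip.add.ress:9200/_recovery?active_only=true&pretty'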

I do not understand the question about cluster health at startup.

You can configure the order of recovery, see https://www.elastic.co/guide/en/elasticsearch/reference/current/recovery-prioritization.html
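For example, raising an index's priority should make it recover earlier (the index name below is just a placeholder; by default newer indices already come first):

curl -XPUT 'http://ip.add.ress:9200/graylog_123/_settings' -d '{ "index.priority": 10 }'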

Hope this helps.

I don't think I'm explaining myself very well :slight_smile: Let's try this

shutdown graylog (which is the only input into ES)
GET /_cluster/health == green, unassigned_shards:0 (ie "I am a happy ES")
shutdown ES
startup ES
GET /_cluster/health == red, unassigned_shards:3000 (ie "I am unhappy")

In my mind I can't understand how that can be expected behaviour. A formal restart shouldn't change state - and yet it's gone red. It feels like the equivalent of cleanly unmounting a file system and yet having it classified as "dirty" and run through an fsck every time you remount it. A clean umount should mean no fsck (hope that analogy works :wink: )
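(For what it's worth, when I do restart I can at least make my scripts block until things go green again with something like the call below - but that's a workaround, not an explanation.)

curl 'http://ip.add.ress:9200/_cluster/health?wait_for_status=green&timeout=120s'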

Jason
