Courier Fetch: 33 of 893 shards failed

Hello,

I'm running a small Elasticsearch cluster (2 data nodes and 1 dedicated master node with no data) into which I inject logs with Logstash. I'm using Kibana 4.0.1.
I have daily indices with 3 shards and 1 replica.
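For what it's worth, the 3 shards / 1 replica setup corresponds to an index template along these lines (the template name and pattern here are only illustrative):

curl -XPUT 'localhost:9200/_template/daily_logs' -d '
{
  "template": "index-*",
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}'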

Everything was fine until yesterday, when I changed an Elasticsearch parameter: I set index.merge.scheduler.max_thread_count: 1 after reading https://www.elastic.co/guide/en/elasticsearch/guide/current/indexing-performance.html, and because my servers use spinning hard disks.
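Concretely, that's just this line in elasticsearch.yml on each node:

index.merge.scheduler.max_thread_count: 1

and to verify what an index currently has (the index name is just an example):

curl 'localhost:9200/index-2015.03.09/_settings?pretty'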

I changed this parameter on my 3 servers, and since then Kibana reports the error "Courier Fetch: 33 of 893 shards failed." whenever I try to view recent logs.

My problem is that there are no errors in my Elasticsearch logs, so I don't know which indices are failing and can't try to repair them.

Does anybody know how to solve this problem, or how to find which indices are failing?

Thanks

(By the way, I have removed this new parameter from ES since I started having this problem.)

It means that some of your shards are not available.

Take a look at things like the _cat/allocation, _cat/shards and _cat/recovery APIs to see what is happening.
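For example (adjust host and port to your setup):

curl 'localhost:9200/_cat/allocation?v'
curl 'localhost:9200/_cat/shards?v'
curl 'localhost:9200/_cat/recovery?v'

The ?v flag adds the column headers.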

Thanks. _cat/allocation does not seem to reveal any interesting information.
_cat/shards lists all my indices and their shards, indicating whether each is a primary or a replica; they are all in the STARTED state.

Looking at _cat/recovery, it seems that some shards are in the middle of a 'transit' between two nodes (as described at https://www.elastic.co/guide/en/elasticsearch/reference/1.4/cat-recovery.html, since I'm running 1.4). It looks like this:

index-2015.03.09 2     6154  replica done  <data1> <data2> n/a        n/a      194   0.5%          271085265  0.0%

The percent values are usually 0% or 100%, but some are at 25.0%, 0.5%, or 0.9%.
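To pick out the odd ones I'm filtering for recoveries that aren't at 100%, roughly like this (the exact percent formatting may vary):

curl -s 'localhost:9200/_cat/recovery?v' | grep -v '100.0%'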

I'll try to find out how to repair this.

OK, it could be something else; take a look at your ES logs as well.

They don't reveal anything interesting. I've had the same errors in the past and there was always information in the ES logs, but that's not the case this time.

Moreover, the Kopf plugin shows a green cluster, and looking at _cluster/health or _cluster/health?level=indices indicates that all indices are green, with no red or yellow anywhere.
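For the record, these are the exact calls (host and port as usual):

curl 'localhost:9200/_cluster/health?pretty'
curl 'localhost:9200/_cluster/health?level=indices&pretty'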

I had a problem with my /var partition, which resulted in there being no ES logs at all. After solving that, I found ParseFailure errors in the logs. I tried to repair the Lucene indices with the appropriate java command, but without success ('No problems were detected with this index.'). I finally decided to close/delete those indices (they were old and not needed anymore).
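For reference, these are the kinds of commands involved, assuming Lucene's CheckIndex is the repair tool in question; the paths, jar versions, and index names below are only placeholders:

# Check/repair one shard's Lucene index (with the shard offline)
java -cp /usr/share/elasticsearch/lib/lucene-core-*.jar org.apache.lucene.index.CheckIndex /var/lib/elasticsearch/mycluster/nodes/0/indices/index-2015.03.09/2/index -fix

# Close or delete an affected index
curl -XPOST 'localhost:9200/index-2015.03.09/_close'
curl -XDELETE 'localhost:9200/index-2015.03.09'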