Large amounts of shards failing

DC64 · August 11, 2015, 7:12pm

Earlier today, for the first time, I started getting errors in Kibana saying that x/185 shards have failed. Both the failed count and the total change with the size of my date filter (last 7 days versus last hour).

I am running ES 1.4 and Kibana 4.1

I can run _cat/recovery/ and get something like this:
observables-2015.07.16 3 26 gateway done CIF-2 CPU2 n/a n/a 73 100.0% 54557248 100.0%

I can still visualize my data in my dashboard, but the errors are concerning. What should I do next?

Update: Some visualizations now show less data than before.

warkolm · August 11, 2015, 11:37pm

Something is up in ES.

Have a look at some of the other cat endpoints, like allocation, to see what is happening.

DC64 · August 12, 2015, 3:50pm

(Maybe this belongs in Elasticsearch section, sorry...)

Running curl -XGET http://localhost:9200/_cluster/health?pretty I get:
{
  "cluster_name" : "elasticsearch",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 201,
  "active_shards" : 201,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 201
}
When I run curl -XGET http://localhost:9200/_cat/shards I get:
observables-2015.08.01 4 p STARTED 55421 49.4mb 127.0.1.1 Onyxx
observables-2015.08.01 4 r UNASSIGNED
This pattern repeats until all shards have been listed. It looks like they are being assigned incorrectly, or not at all. The "4" varies between 0, 1, 2, 3, and 4, in no particular order; I'm not sure why.

Running curl -XGET 'http://localhost:9200/_cat/allocation' I get:
201 16.3gb 22.2gb 38.6gb 42 CPU2 127.0.1.1 Onyxx
201 UNASSIGNED
Running _cat/shards with | grep UNASSIGNED I see nearly all of my shards as unassigned, as expected.
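
A per-state tally can be easier to read than raw grep output. A minimal sketch, assuming the default _cat/shards column order seen in the output above (index, shard, primary/replica flag, state, ...); the function name count_shard_states is made up for illustration:

```shell
# Tally _cat/shards rows by primary/replica flag and state.
# Reads _cat/shards output on stdin; columns 3 and 4 are assumed to be
# the p/r flag and the state, as in the default column order.
count_shard_states() {
  awk 'NF >= 4 { n[$3 " " $4]++ } END { for (k in n) print k, n[k] }' | sort
}

# Usage against a live node (not run here):
#   curl -s 'http://localhost:9200/_cat/shards' | count_shard_states
```

On a single node, a replica cannot be placed on the same node as its primary, so every r row would be expected to show UNASSIGNED while the p rows show STARTED.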

I read in data on both an hourly and a daily basis, so I have a large number of logs and shards.

warkolm · August 12, 2015, 9:31pm

Don't worry about the yellow; those replica shards won't be assigned because you only have one node. You can get rid of them with curl -XPUT 'localhost:9200/*/_settings' -d '{ "index" : { "number_of_replicas" : 0 } }' to get your cluster to green.
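
To confirm that the replica change took effect, the unassigned count can be read back from the health endpoint. A small sketch that avoids extra tools; the function name unassigned_from_health is hypothetical, and it relies only on the "unassigned_shards" field that appears in the health output earlier in this thread:

```shell
# Print the integer value of "unassigned_shards" from cluster health JSON
# read on stdin. Works on both the one-line and the ?pretty layout.
unassigned_from_health() {
  sed -n 's/.*"unassigned_shards"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Usage against a live node (not run here):
#   curl -s 'http://localhost:9200/_cluster/health?pretty' | unassigned_from_health
```

Once the replicas are dropped, this should print 0 and the status should move from yellow to green.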

What's happening in your ES logs?

(I'll also move this to the ES area)

DC64 · August 13, 2015, 3:01pm

Looking through my logs, elasticsearch.log.2015-08-10 looks fine with only 6 lines, but elasticsearch.log.2015-08-11 contains a few hundred lines of errors, including java.lang.OutOfMemoryError: Java heap space. So it looks like a Java memory issue. This is running on a VM with 16 GB of memory and 11 GB of shards, though the hourly data downloads could be what is upsetting Java.

I also get a common error: [FIELDDATA] Data too large, data for [timezone] would be larger than limit of [633785548/604.4mb]
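
The limit in that message hints at how much heap the node actually has. A rough sketch, assuming the fielddata circuit breaker is still at its ES 1.4 default of 60% of the JVM heap (this node's settings are not shown in the thread, so the percentage is an assumption, and the function name is made up for illustration):

```shell
# Estimate the JVM heap (in MB) from a fielddata breaker limit in bytes.
# Assumption: the breaker limit is a fixed percentage of the heap.
# 60 is the ES 1.4 default; pass a second argument if it was changed.
heap_mb_from_fielddata_limit() {
  limit_bytes=$1
  percent=${2:-60}
  echo $(( limit_bytes * 100 / percent / 1048576 ))
}

heap_mb_from_fielddata_limit 633785548
```

If the assumption holds, the heap is only about 1 GB, far less than the 16 GB available on the VM, which would fit the OutOfMemoryError above. In ES 1.x the heap size is set through the ES_HEAP_SIZE environment variable.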

What I don't understand is that my shards (over 200) from a month ago are also listed as unassigned. Ideally I should be able to look at my data from a week ago just fine, but that's not the case.

DC64 · August 17, 2015, 2:38pm

I found out the hard way how much Elasticsearch loves RAM. I had to move some shards into another folder to free up some memory. Until I get more than one server, moving shards around manually is the current option.
