Large amounts of shards failing


(Danny) #1

Earlier today, for the first time, I started receiving errors in Kibana saying that x/185 shards have failed. Depending on how wide my date filter is (last 7 days or last hour), both the number of failed shards and the total number of shards change.

I am running ES 1.4 and Kibana 4.1

I can run _cat/recovery/ and get something like this:
observables-2015.07.16 3 26 gateway done CIF-2 CPU2 n/a n/a 73 100.0% 54557248 100.0%

I can still visualize my data in my dashboard, but the errors are concerning. What should I do next?

Update: Some visualizations are showing less data than before.


(Mark Walkom) #2

Something is up in ES.

Have a look at some of the other cat endpoints, like allocation, to see what is happening.


(Danny) #3

(Maybe this belongs in Elasticsearch section, sorry...)

Running curl -XGET http://localhost:9200/_cluster/health?pretty I get:
{
  "cluster_name" : "elasticsearch",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 201,
  "active_shards" : 201,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 201
}
When I run curl -XGET http://localhost:9200/_cat/shards I get:
observables-2015.08.01 4 p STARTED 55421 49.4mb 127.0.1.1 Onyxx
observables-2015.08.01 4 r UNASSIGNED
and this pattern repeats until all shards have been listed. It looks like they are being assigned incorrectly, or not at all. The "4" varies between 0, 1, 2, 3 and 4, in no particular order; I'm not sure why.

Running curl -XGET 'http://localhost:9200/_cat/allocation' I get:
201 16.3gb 22.2gb 38.6gb 42 CPU2 127.0.1.1 Onyxx
201 UNASSIGNED
Running _cat/shards with | grep UNASSIGNED I see nearly all of my shards as unassigned, as expected.
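For anyone reading along, here is a minimal sketch of tallying the _cat/shards output by shard type and state instead of eyeballing the grep. It assumes the default column order shown above (index, shard number, p/r flag, state, ...). The function name tally_shards and the two-line sample are illustrative only, not anything from Elasticsearch itself.

```python
from collections import Counter

# Two rows copied from the _cat/shards output quoted in this thread.
SAMPLE = """\
observables-2015.08.01 4 p STARTED 55421 49.4mb 127.0.1.1 Onyxx
observables-2015.08.01 4 r UNASSIGNED
"""


def tally_shards(cat_shards_text):
    """Count shards by (primary/replica, state).

    Assumes whitespace-separated columns in the order:
    index, shard, prirep, state, ... (hypothetical helper).
    """
    counts = Counter()
    for line in cat_shards_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        prirep, state = fields[2], fields[3]
        kind = "primary" if prirep == "p" else "replica"
        counts[(kind, state)] += 1
    return counts


if __name__ == "__main__":
    for (kind, state), n in sorted(tally_shards(SAMPLE).items()):
        print(kind, state, n)
```

If the full output shows only replicas in the UNASSIGNED bucket and every primary as STARTED, the data itself is still being served.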

I ingest data both hourly and daily, so I have a large number of logs and shards.


(Mark Walkom) #4

Don't worry about the yellow status: the unassigned shards are replicas, and a replica is never allocated on the same node as its primary, so they can't be assigned while you only have one node. You can get rid of them with curl -XPUT 'localhost:9200/*/_settings' -d '{ "index" : { "number_of_replicas" : 0 } }' to get your cluster to green.
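As a rough sanity check of that reasoning, the sketch below takes the cluster health numbers posted earlier in the thread and tests whether every unassigned shard can be accounted for as a replica with nowhere to go. The helper name yellow_explained_by_replicas and the one-replica-per-primary default are assumptions for illustration; the thread does not state the actual replica count.

```python
import json

# Cluster health as posted earlier in this thread (single node).
HEALTH = json.loads("""
{
  "cluster_name": "elasticsearch",
  "status": "yellow",
  "number_of_nodes": 1,
  "number_of_data_nodes": 1,
  "active_primary_shards": 201,
  "active_shards": 201,
  "unassigned_shards": 201
}
""")


def yellow_explained_by_replicas(health, replicas_per_primary=1):
    """Return True if the unassigned shards match the expected replica count.

    replicas_per_primary=1 is an assumption, not a value read from the cluster.
    """
    expected_replicas = health["active_primary_shards"] * replicas_per_primary
    return (
        health["number_of_data_nodes"] == 1
        and health["active_shards"] == health["active_primary_shards"]
        and health["unassigned_shards"] == expected_replicas
    )


# Request body for the settings update that drops replicas to zero.
SETTINGS_BODY = json.dumps({"index": {"number_of_replicas": 0}})

if __name__ == "__main__":
    print(yellow_explained_by_replicas(HEALTH))
    print(SETTINGS_BODY)
```

With these numbers the check passes: 201 primaries are active and the 201 unassigned shards match one replica each. That suggests the yellow status on its own is harmless and is a separate matter from the shard failures Kibana reports.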

What's happening in your ES logs?

(I'll also move this to the ES area)


(Danny) #5

Looking through my logs, elasticsearch.log.2015-08-10 looks fine with only 6 lines, but elasticsearch.log.2015-08-11 contains a few hundred lines of errors, including java.lang.OutOfMemoryError: Java heap space, so this looks like a Java memory issue. This is running on a VM with 16 GB of memory and 11 GB of shards, though the hourly downloads of data could be what is pushing Java over the limit.

I also get a common error: [FIELDDATA] Data too large, data for [timezone] would be larger than limit of [633785548/604.4mb]
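The limit in that message hints at how much heap the JVM actually has. The arithmetic below is a sketch that assumes the fielddata circuit breaker is at its default of 60% of heap, which I believe is the ES 1.4 default; the thread does not confirm the setting, so treat the result as an estimate.

```python
# Byte limit quoted in the [FIELDDATA] error above.
FIELDDATA_LIMIT_BYTES = 633785548

# Assumption: the breaker limit is 60% of the JVM heap (not confirmed in the thread).
ASSUMED_BREAKER_FRACTION = 0.60


def bytes_to_mb(n_bytes):
    """Convert bytes to mebibytes, the unit Elasticsearch prints as 'mb'."""
    return n_bytes / (1024 * 1024)


limit_mb = bytes_to_mb(FIELDDATA_LIMIT_BYTES)
implied_heap_mb = limit_mb / ASSUMED_BREAKER_FRACTION

if __name__ == "__main__":
    print("fielddata limit: %.1f mb" % limit_mb)
    print("implied heap:    %.1f mb" % implied_heap_mb)
```

If the assumption holds, the heap is only about 1 GB, far below the 16 GB on the VM. That would be consistent with the OutOfMemoryError in the logs, and raising the heap (for example via the ES_HEAP_SIZE environment variable in ES 1.x) may be worth trying.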

What I don't understand is that my shards (over 200) from a month ago are also listed as unassigned. Ideally I should be able to look at my data from a week ago just fine, but that's not the case.


(Danny) #6

I found out the hard way how much Elasticsearch loves RAM. I had to move some shards into another folder to free up some memory. Until I get more than one server, moving shards around manually is my current workaround.

