All shards failed for phase: [query]

I have an elastsicsearch instance on AWS (I have a similar one that is working just fine.

I have a lambda function that ships logs to elasticsearch. It stopped working after a while and now I can't see any new logs. I looked into logs and found this :

Caused by: java.lang.IllegalArgumentException: size must be positive, got 0
	at org.elasticsearch.search.aggregations.bucket.BucketUtils.suggestShardSideQueueSize(BucketUtils.java:40) ~[elasticsearch-5.1.1.jar:5.1.1]
	at org.elasticsearch.search.aggregations.bucket.terms.TermsAggregatorFactory.doCreateInternal(TermsAggregatorFactory.java:100) ~[elasticsearch-5.1.1.jar:5.1.1]
	at org.elasticsearch.search.aggregations.support.ValuesSourceAggregatorFactory.createInternal(ValuesSourceAggregatorFactory.java:55) ~[elasticsearch-5.1.1.jar:5.1.1]
	at org.elasticsearch.search.aggregations.AggregatorFactory.create(AggregatorFactory.java:225) ~[elasticsearch-5.1.1.jar:5.1.1]
	at org.elasticsearch.search.aggregations.AggregatorFactories.createTopLevelAggregators(AggregatorFactories.java:102) ~[elasticsearch-5.1.1.jar:5.1.1]
	at org.elasticsearch.search.aggregations.AggregationPhase.preProcess(AggregationPhase.java:61) ~[elasticsearch-5.1.1.jar:5.1.1]
	at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:104) ~[elasticsearch-5.1.1.jar:5.1.1]
	at org.elasticsearch.indices.IndicesService.lambda$loadIntoContext$18(IndicesService.java:1159) ~[elasticsearch-5.1.1.jar:5.1.1]
	at org.elasticsearch.indices.IndicesService.lambda$cacheShardLevelResult$20(IndicesService.java:1229) ~[elasticsearch-5.1.1.jar:5.1.1]
	at org.elasticsearch.indices.IndicesRequestCache$Loader.load(IndicesRequestCache.java:150) ~[elasticsearch-5.1.1.jar:5.1.1]
	at org.elasticsearch.indices.IndicesRequestCache$Loader.load(IndicesRequestCache.java:133) ~[elasticsearch-5.1.1.jar:5.1.1]
	at org.elasticsearch.common.cache.Cache.computeIfAbsent(Cache.java:398) ~[elasticsearch-5.1.1.jar:5.1.1]
	at org.elasticsearch.indices.IndicesRequestCache.getOrCompute(IndicesRequestCache.java:116) ~[elasticsearch-5.1.1.jar:5.1.1]
	at org.elasticsearch.indices.IndicesService.cacheShardLevelResult(IndicesService.java:1235) ~[elasticsearch-5.1.1.jar:5.1.1]
	at org.elasticsearch.indices.IndicesService.loadIntoContext(IndicesService.java:1158) ~[elasticsearch-5.1.1.jar:5.1.1]
	at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:257) ~[elasticsearch-5.1.1.jar:5.1.1]
	at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:273) ~[elasticsearch-5.1.1.jar:5.1.1]
	at org.elasticsearch.action.search.SearchTransportService$6.messageReceived(SearchTransportService.java:300) ~[elasticsearch-5.1.1.jar:5.1.1]
	at org.elasticsearch.action.search.SearchTransportService$6.messageReceived(SearchTransportService.java:297) ~[elasticsearch-5.1.1.jar:5.1.1]
	at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) ~[elasticsearch-5.1.1.jar:5.1.1]
	at org.elasticsearch.transport.TransportService$6.doRun(TransportService.java:577) [elasticsearch-5.1.1.jar:5.1.1]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:527) [elasticsearch-5.1.1.jar:5.1.1]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.1.1.jar:5.1.1]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_111]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_111]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]```


I did a curl on cluster health (yellow because one node) 
```{"cluster_name":"Pixel","status":"yellow","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":11,"active_shards":11,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":11,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":50.0}```

A bit more information on the instance: 
it's on m4.4xlarge with 100 GB of EBS. It also uses 32 GB mem.  So far has 600k records in one index. 

What else should I look for?


I also cat the shards 

```pixel    1 p STARTED    130527 74.7mb 10.0.1.90 tqStC42
pixel    1 r UNASSIGNED                         
pixel    3 p STARTED    129687 74.4mb 10.0.1.90 tqStC42
pixel    3 r UNASSIGNED                         
pixel    2 p STARTED    130561   74mb 10.0.1.90 tqStC42
pixel    2 r UNASSIGNED                         
pixel    4 p STARTED    129870 74.6mb 10.0.1.90 tqStC42
pixel    4 r UNASSIGNED                         
pixel    0 p STARTED    129981 74.4mb 10.0.1.90 tqStC42
pixel    0 r UNASSIGNED    ```

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.