Courier Fetch error

Hello everybody,

I have a problem with my ELK stack. Everything worked fine until two days ago, but since then I have been getting an error that I don't know how to fix.

I get a warning message at the top of my Kibana pages: "Courier Fetch: 5 of 20 shards failed."
My Elasticsearch cluster is OK (health: green).

I tried to clear the Elasticsearch cache with this command: curl -XPOST 'http://localhost:9200/_cache/clear'. When I do this everything is OK, but only for a few seconds.
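To see what fills the cache back up, I guess I could also check the per-field fielddata usage with the nodes stats API (just a sketch of what I mean, assuming the default localhost:9200 endpoint):

curl -XGET 'http://localhost:9200/_nodes/stats/indices/fielddata?fields=*&pretty'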

I have a daily report, and I saw that every night at 2 a.m. the log collection stops and does not restart.

I don't know if my indexes are too big (~10-15 GB / day)?
Could there be a problem during index creation, since Logstash sends logs continuously? Should I stop the Logstash pipeline when Elasticsearch creates the new index and start it again once the new index is ready?

Thank you
Best regards

Check your ES logs, there may be something there.

Hello warkolm, Thank you for your answer.

In my ES logs, the only "interesting" lines I can see are:

[2015-07-30 08:56:37,105][WARN ][indices.breaker          ] [ES_NODE_1] [FIELDDATA] New used memory 633826551 [604.4mb] from field [netflow.l4_dst_port] would be larger than configured breaker: 633785548 [604.4mb], breaking

[2015-07-30 08:56:37,519][DEBUG][action.search.type       ] [ES_NODE_1] [logstash-2015.07.30][0], node[M_cOqAQ6SsalqjvYvBTanw], [R], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@588e10fa] lastShard [true]
org.elasticsearch.ElasticsearchException: org.elasticsearch.common.breaker.CircuitBreakingException: [FIELDDATA] Data too large, data for [netflow.l4_dst_port] would be larger than limit of [633785548/604.4mb]
        at org.elasticsearch.index.fielddata.plain.AbstractIndexFieldData.load(AbstractIndexFieldData.java:80)
        at org.elasticsearch.search.aggregations.support.ValuesSource$MetaData.load(ValuesSource.java:88)
        at org.elasticsearch.search.aggregations.support.AggregationContext.numericField(AggregationContext.java:159)
        at org.elasticsearch.search.aggregations.support.AggregationContext.valuesSource(AggregationContext.java:137)
        at org.elasticsearch.search.aggregations.support.ValuesSourceAggregatorFactory.create(ValuesSourceAggregatorFactory.java:53)
        at org.elasticsearch.search.aggregations.AggregatorFactories.createAndRegisterContextAware(AggregatorFactories.java:53)
        at org.elasticsearch.search.aggregations.AggregatorFactories.createTopLevelAggregators(AggregatorFactories.java:157)
        at org.elasticsearch.search.aggregations.AggregationPhase.preProcess(AggregationPhase.java:79)
        at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:100)
        at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:272)
        at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:283)
        at org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:231)
        at org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:228)
        at org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:559)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.common.util.concurrent.UncheckedExecutionException: org.elasticsearch.common.breaker.CircuitBreakingException: [FIELDDATA] Data too large, data for [netflow.l4_dst_port] would be larger than limit of [633785548/604.4mb]
        at org.elasticsearch.common.cache.LocalCache$Segment.get(LocalCache.java:2203)
        at org.elasticsearch.common.cache.LocalCache.get(LocalCache.java:3937)
        at org.elasticsearch.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4739)
        at org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache.load(IndicesFieldDataCache.java:174)
        at org.elasticsearch.index.fielddata.plain.AbstractIndexFieldData.load(AbstractIndexFieldData.java:74)
        ... 16 more
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [FIELDDATA] Data too large, data for [netflow.l4_dst_port] would be larger than limit of [633785548/604.4mb]
        at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.circuitBreak(ChildMemoryCircuitBreaker.java:97)
        at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:148)
        at org.elasticsearch.index.fielddata.RamAccountingTermsEnum.flush(RamAccountingTermsEnum.java:71)
        at org.elasticsearch.index.fielddata.RamAccountingTermsEnum.next(RamAccountingTermsEnum.java:85)
        at org.elasticsearch.index.fielddata.ordinals.OrdinalsBuilder$3.next(OrdinalsBuilder.java:472)
        at org.elasticsearch.index.fielddata.plain.PackedArrayIndexFieldData.loadDirect(PackedArrayIndexFieldData.java:109)
        at org.elasticsearch.index.fielddata.plain.PackedArrayIndexFieldData.loadDirect(PackedArrayIndexFieldData.java:49)
        at org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache$1.call(IndicesFieldDataCache.java:187)

Regards

That's probably why.

Look into doc values and, if possible, providing more heap to your nodes.

I changed "ES_HEAP_SIZE" to the biggest value I can (10g).
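In case it helps someone, roughly how I set it (a sketch; the exact file depends on how Elasticsearch was installed):

# Debian/Ubuntu package: /etc/default/elasticsearch
# RPM package: /etc/sysconfig/elasticsearch
ES_HEAP_SIZE=10g

# Tarball install: export it before starting the node
export ES_HEAP_SIZE=10g
./bin/elasticsearch

From what I read, the usual guidance seems to be to stay at or below about half of the machine's RAM, and under ~32g.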

But can you tell me what "doc_values" are and how I can use them, please?

EDIT: In case this information is useful: yesterday I tried deleting the latest index, so I lost all of the last day's data, but after that the warning message disappeared.

Doc values - https://www.elastic.co/guide/en/elasticsearch/guide/current/doc-values.html

Also https://www.elastic.co/blog/support-in-the-wild-my-biggest-elasticsearch-problem-at-scale may be useful.
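In short: doc values have to be enabled in the mapping when a field is first created, so for daily logstash-* indices the usual approach is an index template that will apply to the indices created from now on. A sketch of what that could look like for the field from your logs (assuming netflow.l4_dst_port is an integer; adjust the type and the template name to your setup):

curl -XPUT 'http://localhost:9200/_template/netflow_doc_values' -d '
{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "properties": {
        "netflow": {
          "properties": {
            "l4_dst_port": { "type": "integer", "doc_values": true }
          }
        }
      }
    }
  }
}'

Existing indices keep loading fielddata onto the heap; only indices created after the template is in place will use doc values for that field.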

Thank you for your help, it seems to be OK!

Regards