All shards failed even after tuning

Hi all,

I have looked through many of the open threads related to the error I'm getting.
In my company we are using the ELK stack 7.5.1 for a development environment. Both Elasticsearch and Kibana run on the same machine with 8 GB of RAM.
The JVM heap size is 4 GB, and the following parameters were also changed (applied roughly as in the sketch after the list):

  • "indices.breaker.fielddata.limit" : "40%",
    
  • "indices.breaker.request.limit" : "70%",
    
  • "indices.breaker.total.limit" : "80%",
    
  • "search.max_buckets" : "200000"
    

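For reference, here is a minimal sketch of how these limits can be applied as dynamic cluster settings with the official Python client; the host, the transient/persistent choice, and the client setup are assumptions, and the same values can equally be set in elasticsearch.yml.

```python
# Minimal sketch (assumptions: local node on port 9200, elasticsearch-py 7.x).
# Applies the same limits as transient cluster settings; they could also be
# made persistent or placed in elasticsearch.yml instead.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

es.cluster.put_settings(body={
    "transient": {
        "indices.breaker.fielddata.limit": "40%",
        "indices.breaker.request.limit": "70%",
        "indices.breaker.total.limit": "80%",
        "search.max_buckets": 200000,
    }
})
```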
When we try to create a new index and write to it, we get the following errors:

org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed

[DEBUG][o.e.a.s.TransportSearchAction] [##########] All shards failed for phase: [query]
org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<reused_arrays>] would be [3423229520/3.1gb], which is larger than the limit of [3422027776/3.1gb], real usage: [3423229448/3.1gb], new bytes reserved: [72/72b], usages [request=1290518344/1.2gb, fielddata=67065/65.4kb, in_flight_requests=2566/2.5kb, accounting=18434993/17.5mb]

How can I solve this error?
Thank you very much in advance.

Giuseppe

What in the world is in that data/index you are creating that uses 3 GB of heap just to create/feed it? How many fields, and how much data? I assume you are doing bulk inserts from Logstash or something similar. I assume those index one batch at a time; if not, you can reduce the bulk size, but I don't think it matters much.

Thank you for your answer, Steve.
Your assumptions are correct: we are doing massive bulk inserts without any overlap. I'll try reducing the bulk size, roughly along the lines of the sketch below, and let you know.
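In our setup the batch size is really controlled on the Logstash side, but as a hypothetical illustration of the idea, this Python sketch sends the same data in smaller bulk batches (index name, documents, and chunk_size are made-up values):

```python
# Hypothetical sketch: stream documents in smaller bulk batches so each
# individual bulk request carries less data. Index name, document generator
# and chunk_size are illustration values only.
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(["http://localhost:9200"])

def generate_actions():
    # Yield one action per document instead of building a huge list in memory.
    for i in range(100_000):
        yield {"_index": "my-new-index", "_source": {"value": i}}

# chunk_size is the number of documents per bulk request (the helper's
# default is 500); lowering it shrinks each individual request.
helpers.bulk(es, generate_actions(), chunk_size=200, request_timeout=60)
```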
Regards, Giuseppe.

There is a misunderstanding here: there is 3 GB of heap used in total, not just for this one request. The extra usage from this request is just new bytes reserved: [72/72b].

Thanks for your answer, David. So do you think reducing the bulk size would solve the problem?

Don't know, sorry. This node is definitely under-resourced for what you're doing with it, so either give it more resources or ask less of it. Exactly what that means will take some more investigation and experimentation.

Ok David, thank you very much.
Regards, Giuseppe

Oops; I need to read & understand more carefully.

Looks like the request is 1.2 GB (request=1290518344/1.2gb); still darn large for indexing, I'd think. How does the bulk indexer use RAM: mostly per document, or for the whole batch? In other words, what is the effect of batch size on heap use? I couldn't find much info on that.

Again, no: that's the total size of all memory currently tracked by the request circuit breaker, which means it is temporarily needed for something or other, usually searches/aggregations. The size of any incoming requests is accounted for in in_flight_requests instead.
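If it helps, the per-breaker usage can be read from the node stats API at any time. Here is a rough sketch with the Python client; the connection details are assumptions, and the field names follow the GET _nodes/stats/breaker response.

```python
# Sketch: print current circuit-breaker usage per node via the node stats API.
# Connection details are assumptions; field names follow GET _nodes/stats/breaker.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

stats = es.nodes.stats(metric="breaker")
for node_id, node in stats["nodes"].items():
    print(node["name"])
    for name, info in node["breakers"].items():
        print(f"  {name:22s} estimated={info['estimated_size_in_bytes']:>13,d}"
              f"  limit={info['limit_size_in_bytes']:>13,d}"
              f"  tripped={info['tripped']}")
```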

Yes, the circuit breakers don't have the best names :confused:

Ugh, I guess I should shut up now. I was writing a reply earlier about how it would be nice to know the actual size of the request/query/doc ingest that caused this, but then I saw the request figure and thought that was it.

As with the Linux OOM killer; otherwise it's hard to separate out which consumers are the 'big' ones. OOMs have a score to help with this, but I'm guessing that with breakers it's hard to know whether the request that trips the breaker is significant or just the last byte that broke the camel's (breaker's) back. Thus it's hard to know on a busy system who the bad guys are; that's always hard on concurrent systems.
