IDs query tripping circuit breakers

Due to a previously discussed issue with aggregations, I have set my circuit breakers rather low to prevent Elasticsearch from exiting. However, among other issues, I am now seeing some strange behaviour on queries from a Java application that uses an IdsQueryBuilder to build a simple query (sketched after the stack trace below). It looks up a single ID on one index and type, and the resulting document is only a few KB in size, nothing special or large. Here is the error I'm seeing in the logging:
Caught Exception: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [15048704143/14gb], which is larger than the limit of [3749380096/3.4gb]] while executing. ESIndex: indexName/typeName ; query: {
"ids" : {
"type" : [
"typeName"
],
"values" : [
"idOfObject"
],
"boost" : 1.0
}
}
CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [15048704143/14gb], which is larger than the limit of [3749380096/3.4gb]]
at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:215)
at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:128)
at org.elasticsearch.transport.TcpTransport.handleRequest(TcpTransport.java:1465)
at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:1360)
at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:74)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:280)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:396)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:248)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:624)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:524)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:478)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:438)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at java.lang.Thread.run(Thread.java:748)
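
For reference, the Java side builds this roughly as follows (a simplified sketch; the real index, type, and ID values are placeholders here, and I'm assuming a plain 5.x transport Client):

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.index.query.IdsQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;

public class IdLookup {
    // "client" is assumed to be an already-connected 5.x Client
    // (for example a PreBuiltTransportClient).
    static SearchResponse lookupById(Client client) {
        // The same ids query shown in the log: one type, one id.
        IdsQueryBuilder query = QueryBuilders.idsQuery("typeName")
                .addIds("idOfObject");

        return client.prepareSearch("indexName")
                .setTypes("typeName")
                .setQuery(query)
                .get();
    }
}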

Hmm, that's strange; there should be something inside the [] indicating what is incrementing the breaker.

Can you attach (or link to) the output of /_nodes/stats for this cluster? I'd like to see the current values for all of the breakers.

Also, what version of ES are you running?

Version of ES: 5.4.0

Due to this breaker tripping on ID lookups, I had to revert my previous breaker settings (put in place because of https://github.com/elastic/elasticsearch/issues/24359).

The full node stats won't fit here, but I can show you what I had changed before this breaker started tripping:

indices.breaker.total.limit: 50% (lowered from the default of 70%)
indices.breaker.request.overhead: 550 (raised from the default of 1)

This is definitely going to cause a problem. By increasing the overhead to 550, you are essentially multiplying every memory estimation by 550 when the circuit breaker does its limit checks (so a 1,024-byte request checks whether it can add 563,200 bytes). I recommend that you reset this back to 1.
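
If it helps, here is a minimal sketch of putting the defaults back dynamically from the Java client (assuming the stock defaults of 70% and 1; these breaker settings are dynamic, so removing the overrides from elasticsearch.yml and restarting works just as well):

import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.Settings;

public class ResetBreakerDefaults {
    // "client" is assumed to be an already-connected 5.x Client
    // (for example a PreBuiltTransportClient pointed at the cluster).
    static void resetBreakerDefaults(Client client) {
        Settings defaults = Settings.builder()
                .put("indices.breaker.total.limit", "70%")      // stock default
                .put("indices.breaker.request.overhead", "1.0") // stock default
                .build();

        // Persistent so the values survive a full cluster restart.
        client.admin().cluster().prepareUpdateSettings()
                .setPersistentSettings(defaults)
                .get();
    }
}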

Understood. I've reverted those changes.
In this thread, https://github.com/elastic/elasticsearch/issues/15892, one user's way of avoiding the thread exits caused by the linked issues was to set the breakers very conservatively.

Any suggestions on how to set the breakers to avoid the aggregation memory-allocation issues? This combination of issues is making it very difficult for me to sleep at night, given that right now any of our users could bring down individual nodes, or the whole cluster, with the wrong query or visualization.

Not necessarily by changing the breaker settings, but I believe you should be able to use the solution that was recommended here to alleviate this for now:

Also, just so you know (I realise it doesn't help you right now), there is an issue open to work on a fix for this, and a pull request fixing it was opened just yesterday; it's currently targeted for 5.4.2, 5.5.0, and 6.x.

The workaround works, but in my case it isn't a practical solution because my users create their own queries and visualizations (and would have to add the execution hint every time). Most of their queries and aggregations work just fine, until they do something that doesn't, and then it is too late. While some of this may be self-induced by the level of access our analysts have to our data and their level of training, the best protection I can put in place is one that doesn't rely on every user changing every aggregation query they run.
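
For anyone following along, the per-query hint in question looks roughly like this on the Java side (a sketch assuming it is the terms aggregation execution hint from the linked discussion; the aggregation, index, and field names are placeholders):

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.TermsAggregationBuilder;

public class HintedAggregation {
    // "client" is assumed to be an already-connected 5.x Client.
    static SearchResponse aggregateWithMapHint(Client client) {
        // "map" builds buckets from field values directly instead of
        // global ordinals, trading some speed for a much smaller
        // up-front memory allocation.
        TermsAggregationBuilder byField = AggregationBuilders.terms("byField")
                .field("someField")
                .executionHint("map");

        return client.prepareSearch("indexName")
                .setSize(0)
                .addAggregation(byField)
                .get();
    }
}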

Additionally, given that 5.4.1 isn't even out yet, knowing that the fix is targeted for 5.4.2 doesn't make me optimistic that this will be resolved soon.
