Circuit Breaker Exception + Data Too Large for HTTP Request + HTTP/1.1 429 Too Many Requests

Hello Team,

We recently went live with Elasticsearch as our search engine and observed a few instances where a circuit breaker exception was thrown, along with messages like "Data too large" and "429 Too Many Requests".

We have 5 ES nodes (7.10) in our cluster, and each node has 8 GB of heap allocated. We have a Java query layer (a typical Spring Boot application) which receives client requests and uses RestHighLevelClient (7.3) to send search requests to ES.
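
For context, this is roughly how the query layer creates the client, as a single shared Spring bean (simplified sketch; the host names below are placeholders, not our real nodes):

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class EsClientConfig {

    // One shared client instance for the whole query layer
    @Bean(destroyMethod = "close")
    public RestHighLevelClient restHighLevelClient() {
        return new RestHighLevelClient(
            RestClient.builder(
                new HttpHost("es-node-1", 9200, "http"),
                new HttpHost("es-node-2", 9200, "http")));
    }
}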

We never saw these errors during testing in our lower environments, including stress tests; they only started appearing once we were in production.

While this certainly suggests the exception was triggered to avoid an OOM error on an ES node, there are a few things we want to check:

  • Does it indicate that we have too little memory allocated to the heap? Our boxes have 16 GB and we allocated 50% of that to ES. Any recommendations here?
  • We haven't made any specific changes to the default settings in the ES configs (elasticsearch.yml), so whatever is running is out of the box.
  • Most of the requests from client applications are GET requests; there are a couple of POST requests too.
  • We have a bunch of microservices (again, typical Spring Boot applications) which perform indexing on the ES nodes. We use BulkProcessor to index documents into ES (a simplified sketch of our setup is below, after this list).
  • Could this also be related to the version difference between the RestHighLevelClient we are using (7.3) and the ES nodes (7.10)?
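
For reference, our BulkProcessor setup is roughly the following (a simplified sketch; the batch sizes, flush interval, and listener logic here are illustrative assumptions, not our exact production values):

import org.elasticsearch.action.bulk.BackoffPolicy;
import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;
import org.elasticsearch.common.unit.TimeValue;

public class IndexingService {

    public BulkProcessor buildBulkProcessor(RestHighLevelClient client) {
        BulkProcessor.Listener listener = new BulkProcessor.Listener() {
            @Override
            public void beforeBulk(long executionId, BulkRequest request) { }

            @Override
            public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
                // log response.hasFailures() here
            }

            @Override
            public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
                // log the failure (this is where rejections from an overloaded node would surface)
            }
        };

        return BulkProcessor.builder(
                (request, bulkListener) -> client.bulkAsync(request, RequestOptions.DEFAULT, bulkListener),
                listener)
            .setBulkActions(500)                                   // flush after 500 docs
            .setBulkSize(new ByteSizeValue(5, ByteSizeUnit.MB))    // or after 5 MB
            .setFlushInterval(TimeValue.timeValueSeconds(5))       // or every 5 seconds
            .setConcurrentRequests(1)                              // one bulk in flight at a time
            .setBackoffPolicy(BackoffPolicy.exponentialBackoff(TimeValue.timeValueMillis(100), 3))
            .build();
    }
}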

Here is the exception we saw in our logs:

org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=circuit_breaking_exception, reason=[parent] Data too large, data for [<http_request>] would be [8585237264/7.9gb], which is larger than the limit of [8160437862/7.5gb], real usage: [8585236744/7.9gb], new bytes reserved: [520/520b], usages [request=130720/127.6kb, fielddata=709538/692.9kb, in_flight_requests=15526/15.1kb, model_inference=0/0b, accounting=4214240/4mb]]
ElasticsearchStatusException[Elasticsearch exception [type=circuit_breaking_exception, reason=[parent] Data too large, data for [<http_request>] would be [8585237264/7.9gb], which is larger than the limit of [8160437862/7.5gb], real usage: [8585236744/7.9gb], new bytes reserved: [520/520b], usages [request=130720/127.6kb, fielddata=709538/692.9kb, in_flight_requests=15526/15.1kb, model_inference=0/0b, accounting=4214240/4mb]]]
at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:177)
at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1727)
at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1704)
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1467)
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1424)
at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1394)
at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:930)
at com.keysight.elasticsearch.service.ElasticSearchService.perform(ElasticSearchService.java:118)
at com.keysight.elasticsearch.service.SearchQueryService.search(SearchQueryService.java:172)
at com.keysight.elasticsearch.controller.SearchQueryController.searchQuery(SearchQueryController.java:61)
at sun.reflect.GeneratedMethodAccessor52.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:189)
at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:138)
at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:102)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:895)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:800)
at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1038)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:942)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1005)
at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:897)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:634)
at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:882)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:741)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:99)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.FormContentFilter.doFilterInternal(FormContentFilter.java:92)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.HiddenHttpMethodFilter.doFilterInternal(HiddenHttpMethodFilter.java:93)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:200)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:199)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:490)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:139)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:343)
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:408)
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:791)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1417)
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)
Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [http://localhost:9200], URI [/product_hierarchy/_search?typed_keys=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 429 Too Many Requests]
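
If I read the numbers correctly, the 7.5gb limit in the message looks like the default parent circuit breaker limit of 95% of our 8 GB heap (0.95 * 8589934592 bytes ≈ 8160437862 bytes), and the "real usage" of 7.9gb means the heap was almost full when the request arrived. We have started pulling the per-node breaker stats to confirm this; a minimal sketch of how we fetch them through the low-level client (logging omitted):

import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestHighLevelClient;

public class BreakerStatsCheck {

    // Dumps the circuit breaker stats for every node (parent, request, fielddata, ...)
    public static String fetchBreakerStats(RestHighLevelClient client) throws Exception {
        Request request = new Request("GET", "/_nodes/stats/breaker");
        Response response = client.getLowLevelClient().performRequest(request);
        return EntityUtils.toString(response.getEntity());
    }
}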

Any help is greatly appreciated.

Thanks,
Amit

Team, any help with this would be appreciated, as we are hitting this issue intermittently in our environments.

What is the full output of the cluster stats API? How many indices and shards are you typically searching? How many concurrent queries does the cluster experience?

Hi Christian, thanks for your response.

Here is the cluster output:

{"_nodes":{"total":5,"successful":5,"failed":0},"cluster_name":"kcom-production","cluster_uuid":"MbdtdJqbQeSsEI-n67PaQA","timestamp":1615277226374,"status":"green","indices":{"count":25,"shards":{"total":50,"primaries":25,"replication":1.0,"index":{"shards":{"min":2,"max":2,"avg":2.0},"primaries":{"min":1,"max":1,"avg":1.0},"replication":{"min":1.0,"max":1.0,"avg":1.0}}},"docs":{"count":11071220,"deleted":1126448},"store":{"size_in_bytes":14783826840,"reserved_in_bytes":0},"fielddata":{"memory_size_in_bytes":667160,"evictions":0},"query_cache":{"memory_size_in_bytes":48156246,"total_count":335946247,"hit_count":63218628,"miss_count":272727619,"cache_size":12181,"cache_count":1554431,"evictions":1542250},"completion":{"size_in_bytes":0},"segments":{"count":385,"memory_in_bytes":13349014,"terms_memory_in_bytes":10357320,"stored_fields_memory_in_bytes":372024,"term_vectors_memory_in_bytes":0,"norms_memory_in_bytes":1727232,"points_memory_in_bytes":0,"doc_values_memory_in_bytes":892438,"index_writer_memory_in_bytes":14255520,"version_map_memory_in_bytes":0,"fixed_bit_set_memory_in_bytes":755888,"max_unsafe_auto_id_timestamp":1612078346328,"file_sizes":{}},"mappings":{"field_types":[{"name":"date","count":17,"index_count":17},{"name":"double","count":58,"index_count":2},{"name":"float","count":4,"index_count":2},{"name":"keyword","count":2840,"index_count":25},{"name":"long","count":1269,"index_count":5},{"name":"nested","count":7,"index_count":4},{"name":"object","count":679,"index_count":8},{"name":"text","count":3038,"index_count":25}]},"analysis":{"char_filter_types":[],"tokenizer_types":[],"filter_types":[{"name":"edge_ngram","count":24,"index_count":13},{"name":"ngram","count":11,"index_count":11}],"analyzer_types":[{"name":"custom","count":58,"index_count":14}],"built_in_char_filters":[{"name":"icu_normalizer","count":11,"index_count":11}],"built_in_tokenizers":[{"name":"keyword","count":1,"index_count":1},{"name":"kuromoji_tokenizer","count":11,"index_count":11},{"name":"nori_tokenizer","count":11,"index_count":11},{"name":"smartcn_tokenizer","count":11,"index_count":11},{"name":"standard","count":24,"index_count":13}],"built_in_filters":[{"name":"cjk_width","count":11,"index_count":11},{"name":"ja_stop","count":11,"index_count":11},{"name":"kuromoji_baseform","count":11,"index_count":11},{"name":"kuromoji_part_of_speech","count":11,"index_count":11},{"name":"kuromoji_stemmer","count":11,"index_count":11},{"name":"lowercase","count":58,"index_count":14},{"name":"nori_part_of_speech","count":11,"index_count":11},{"name":"nori_readingform","count":11,"index_count":11},{"name":"smartcn_stop","count":11,"index_count":11}],"built_in_analyzers":[{"name":"german","count":15,"index_count":5},{"name":"kuromoji","count":42,"index_count":9},{"name":"nori","count":42,"index_count":9},{"name":"portuguese","count":15,"index_count":5},{"name":"russian","count":15,"index_count":5},{"name":"smartcn","count":42,"index_count":9},{"name":"standard","count":42,"index_count":9}]}},"nodes":{"count":{"total":5,"coordinating_only":0,"data":5,"ingest":5,"master":5,"ml":5,"remote_cluster_client":5,"transform":5,"voting_only":0},"versions":["7.9.0"],"os":{"available_processors":40,"allocated_processors":40,"names":[{"name":"Linux","count":5}],"pretty_names":[{"pretty_name":"Amazon Linux 
2","count":5}],"mem":{"total_in_bytes":81308901376,"free_in_bytes":7094743040,"used_in_bytes":74214158336,"free_percent":9,"used_percent":91}},"process":{"cpu":{"percent":15},"open_file_descriptors":{"min":447,"max":474,"avg":461}},"jvm":{"max_uptime_in_millis":3198760874,"versions":[{"version":"14.0.1","vm_name":"OpenJDK 64-Bit Server VM","vm_version":"14.0.1+7","vm_vendor":"AdoptOpenJDK","bundled_jdk":true,"using_bundled_jdk":true,"count":5}],"mem":{"heap_used_in_bytes":12763383280,"heap_max_in_bytes":42949672960},"threads":337},"fs":{"total_in_bytes":1584664739840,"free_in_bytes":1558152237056,"available_in_bytes":1477537714176},"plugins":[{"name":"analysis-kuromoji","version":"7.9.0","elasticsearch_version":"7.9.0","java_version":"1.8","description":"The Japanese (kuromoji) Analysis plugin integrates Lucene kuromoji analysis module into elasticsearch.","classname":"org.elasticsearch.plugin.analysis.kuromoji.AnalysisKuromojiPlugin","extended_plugins":[],"has_native_controller":false},{"name":"analysis-icu","version":"7.9.0","elasticsearch_version":"7.9.0","java_version":"1.8","description":"The ICU Analysis plugin integrates the Lucene ICU module into Elasticsearch, adding ICU-related analysis components.","classname":"org.elasticsearch.plugin.analysis.icu.AnalysisICUPlugin","extended_plugins":[],"has_native_controller":false},{"name":"analysis-smartcn","version":"7.9.0","elasticsearch_version":"7.9.0","java_version":"1.8","description":"Smart Chinese Analysis plugin integrates Lucene Smart Chinese analysis module into elasticsearch.","classname":"org.elasticsearch.plugin.analysis.smartcn.AnalysisSmartChinesePlugin","extended_plugins":[],"has_native_controller":false},{"name":"analysis-nori","version":"7.9.0","elasticsearch_version":"7.9.0","java_version":"1.8","description":"The Korean (nori) Analysis plugin integrates Lucene nori analysis module into elasticsearch.","classname":"org.elasticsearch.plugin.analysis.nori.AnalysisNoriPlugin","extended_plugins":[],"has_native_controller":false}],"network_types":{"transport_types":{"security4":5},"http_types":{"security4":5}},"discovery_types":{"zen":5},"packaging_types":[{"flavor":"default","type":"rpm","count":5}],"ingest":{"number_of_pipelines":2,"processor_stats":{"gsub":{"count":0,"failed":0,"current":0,"time_in_millis":0},"script":{"count":0,"failed":0,"current":0,"time_in_millis":0}}}}}

Most of the queries go to a single index only. There are a few which go to aliases where we have 4 indexes grouped together.

Thanks,
Amit

Also, we are noticing that this error comes intermittently. All the queries fired start taking much longer, around 8-16 seconds, and then we get this error.

After this error, the response time comes back down to milliseconds for most of the queries.

Another thing we noted was that CPU utilization on the ES nodes is very high (90%+), and it is specific to a few nodes: out of 5 nodes, we see high utilization on 3.

We aren't getting any pointers on this. We don't know if bumping up the memory is what we should do; memory utilization, though, is constantly at 67-70%.

The cluster receives around 70 requests per second.
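
For what it's worth, this is roughly how we are sampling heap and CPU per node while this happens (the column list is just what we happen to look at):

import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestHighLevelClient;

public class NodeLoadCheck {

    // Returns one line per node with heap %, RAM %, CPU % and 1-minute load
    public static String fetchNodeLoad(RestHighLevelClient client) throws Exception {
        Request request = new Request("GET", "/_cat/nodes");
        request.addParameter("v", "true");
        request.addParameter("h", "name,heap.percent,ram.percent,cpu,load_1m");
        Response response = client.getLowLevelClient().performRequest(request);
        return EntityUtils.toString(response.getEntity());
    }
}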

We figured out the issue behind the high CPU utilization on 2 specific nodes: we had the default settings of 1 primary shard and 1 replica shard per index, and we noticed that all our primary shards were on a single node.

Also, the shards of our heavyweight indexes (both primary and replica) sat on only two nodes instead of being spread out evenly, so all the queries were being served by those two specific nodes.

We tried using reroute to balance the primary shards, but it didn't help because Elasticsearch kept rebalancing them again; somehow the cluster considers having those heavy indexes on 2 nodes balanced, and we don't know why.
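
The reroute attempt looked roughly like this, issued once per shard we wanted to move (shard number and node names below are placeholders):

import org.elasticsearch.client.Request;
import org.elasticsearch.client.RestHighLevelClient;

public class ShardReroute {

    // Asks the cluster to move one shard copy; the allocator may move it back later,
    // which is exactly the behaviour we observed.
    public static void moveShard(RestHighLevelClient client) throws Exception {
        Request request = new Request("POST", "/_cluster/reroute");
        request.setJsonEntity(
            "{ \"commands\": [ { \"move\": {"
            + " \"index\": \"product_hierarchy\","
            + " \"shard\": 0,"
            + " \"from_node\": \"node-1\","
            + " \"to_node\": \"node-2\" } } ] }");
        client.getLowLevelClient().performRequest(request);
    }
}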

We then decided to add more replicas so that all nodes are equally involved in serving search requests. Every index now has 4 replicas, which means every index has a shard copy on every node in the cluster.
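
The change itself was just a settings update per index, roughly like the following (the index name argument is a placeholder):

import org.elasticsearch.action.admin.indices.settings.put.UpdateSettingsRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.settings.Settings;

public class ReplicaBump {

    // Raises the replica count so every node ends up holding a copy of the index
    public static void setReplicas(RestHighLevelClient client, String index) throws Exception {
        UpdateSettingsRequest request = new UpdateSettingsRequest(index)
            .settings(Settings.builder().put("index.number_of_replicas", 4).build());
        client.indices().putSettings(request, RequestOptions.DEFAULT);
    }
}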

After this change, the load was distributed across all 5 nodes and CPU utilization went down on those 2 specific nodes.

What I want to know is: is it OK to have so many replicas across all nodes? We have 20 indexes, so we now have 100 shards across 5 nodes. All the primary shards are still on a single node.

Please give us your thoughts.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.