Circuit Breaker Exception + Data Too Large for HTTP Request + HTTP/1.1 429 Too Many Requests

Hello Team,

We recently went live with Elasticsearch as our search engine and observed a few instances where a circuit breaker exception was thrown, along with messages like "Data too large" and "429 Too Many Requests".

We have 5 ES nodes (7.10) in our cluster, and each node has 8 GB of heap allocated. We have a Java query layer (a typical Spring Boot application) which receives client requests and uses RestHighLevelClient (7.3) to send search requests to ES.
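
For context, this is roughly how the query layer creates the client, as a single shared Spring bean (simplified sketch; the host names below are placeholders, not our real nodes):

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class EsClientConfig {

    // One shared client instance for the whole query layer
    @Bean(destroyMethod = "close")
    public RestHighLevelClient restHighLevelClient() {
        return new RestHighLevelClient(
            RestClient.builder(
                new HttpHost("es-node-1", 9200, "http"),
                new HttpHost("es-node-2", 9200, "http")));
    }
}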

We never saw these errors during testing in our lower environments, including stress tests; they only started appearing once we were in production.

While this certainly suggests the exception was triggered to avoid an OOM error on an ES node, there are a few things we want to check:

  • Does it indicate that we have too little memory allocated to the heap? Our boxes have 16 GB and we allocated 50% of that to ES. Any recommendations here?
  • We haven't made any specific changes to the default settings in the ES configs (elasticsearch.yml), so whatever is running is out of the box.
  • Most of the requests from client applications are GET requests; there are a couple of POST requests too.
  • We have a bunch of microservices (again, typical Spring Boot applications) which perform indexing on the ES nodes. We use BulkProcessor to index documents into ES (a simplified sketch of our setup is below, after this list).
  • Could this also be related to the version difference between the RestHighLevelClient we are using (7.3) and the ES nodes (7.10)?
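
For reference, our BulkProcessor setup is roughly the following (a simplified sketch; the batch sizes, flush interval, and listener logic here are illustrative assumptions, not our exact production values):

import org.elasticsearch.action.bulk.BackoffPolicy;
import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;
import org.elasticsearch.common.unit.TimeValue;

public class IndexingService {

    public BulkProcessor buildBulkProcessor(RestHighLevelClient client) {
        BulkProcessor.Listener listener = new BulkProcessor.Listener() {
            @Override
            public void beforeBulk(long executionId, BulkRequest request) { }

            @Override
            public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
                // log response.hasFailures() here
            }

            @Override
            public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
                // log the failure (this is where rejections from an overloaded node would surface)
            }
        };

        return BulkProcessor.builder(
                (request, bulkListener) -> client.bulkAsync(request, RequestOptions.DEFAULT, bulkListener),
                listener)
            .setBulkActions(500)                                   // flush after 500 docs
            .setBulkSize(new ByteSizeValue(5, ByteSizeUnit.MB))    // or after 5 MB
            .setFlushInterval(TimeValue.timeValueSeconds(5))       // or every 5 seconds
            .setConcurrentRequests(1)                              // one bulk in flight at a time
            .setBackoffPolicy(BackoffPolicy.exponentialBackoff(TimeValue.timeValueMillis(100), 3))
            .build();
    }
}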

Here is the exception we saw in our logs:

org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=circuit_breaking_exception, reason=[parent] Data too large, data for [<http_request>] would be [8585237264/7.9gb], which is larger than the limit of [8160437862/7.5gb], real usage: [8585236744/7.9gb], new bytes reserved: [520/520b], usages [request=130720/127.6kb, fielddata=709538/692.9kb, in_flight_requests=15526/15.1kb, model_inference=0/0b, accounting=4214240/4mb]]
ElasticsearchStatusException[Elasticsearch exception [type=circuit_breaking_exception, reason=[parent] Data too large, data for [<http_request>] would be [8585237264/7.9gb], which is larger than the limit of [8160437862/7.5gb], real usage: [8585236744/7.9gb], new bytes reserved: [520/520b], usages [request=130720/127.6kb, fielddata=709538/692.9kb, in_flight_requests=15526/15.1kb, model_inference=0/0b, accounting=4214240/4mb]]]
at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:177)
at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1727)
at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1704)
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1467)
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1424)
at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1394)
at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:930)
at com.keysight.elasticsearch.service.ElasticSearchService.perform(ElasticSearchService.java:118)
at com.keysight.elasticsearch.service.SearchQueryService.search(SearchQueryService.java:172)
at com.keysight.elasticsearch.controller.SearchQueryController.searchQuery(SearchQueryController.java:61)
at sun.reflect.GeneratedMethodAccessor52.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:189)
at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:138)
at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:102)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:895)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:800)
at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1038)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:942)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1005)
at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:897)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:634)
at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:882)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:741)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:99)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.FormContentFilter.doFilterInternal(FormContentFilter.java:92)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.HiddenHttpMethodFilter.doFilterInternal(HiddenHttpMethodFilter.java:93)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:200)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:199)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:490)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:139)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:343)
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:408)
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:791)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1417)
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)
Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [http://localhost:9200], URI [/product_hierarchy/_search?typed_keys=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 429 Too Many Requests]
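
If I read the numbers correctly, the 7.5gb limit in the message looks like the default parent circuit breaker limit of 95% of our 8 GB heap (0.95 * 8589934592 bytes ≈ 8160437862 bytes), and the "real usage" of 7.9gb means the heap was almost full when the request arrived. We have started pulling the per-node breaker stats to confirm this; a minimal sketch of how we fetch them through the low-level client (logging omitted):

import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestHighLevelClient;

public class BreakerStatsCheck {

    // Dumps the circuit breaker stats for every node (parent, request, fielddata, ...)
    public static String fetchBreakerStats(RestHighLevelClient client) throws Exception {
        Request request = new Request("GET", "/_nodes/stats/breaker");
        Response response = client.getLowLevelClient().performRequest(request);
        return EntityUtils.toString(response.getEntity());
    }
}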

Any help is greatly appreciated.

Thanks,
Amit

Team, any help with this would be appreciated, as we are hitting this issue intermittently in our environments.

What is the full output of the cluster stats API? How many indices and shards are you typically searching? How many concurrent queries does the cluster experience?

Hi Christian, thanks for your response.

Here is the cluster output:

{"_nodes":{"total":5,"successful":5,"failed":0},"cluster_name":"kcom-production","cluster_uuid":"MbdtdJqbQeSsEI-n67PaQA","timestamp":1615277226374,"status":"green","indices":{"count":25,"shards":{"total":50,"primaries":25,"replication":1.0,"index":{"shards":{"min":2,"max":2,"avg":2.0},"primaries":{"min":1,"max":1,"avg":1.0},"replication":{"min":1.0,"max":1.0,"avg":1.0}}},"docs":{"count":11071220,"deleted":1126448},"store":{"size_in_bytes":14783826840,"reserved_in_bytes":0},"fielddata":{"memory_size_in_bytes":667160,"evictions":0},"query_cache":{"memory_size_in_bytes":48156246,"total_count":335946247,"hit_count":63218628,"miss_count":272727619,"cache_size":12181,"cache_count":1554431,"evictions":1542250},"completion":{"size_in_bytes":0},"segments":{"count":385,"memory_in_bytes":13349014,"terms_memory_in_bytes":10357320,"stored_fields_memory_in_bytes":372024,"term_vectors_memory_in_bytes":0,"norms_memory_in_bytes":1727232,"points_memory_in_bytes":0,"doc_values_memory_in_bytes":892438,"index_writer_memory_in_bytes":14255520,"version_map_memory_in_bytes":0,"fixed_bit_set_memory_in_bytes":755888,"max_unsafe_auto_id_timestamp":1612078346328,"file_sizes":{}},"mappings":{"field_types":[{"name":"date","count":17,"index_count":17},{"name":"double","count":58,"index_count":2},{"name":"float","count":4,"index_count":2},{"name":"keyword","count":2840,"index_count":25},{"name":"long","count":1269,"index_count":5},{"name":"nested","count":7,"index_count":4},{"name":"object","count":679,"index_count":8},{"name":"text","count":3038,"index_count":25}]},"analysis":{"char_filter_types":[],"tokenizer_types":[],"filter_types":[{"name":"edge_ngram","count":24,"index_count":13},{"name":"ngram","count":11,"index_count":11}],"analyzer_types":[{"name":"custom","count":58,"index_count":14}],"built_in_char_filters":[{"name":"icu_normalizer","count":11,"index_count":11}],"built_in_tokenizers":[{"name":"keyword","count":1,"index_count":1},{"name":"kuromoji_tokenizer","count":11,"index_count":11},{"name":"nori_tokenizer","count":11,"index_count":11},{"name":"smartcn_tokenizer","count":11,"index_count":11},{"name":"standard","count":24,"index_count":13}],"built_in_filters":[{"name":"cjk_width","count":11,"index_count":11},{"name":"ja_stop","count":11,"index_count":11},{"name":"kuromoji_baseform","count":11,"index_count":11},{"name":"kuromoji_part_of_speech","count":11,"index_count":11},{"name":"kuromoji_stemmer","count":11,"index_count":11},{"name":"lowercase","count":58,"index_count":14},{"name":"nori_part_of_speech","count":11,"index_count":11},{"name":"nori_readingform","count":11,"index_count":11},{"name":"smartcn_stop","count":11,"index_count":11}],"built_in_analyzers":[{"name":"german","count":15,"index_count":5},{"name":"kuromoji","count":42,"index_count":9},{"name":"nori","count":42,"index_count":9},{"name":"portuguese","count":15,"index_count":5},{"name":"russian","count":15,"index_count":5},{"name":"smartcn","count":42,"index_count":9},{"name":"standard","count":42,"index_count":9}]}},"nodes":{"count":{"total":5,"coordinating_only":0,"data":5,"ingest":5,"master":5,"ml":5,"remote_cluster_client":5,"transform":5,"voting_only":0},"versions":["7.9.0"],"os":{"available_processors":40,"allocated_processors":40,"names":[{"name":"Linux","count":5}],"pretty_names":[{"pretty_name":"Amazon Linux 
2","count":5}],"mem":{"total_in_bytes":81308901376,"free_in_bytes":7094743040,"used_in_bytes":74214158336,"free_percent":9,"used_percent":91}},"process":{"cpu":{"percent":15},"open_file_descriptors":{"min":447,"max":474,"avg":461}},"jvm":{"max_uptime_in_millis":3198760874,"versions":[{"version":"14.0.1","vm_name":"OpenJDK 64-Bit Server VM","vm_version":"14.0.1+7","vm_vendor":"AdoptOpenJDK","bundled_jdk":true,"using_bundled_jdk":true,"count":5}],"mem":{"heap_used_in_bytes":12763383280,"heap_max_in_bytes":42949672960},"threads":337},"fs":{"total_in_bytes":1584664739840,"free_in_bytes":1558152237056,"available_in_bytes":1477537714176},"plugins":[{"name":"analysis-kuromoji","version":"7.9.0","elasticsearch_version":"7.9.0","java_version":"1.8","description":"The Japanese (kuromoji) Analysis plugin integrates Lucene kuromoji analysis module into elasticsearch.","classname":"org.elasticsearch.plugin.analysis.kuromoji.AnalysisKuromojiPlugin","extended_plugins":[],"has_native_controller":false},{"name":"analysis-icu","version":"7.9.0","elasticsearch_version":"7.9.0","java_version":"1.8","description":"The ICU Analysis plugin integrates the Lucene ICU module into Elasticsearch, adding ICU-related analysis components.","classname":"org.elasticsearch.plugin.analysis.icu.AnalysisICUPlugin","extended_plugins":[],"has_native_controller":false},{"name":"analysis-smartcn","version":"7.9.0","elasticsearch_version":"7.9.0","java_version":"1.8","description":"Smart Chinese Analysis plugin integrates Lucene Smart Chinese analysis module into elasticsearch.","classname":"org.elasticsearch.plugin.analysis.smartcn.AnalysisSmartChinesePlugin","extended_plugins":[],"has_native_controller":false},{"name":"analysis-nori","version":"7.9.0","elasticsearch_version":"7.9.0","java_version":"1.8","description":"The Korean (nori) Analysis plugin integrates Lucene nori analysis module into elasticsearch.","classname":"org.elasticsearch.plugin.analysis.nori.AnalysisNoriPlugin","extended_plugins":[],"has_native_controller":false}],"network_types":{"transport_types":{"security4":5},"http_types":{"security4":5}},"discovery_types":{"zen":5},"packaging_types":[{"flavor":"default","type":"rpm","count":5}],"ingest":{"number_of_pipelines":2,"processor_stats":{"gsub":{"count":0,"failed":0,"current":0,"time_in_millis":0},"script":{"count":0,"failed":0,"current":0,"time_in_millis":0}}}}}

Most of the queries go to a single index only. There are a few which go to aliases where we have 4 indexes grouped together.

Thanks,
Amit

Also, we are noticing that this error comes intermittently. All the queries fired start taking much longer, around 8-16 seconds, and then we get this error.

After this error, the response time comes back down to milliseconds for most of the queries.

Another thing we noted was that CPU utilization on the ES nodes is very high (90%+), and it is specific to a few nodes: out of 5 nodes, we see high utilization on 3.

We aren't getting any pointers on this. We don't know if bumping up the memory is what we should do; memory utilization, though, is constantly at 67-70%.

The cluster receives around 70 requests per second.
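
For what it's worth, this is roughly how we are sampling heap and CPU per node while this happens (the column list is just what we happen to look at):

import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestHighLevelClient;

public class NodeLoadCheck {

    // Returns one line per node with heap %, RAM %, CPU % and 1-minute load
    public static String fetchNodeLoad(RestHighLevelClient client) throws Exception {
        Request request = new Request("GET", "/_cat/nodes");
        request.addParameter("v", "true");
        request.addParameter("h", "name,heap.percent,ram.percent,cpu,load_1m");
        Response response = client.getLowLevelClient().performRequest(request);
        return EntityUtils.toString(response.getEntity());
    }
}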

We figured out the issue behind the high CPU utilization on 2 specific nodes: we had the default settings of 1 primary shard and 1 replica shard per index, and we noticed that all our primary shards were on a single node.

Also, the shards of our heavyweight indexes (both primary and replica) sat on only two nodes instead of being spread out evenly, so all the queries were being served by those two specific nodes.

We tried using reroute to balance the primary shards, but it didn't help because Elasticsearch kept rebalancing them again; somehow the cluster considers having those heavy indexes on 2 nodes balanced, and we don't know why.
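
The reroute attempt looked roughly like this, issued once per shard we wanted to move (shard number and node names below are placeholders):

import org.elasticsearch.client.Request;
import org.elasticsearch.client.RestHighLevelClient;

public class ShardReroute {

    // Asks the cluster to move one shard copy; the allocator may move it back later,
    // which is exactly the behaviour we observed.
    public static void moveShard(RestHighLevelClient client) throws Exception {
        Request request = new Request("POST", "/_cluster/reroute");
        request.setJsonEntity(
            "{ \"commands\": [ { \"move\": {"
            + " \"index\": \"product_hierarchy\","
            + " \"shard\": 0,"
            + " \"from_node\": \"node-1\","
            + " \"to_node\": \"node-2\" } } ] }");
        client.getLowLevelClient().performRequest(request);
    }
}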

We then decided to add more replicas so that all nodes are equally involved in serving search requests. Every index now has 4 replicas, which means every index has a shard copy on every node in the cluster.
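
The change itself was just a settings update per index, roughly like the following (the index name argument is a placeholder):

import org.elasticsearch.action.admin.indices.settings.put.UpdateSettingsRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.settings.Settings;

public class ReplicaBump {

    // Raises the replica count so every node ends up holding a copy of the index
    public static void setReplicas(RestHighLevelClient client, String index) throws Exception {
        UpdateSettingsRequest request = new UpdateSettingsRequest(index)
            .settings(Settings.builder().put("index.number_of_replicas", 4).build());
        client.indices().putSettings(request, RequestOptions.DEFAULT);
    }
}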

After this change, the load was distributed across all 5 nodes and CPU utilization went down on those 2 specific nodes.

What I want to know is: is it OK to have so many replicas across all nodes? We have 20 indexes, so we now have 100 shards across 5 nodes. All the primary shards are still on a single node.

Please give us your thoughts.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.