One node failing the day after a successful upgrade from 6.8 to 7.4

I upgraded yesterday from 6.8 to 7.1.1 and then to 7.4.
Today one of the 27 data nodes is failing as shown below, and I wonder who is sending such large messages, and why:

[2019-10-19T13:53:43,380][DEBUG][o.e.a.a.c.n.i.TransportNodesInfoAction] [d1r2n1] failed to execute on node [FDZezqUCRYqLNHfGV6eECw]
org.elasticsearch.transport.RemoteTransportException: [es-mst3][<redacted>:9300][cluster:monitor/nodes/info[n]]
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [4061260842/3.7gb], which is larger than the limit of [4047097036/3.7gb], real usage: [4061245568/3.7gb], new bytes reserved: [15274/14.9kb], usages [request=0/0b, fielddata=0/0b, in_flight_requests=15274/14.9kb, accounting=0/0b]
        at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:342) ~[elasticsearch-7.4.0.jar:7.4.0]
        at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:128) ~[elasticsearch-7.4.0.jar:7.4.0]
        at org.elasticsearch.transport.InboundHandler.handleRequest(InboundHandler.java:170) [elasticsearch-7.4.0.jar:7.4.0]
        at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:118) [elasticsearch-7.4.0.jar:7.4.0]
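
A note on reading the numbers in that message: the request itself is tiny ("new bytes reserved" is 14.9kb), while "real usage" is already above the limit before the request is counted. That suggests the node's heap as a whole is full, rather than any single message being large. As far as I know, the parent breaker in 7.x accounts for real heap usage by default, which would fit. Below is a minimal sketch that pulls the byte counts out of the message and checks this; the function name and the regular expressions are my own, not anything from Elasticsearch.

```python
import re

# The relevant part of the CircuitBreakingException message from the log above.
LOG_LINE = (
    "[parent] Data too large, data for [<transport_request>] would be "
    "[4061260842/3.7gb], which is larger than the limit of [4047097036/3.7gb], "
    "real usage: [4061245568/3.7gb], new bytes reserved: [15274/14.9kb]"
)


def parse_breaker_message(message):
    """Extract the raw byte counts from a parent-breaker message (hypothetical helper)."""
    patterns = {
        "would_be": r"would be \[(\d+)/",
        "limit": r"limit of \[(\d+)/",
        "real_usage": r"real usage: \[(\d+)/",
        "new_bytes": r"new bytes reserved: \[(\d+)/",
    }
    return {key: int(re.search(pat, message).group(1)) for key, pat in patterns.items()}


fields = parse_breaker_message(LOG_LINE)

# True when the heap was over the limit even before this request was added.
over_before_request = fields["real_usage"] > fields["limit"]

print(fields)
print("over limit before this request:", over_before_request)
```

The "would be" figure is just real usage plus the new bytes, so the sender of this particular request is not the culprit; whatever filled the heap beforehand is.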

Edit: it seems the cause might be backlogged Logstash instances. They were backlogged because of index template issues after last night's index rollover. I have just fixed the templates and am bouncing the Logstash instances...

Fixing all the template issues, bouncing all backlogged Logstash instances, and doing a full Elasticsearch cluster restart seems to have settled the issue for now :sweat_smile:
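
To see whether any node is creeping back toward the limit, the parent breaker figures from `GET _nodes/stats/breaker` can be ranked per node. This is only a rough sketch: it assumes the response has already been fetched and parsed into a dict, the function name is my own, and the sample values and node names (`node-a`, `node-b`) are made up for illustration rather than taken from this cluster.

```python
def parent_breaker_usage(stats):
    """Return (node name, used/limit ratio, tripped count), fullest node first.

    `stats` is the parsed JSON body of GET _nodes/stats/breaker.
    """
    rows = []
    for node_id, node in stats["nodes"].items():
        parent = node["breakers"]["parent"]
        ratio = parent["estimated_size_in_bytes"] / parent["limit_size_in_bytes"]
        rows.append((node.get("name", node_id), ratio, parent["tripped"]))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up sample in the shape of the API response (illustrative values only).
sample = {
    "nodes": {
        "id-1": {
            "name": "node-a",
            "breakers": {
                "parent": {
                    "limit_size_in_bytes": 4000,
                    "estimated_size_in_bytes": 3900,
                    "tripped": 12,
                }
            },
        },
        "id-2": {
            "name": "node-b",
            "breakers": {
                "parent": {
                    "limit_size_in_bytes": 4000,
                    "estimated_size_in_bytes": 1000,
                    "tripped": 0,
                }
            },
        },
    }
}

rows = parent_breaker_usage(sample)
for name, ratio, tripped in rows:
    print(f"{name}: {ratio:.0%} of parent limit, tripped {tripped} times")
```

A node that stays near 100% or keeps increasing its tripped count after the restart would be the one to look at first.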
