I'm running into a strange situation with Elasticsearch. I would like to estimate the Xmx needed for my Elasticsearch data nodes while accounting for the heap needed during recovery.
All my documents are small (a few KB each), and the writers that index data into Elasticsearch use small bulk requests (at most 5,000 documents per request).
However, when an index stays yellow for a while (because a node went down and later came back online), the recovery process sends large requests that start tripping circuit breakers. The request sizes keep growing and the index never gets back to green.
I would like to budget for these large requests from the recovery process, which seem to be reserving several hundred MB against the circuit breaker. Is there an upper limit on their size? In the worst case, could an entire shard be sent to the other node as a single bulk request during replication?
Here is the error from the data node's log (truncated):

```
failed to perform indices:data/write/bulk[s] on replica [xx], node[wy6bebBaQOC85iEi5vnJrA], [R], s[STARTED], a[id=ZxIAa9ZPRgiKxvT42TUCNg]",
"stacktrace": ["org.elasticsearch.transport.RemoteTransportException: [elasticsearch-data-1][100.64.89.127:9300][indices:data/write/bulk[s][r]]",
CircuitBreakingException: [parent] Data too large, data for [<transport_request>] .. new bytes reserved: [454588504/433.5mb]
```
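In the meantime I've been watching breaker usage and throttling recovery while the index tries to catch up. These are the standard Elasticsearch endpoints; the host, port, and the `40mb` limit are just example values for my setup, not a recommendation:

```shell
# Watch per-node circuit breaker usage (parent, request, in_flight_requests, ...)
# to see how much the recovery traffic is actually reserving.
curl -s 'http://localhost:9200/_nodes/stats/breaker?pretty'

# Throttle peer recovery so replication traffic reserves less memory at once.
# Transient, so it resets on a full cluster restart; 40mb is an example value.
curl -s -X PUT 'http://localhost:9200/_cluster/settings' \
  -H 'Content-Type: application/json' \
  -d '{"transient": {"indices.recovery.max_bytes_per_sec": "40mb"}}'
```

This slows recovery down rather than bounding individual request sizes, which is why I'd still like to know what the worst-case request size actually is.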