Help with unassigned shards / CircuitBreakingException / Values less than -1 bytes are not supported

David, using 7.8.1 as a reference, I carefully went through all of the release notes and breaking changes pages from 7.9.0 to 7.10.1, focusing on known issues, breaking changes, upgrades and enhancements on write, allocation, coordination, store & aggregation.

ES 7.10.1 seems to be the best version to try. Since it's currently 5 days old :baby: and I noticed a couple of upgrades/switches on the JDK, could you share a link about Elasticsearch's test philosophy / strategies and/or docs about the risks of performing this kind of upgrade?

I plan to start trying the upgrade tomorrow, but want to know what to expect and which areas I need to check for risks, given my use cases: intensive bulk/API writes and read aggregations. Thanks!

I don't know of such a document. We only release versions once we're confident they work, and we build that confidence by continually running a pretty comprehensive suite of tests, reviewing every change, ongoing benchmarking, etc. The latest version is always the one we recommend for production use.

That said, we also recommend that you run your own tests in an isolated environment before upgrading. If you're risk-averse then you should definitely do that too.

I may have found a breaking change in v.7.4 while testing the upgrade from 7.1.0 to 7.10.1:
Ignore_malformed fires mapper_parsing_exception for keyword type on 7.10.1 (it worked on 7.1.0)

Issue reported: error_trace parameter seems to be ignored on Bulk API items #66811

So far, the initial hours of performance tests are going well after handling the ignore_malformed mapping issue. I haven't seen any unassigned shards, CircuitBreakingException or "Values less than -1 bytes are not supported" issues yet, and the cluster stays green even under heavy load and after a node disconnected and reconnected :green_circle: :crossed_fingers:!

I should post more results next week. Thanks for your amazing support and merry Christmas / happy holidays!

After five more days of scale tests the cluster remains in the :green_circle: green state; it went yellow a couple of times, but it was able to auto-recover.


I did notice the illegal_argument_exception with "Values less than -1 bytes are not supported" happening a lot, which prevented the cluster state from being captured on any of the nodes for long time intervals. The first interval was between Dec 25 2020 14:35:38 and 23:07:40.454.

Here is one stack trace, taken from the response of the http://localhost:9200/_nodes/stats/fs,jvm,indices,breaker?error_trace=true API:

{
  "error": {
    "root_cause": [{
      "type": "illegal_argument_exception",
      "reason": "Values less than -1 bytes are not supported: -4b",
      "stack_trace": "[Values less than -1 bytes are not supported: -4b]; nested: IllegalArgumentException[Values less than -1 bytes are not supported: -4b];\n\tat org.elasticsearch.ElasticsearchException.guessRootCauses(ElasticsearchException.java:644)\n\tat org.elasticsearch.ElasticsearchException.generateFailureXContent(ElasticsearchException.java:572)\n\tat org.elasticsearch.rest.BytesRestResponse.build(BytesRestResponse.java:149)\n\tat org.elasticsearch.rest.BytesRestResponse.<init>(BytesRestResponse.java:110)\n\tat org.elasticsearch.rest.BytesRestResponse.<init>(BytesRestResponse.java:93)\n\tat org.elasticsearch.rest.action.RestActionListener.onFailure(RestActionListener.java:58)\n\tat org.elasticsearch.rest.action.RestActionListener.onResponse(RestActionListener.java:49)\n\tat org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:89)\n\tat org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:83)\n\tat org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:58)\n\tat org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:73)\n\tat org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)\n\tat org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:224)\n\tat org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.finishHim(TransportNodesAction.java:263)\n\tat org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.onOperation(TransportNodesAction.java:248)\n\tat org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.access$000(TransportNodesAction.java:177)\n\tat org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1.handleResponse(TransportNodesAction.java:226)\n\tat org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1.handleResponse(TransportNodesAction.java:218)\n\tat 
org.elasticsearch.transport.TransportService$6.handleResponse(TransportService.java:634)\n\tat org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1171)\n\tat org.elasticsearch.transport.InboundHandler.doHandleResponse(InboundHandler.java:253)\n\tat org.elasticsearch.transport.InboundHandler.handleResponse(InboundHandler.java:245)\n\tat org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:133)\n\tat org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:89)\n\tat org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:700)\n\tat org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:142)\n\tat org.elasticsearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:117)\n\tat org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:82)\n\tat org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:74)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)\n\tat io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)\n\tat io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:271)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)\n\tat io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)\n\tat io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)\n\tat 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)\n\tat io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)\n\tat io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:615)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:578)\n\tat io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)\n\tat io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)\n\tat io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tat java.base/java.lang.Thread.run(Thread.java:832)\nCaused by: java.lang.IllegalArgumentException: Values less than -1 bytes are not supported: -4b\n\tat org.elasticsearch.common.unit.ByteSizeValue.<init>(ByteSizeValue.java:79)\n\tat org.elasticsearch.common.unit.ByteSizeValue.<init>(ByteSizeValue.java:74)\n\tat org.elasticsearch.index.cache.query.QueryCacheStats.getMemorySize(QueryCacheStats.java:73)\n\tat org.elasticsearch.index.cache.query.QueryCacheStats.toXContent(QueryCacheStats.java:130)\n\tat org.elasticsearch.action.admin.indices.stats.CommonStats.toXContent(CommonStats.java:510)\n\tat org.elasticsearch.indices.NodeIndicesStats.toXContent(NodeIndicesStats.java:202)\n\tat org.elasticsearch.action.admin.cluster.node.stats.NodeStats.toXContent(NodeStats.java:324)\n\tat org.elasticsearch.action.admin.cluster.node.stats.NodesStatsResponse.toXContent(NodesStatsResponse.java:61)\n\tat org.elasticsearch.rest.action.RestActions.nodesResponse(RestActions.java:184)\n\tat 
org.elasticsearch.rest.action.RestActions$NodesResponseRestListener.buildResponse(RestActions.java:237)\n\tat org.elasticsearch.rest.action.RestActions$NodesResponseRestListener.buildResponse(RestActions.java:228)\n\tat org.elasticsearch.rest.action.RestBuilderListener.buildResponse(RestBuilderListener.java:38)\n\tat org.elasticsearch.rest.action.RestResponseListener.processResponse(RestResponseListener.java:37)\n\tat org.elasticsearch.rest.action.RestActionListener.onResponse(RestActionListener.java:47)\n\t... 41 more\n\tSuppressed: java.lang.IllegalStateException: Failed to close the XContentBuilder\n\t\tat org.elasticsearch.common.xcontent.XContentBuilder.close(XContentBuilder.java:1017)\n\t\tat org.elasticsearch.rest.action.RestBuilderListener.buildResponse(RestBuilderListener.java:37)\n\t\t... 43 more\n\tCaused by: java.io.IOException: Unclosed object or array found\n\t\tat org.elasticsearch.common.xcontent.json.JsonXContentGenerator.close(JsonXContentGenerator.java:456)\n\t\tat org.elasticsearch.common.xcontent.XContentBuilder.close(XContentBuilder.java:1015)\n\t\t... 44 more\n"
    }],
    "type": "illegal_argument_exception",
    "reason": "Values less than -1 bytes are not supported: -4b",
   //...

(part 2 of the stack trace)

 "stack_trace": "java.lang.IllegalArgumentException: Values less than -1 bytes are not supported: -4b\n\tat org.elasticsearch.common.unit.ByteSizeValue.<init>(ByteSizeValue.java:79)\n\tat org.elasticsearch.common.unit.ByteSizeValue.<init>(ByteSizeValue.java:74)\n\tat org.elasticsearch.index.cache.query.QueryCacheStats.getMemorySize(QueryCacheStats.java:73)\n\tat org.elasticsearch.index.cache.query.QueryCacheStats.toXContent(QueryCacheStats.java:130)\n\tat org.elasticsearch.action.admin.indices.stats.CommonStats.toXContent(CommonStats.java:510)\n\tat org.elasticsearch.indices.NodeIndicesStats.toXContent(NodeIndicesStats.java:202)\n\tat org.elasticsearch.action.admin.cluster.node.stats.NodeStats.toXContent(NodeStats.java:324)\n\tat org.elasticsearch.action.admin.cluster.node.stats.NodesStatsResponse.toXContent(NodesStatsResponse.java:61)\n\tat org.elasticsearch.rest.action.RestActions.nodesResponse(RestActions.java:184)\n\tat org.elasticsearch.rest.action.RestActions$NodesResponseRestListener.buildResponse(RestActions.java:237)\n\tat org.elasticsearch.rest.action.RestActions$NodesResponseRestListener.buildResponse(RestActions.java:228)\n\tat org.elasticsearch.rest.action.RestBuilderListener.buildResponse(RestBuilderListener.java:38)\n\tat org.elasticsearch.rest.action.RestResponseListener.processResponse(RestResponseListener.java:37)\n\tat org.elasticsearch.rest.action.RestActionListener.onResponse(RestActionListener.java:47)\n\tat org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:89)\n\tat org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:83)\n\tat org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:58)\n\tat org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:73)\n\tat org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)\n\tat 
org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:224)\n\tat org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.finishHim(TransportNodesAction.java:263)\n\tat org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.onOperation(TransportNodesAction.java:248)\n\tat org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.access$000(TransportNodesAction.java:177)\n\tat org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1.handleResponse(TransportNodesAction.java:226)\n\tat org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1.handleResponse(TransportNodesAction.java:218)\n\tat org.elasticsearch.transport.TransportService$6.handleResponse(TransportService.java:634)\n\tat org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1171)\n\tat org.elasticsearch.transport.InboundHandler.doHandleResponse(InboundHandler.java:253)\n\tat org.elasticsearch.transport.InboundHandler.handleResponse(InboundHandler.java:245)\n\tat org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:133)\n\tat org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:89)\n\tat org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:700)\n\tat org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:142)\n\tat org.elasticsearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:117)\n\tat org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:82)\n\tat org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:74)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)\n\tat 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)\n\tat io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:271)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)\n\tat io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)\n\tat io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)\n\tat io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)\n\tat io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:615)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:578)\n\tat io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)\n\tat io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)\n\tat io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tat java.base/java.lang.Thread.run(Thread.java:832)\n\tSuppressed: java.lang.IllegalStateException: Failed to close the XContentBuilder\n\t\tat org.elasticsearch.common.xcontent.XContentBuilder.close(XContentBuilder.java:1017)\n\t\tat org.elasticsearch.rest.action.RestBuilderListener.buildResponse(RestBuilderListener.java:37)\n\t\t... 
43 more\n\tCaused by: java.io.IOException: Unclosed object or array found\n\t\tat org.elasticsearch.common.xcontent.json.JsonXContentGenerator.close(JsonXContentGenerator.java:456)\n\t\tat org.elasticsearch.common.xcontent.XContentBuilder.close(XContentBuilder.java:1015)\n\t\t... 44 more\n",

(part 3 of the stack trace)

"suppressed": [{
 "type": "illegal_state_exception",
 "reason": "Failed to close the XContentBuilder",
 "caused_by": {
   "type": "i_o_exception",
   "reason": "Unclosed object or array found",
   "stack_trace": "java.io.IOException: Unclosed object or array found\n\tat org.elasticsearch.common.xcontent.json.JsonXContentGenerator.close(JsonXContentGenerator.java:456)\n\tat org.elasticsearch.common.xcontent.XContentBuilder.close(XContentBuilder.java:1015)\n\tat org.elasticsearch.rest.action.RestBuilderListener.buildResponse(RestBuilderListener.java:37)\n\tat org.elasticsearch.rest.action.RestResponseListener.processResponse(RestResponseListener.java:37)\n\tat org.elasticsearch.rest.action.RestActionListener.onResponse(RestActionListener.java:47)\n\tat org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:89)\n\tat org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:83)\n\tat org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:58)\n\tat org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:73)\n\tat org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)\n\tat org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:224)\n\tat org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.finishHim(TransportNodesAction.java:263)\n\tat org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.onOperation(TransportNodesAction.java:248)\n\tat org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.access$000(TransportNodesAction.java:177)\n\tat org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1.handleResponse(TransportNodesAction.java:226)\n\tat org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1.handleResponse(TransportNodesAction.java:218)\n\tat org.elasticsearch.transport.TransportService$6.handleResponse(TransportService.java:634)\n\tat org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1171)\n\tat 
org.elasticsearch.transport.InboundHandler.doHandleResponse(InboundHandler.java:253)\n\tat org.elasticsearch.transport.InboundHandler.handleResponse(InboundHandler.java:245)\n\tat org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:133)\n\tat org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:89)\n\tat org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:700)\n\tat org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:142)\n\tat org.elasticsearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:117)\n\tat org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:82)\n\tat org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:74)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)\n\tat io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)\n\tat io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:271)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)\n\tat io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)\n\tat io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)\n\tat io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)\n\tat 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:615)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:578)\n\tat io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)\n\tat io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)\n\tat io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tat java.base/java.lang.Thread.run(Thread.java:832)\n"
 },
 "stack_trace": "java.lang.IllegalStateException: Failed to close the XContentBuilder\n\tat org.elasticsearch.common.xcontent.XContentBuilder.close(XContentBuilder.java:1017)\n\tat org.elasticsearch.rest.action.RestBuilderListener.buildResponse(RestBuilderListener.java:37)\n\tat org.elasticsearch.rest.action.RestResponseListener.processResponse(RestResponseListener.java:37)\n\tat org.elasticsearch.rest.action.RestActionListener.onResponse(RestActionListener.java:47)\n\tat org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:89)\n\tat org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:83)\n\tat org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:58)\n\tat org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:73)\n\tat org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)\n\tat org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:224)\n\tat org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.finishHim(TransportNodesAction.java:263)\n\tat org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.onOperation(TransportNodesAction.java:248)\n\tat org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.access$000(TransportNodesAction.java:177)\n\tat org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1.handleResponse(TransportNodesAction.java:226)\n\tat org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1.handleResponse(TransportNodesAction.java:218)\n\tat org.elasticsearch.transport.TransportService$6.handleResponse(TransportService.java:634)\n\tat org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1171)\n\tat org.elasticsearch.transport.InboundHandler.doHandleResponse(InboundHandler.java:253)\n\tat 
org.elasticsearch.transport.InboundHandler.handleResponse(InboundHandler.java:245)\n\tat org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:133)\n\tat org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:89)\n\tat org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:700)\n\tat org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:142)\n\tat org.elasticsearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:117)\n\tat org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:82)\n\tat org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:74)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)\n\tat io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)\n\tat io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:271)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)\n\tat io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)\n\tat io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)\n\tat io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)\n\tat io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)\n\tat 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:615)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:578)\n\tat io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)\n\tat io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)\n\tat io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tat java.base/java.lang.Thread.run(Thread.java:832)\nCaused by: java.io.IOException: Unclosed object or array found\n\tat org.elasticsearch.common.xcontent.json.JsonXContentGenerator.close(JsonXContentGenerator.java:456)\n\tat org.elasticsearch.common.xcontent.XContentBuilder.close(XContentBuilder.java:1015)\n\t... 44 more\n"
    }]
  },
  "status": 400
}
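
For anyone who wants to flag this failure automatically while polling the stats API, here is a minimal sketch. It assumes the response body has the shape shown above (an "error" object with a "root_cause" list); the function names and the shortened sample body are made up for illustration, and the HTTP call itself is left out.

```python
import json

# Message text copied from the error response in this thread.
NEGATIVE_BYTES_MARKER = "Values less than -1 bytes are not supported"


def root_causes(body: str):
    """Return (type, reason) pairs from error.root_cause, or [] if there is no error."""
    doc = json.loads(body)
    error = doc.get("error")
    if not isinstance(error, dict):
        return []
    return [(c.get("type"), c.get("reason")) for c in error.get("root_cause", [])]


def is_negative_bytes_error(body: str) -> bool:
    """True if any root cause is the negative byte size illegal_argument_exception."""
    return any(
        kind == "illegal_argument_exception" and NEGATIVE_BYTES_MARKER in (reason or "")
        for kind, reason in root_causes(body)
    )


# Shortened stand-in for the response above (stack traces omitted).
sample = json.dumps({
    "error": {
        "root_cause": [{
            "type": "illegal_argument_exception",
            "reason": "Values less than -1 bytes are not supported: -4b",
        }],
        "type": "illegal_argument_exception",
        "reason": "Values less than -1 bytes are not supported: -4b",
    },
    "status": 400,
})

print(root_causes(sample))
print(is_negative_bytes_error(sample))
```

A monitoring script could call is_negative_bytes_error on every non-200 stats response and record the timestamp, which makes it easier to line the failures up with the GC log.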

Thanks, that's helpful. The problem is in QueryCacheStats:

java.lang.IllegalArgumentException: Values less than -1 bytes are not supported: -4b
        at org.elasticsearch.common.unit.ByteSizeValue.<init>(ByteSizeValue.java:79)
        at org.elasticsearch.common.unit.ByteSizeValue.<init>(ByteSizeValue.java:74)
        at org.elasticsearch.index.cache.query.QueryCacheStats.getMemorySize(QueryCacheStats.java:73)
        at org.elasticsearch.index.cache.query.QueryCacheStats.toXContent(QueryCacheStats.java:130)
        at org.elasticsearch.action.admin.indices.stats.CommonStats.toXContent(CommonStats.java:510)
        at org.elasticsearch.indices.NodeIndicesStats.toXContent(NodeIndicesStats.java:202)
        at org.elasticsearch.action.admin.cluster.node.stats.NodeStats.toXContent(NodeStats.java:324)
        at org.elasticsearch.action.admin.cluster.node.stats.NodesStatsResponse.toXContent(NodesStatsResponse.java:61)
        at org.elasticsearch.rest.action.RestActions.nodesResponse(RestActions.java:184)
        at org.elasticsearch.rest.action.RestActions$NodesResponseRestListener.buildResponse(RestActions.java:237)
        at org.elasticsearch.rest.action.RestActions$NodesResponseRestListener.buildResponse(RestActions.java:228)
        at org.elasticsearch.rest.action.RestBuilderListener.buildResponse(RestBuilderListener.java:38)
        at org.elasticsearch.rest.action.RestResponseListener.processResponse(RestResponseListener.java:37)
        at org.elasticsearch.rest.action.RestActionListener.onResponse(RestActionListener.java:47)
        at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:89)

This is a known issue, see https://github.com/elastic/elasticsearch/issues/55434 - it looks like a bug but we don't know the cause at the moment.
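
To make the failure mode in that trace concrete, here is a rough Python analogue. It is not the Elasticsearch source: the class only mimics the check implied by the exception message, the render_* function names are hypothetical, and the output key is just an illustrative label. It shows how one negative counter (the -4 from the trace) is enough to abort rendering of the whole stats response.

```python
class ByteSizeValue:
    """Python stand-in for the size wrapper named in the stack trace."""

    def __init__(self, size_bytes: int):
        # Mirrors the message in the trace: -1 is tolerated, anything lower is rejected.
        if size_bytes < -1:
            raise ValueError(
                f"Values less than -1 bytes are not supported: {size_bytes}b"
            )
        self.size_bytes = size_bytes


def render_query_cache_stats(ram_bytes_used: int) -> dict:
    """Hypothetical renderer: wraps the raw counter before writing it out."""
    return {"memory_size_in_bytes": ByteSizeValue(ram_bytes_used).size_bytes}


def render_node_stats(per_node: dict) -> dict:
    """Hypothetical renderer for all nodes; one bad counter fails the whole response."""
    return {name: render_query_cache_stats(v) for name, v in per_node.items()}


print(render_node_stats({"node-1": 1024}))
try:
    render_node_stats({"node-1": 1024, "node-2": -4})
except ValueError as exc:
    print("stats rendering failed:", exc)
```

The interesting question, which the sketch does not answer, is how the counter becomes negative in the first place; that is what the linked issue tracks.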

Is there anything I can do to help, like collecting more detailed information?
It seems reproducible in my environment. I'm doing intensive time-series bulk indexing, occasionally deleting expired indices and creating new ones once a threshold is reached. Also, every 10 min or so I query a couple of health stats. When the issue happened there was no GUI query, so it was most likely triggered by the stats queries.

Here are some CPU/RAM stats from a few seconds after the error first occurred:

CPU/RAM Check
  us:100.8 ni:0 sy:11.2 id:673.3 wa:6.2 hi:0 si:0.6 st:0.8 
  elasticsearch:0.3%/20833M  
  node-es-app:0.7%/280M
  node-gui-app:0%/136M
  node-kafka-consumer-app:0.3%/222M
GC Log

The first error timestamp is Dec 25 2020 14:36:29.945 which is 2020-12-25T22:36:29.945+0000
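
The conversion between the two timestamps above can be sketched as follows. The fixed UTC-8 offset is an assumption inferred from the pair of timestamps (14:36 local vs 22:36 UTC), and the function name is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the application log is in a UTC-8 zone, as implied by the
# two timestamps quoted in this post.
LOCAL_TZ = timezone(timedelta(hours=-8))


def local_to_gc_log_time(stamp: str) -> str:
    """Convert 'Dec 25 2020 14:36:29.945' to the GC log's UTC timestamp format."""
    local = datetime.strptime(stamp, "%b %d %Y %H:%M:%S.%f").replace(tzinfo=LOCAL_TZ)
    utc = local.astimezone(timezone.utc)
    millis = utc.microsecond // 1000
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}+0000"


print(local_to_gc_log_time("Dec 25 2020 14:36:29.945"))
```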

[2020-12-25T22:35:27.802+0000][17629][gc,task     ] GC(5572) Using 8 workers of 8 for evacuation
[2020-12-25T22:35:27.802+0000][17629][gc,age      ] GC(5572) Desired survivor size 645922816 bytes, new threshold 15 (max threshold 15)
[2020-12-25T22:35:27.845+0000][17629][gc,age      ] GC(5572) Age table with threshold 15 (max threshold 15)
[2020-12-25T22:35:27.845+0000][17629][gc,age      ] GC(5572) - age   1:   96866720 bytes,   96866720 total 
[2020-12-25T22:35:27.845+0000][17629][gc,age      ] GC(5572) - age   2:     875688 bytes,   97742408 total 
[2020-12-25T22:35:27.845+0000][17629][gc,age      ] GC(5572) - age   3:    1148584 bytes,   98890992 total 
[2020-12-25T22:35:27.845+0000][17629][gc,age      ] GC(5572) - age   4:    1228376 bytes,  100119368 total 
[2020-12-25T22:35:27.845+0000][17629][gc,age      ] GC(5572) - age   5:     671064 bytes,  100790432 total 
[2020-12-25T22:35:27.845+0000][17629][gc,age      ] GC(5572) - age   6:    1317520 bytes,  102107952 total 
[2020-12-25T22:35:27.845+0000][17629][gc,age      ] GC(5572) - age   7:    2431152 bytes,  104539104 total 
[2020-12-25T22:35:27.845+0000][17629][gc,age      ] GC(5572) - age   8:    1062832 bytes,  105601936 total 
[2020-12-25T22:35:27.845+0000][17629][gc,age      ] GC(5572) - age   9:    1093856 bytes,  106695792 total 
[2020-12-25T22:35:27.845+0000][17629][gc,age      ] GC(5572) - age  10:     931104 bytes,  107626896 total 
[2020-12-25T22:35:27.845+0000][17629][gc,age      ] GC(5572) - age  11:     960808 bytes,  108587704 total 
[2020-12-25T22:35:27.845+0000][17629][gc,age      ] GC(5572) - age  12:    1379720 bytes,  109967424 total 
[2020-12-25T22:35:27.845+0000][17629][gc,age      ] GC(5572) - age  13:    1897664 bytes,  111865088 total 
[2020-12-25T22:35:27.845+0000][17629][gc,age      ] GC(5572) - age  14:    1075400 bytes,  112940488 total 
[2020-12-25T22:35:27.845+0000][17629][gc,age      ] GC(5572) - age  15:    1100864 bytes,  114041352 total 
[2020-12-25T22:35:27.845+0000][17629][gc,phases   ] GC(5572)   Pre Evacuate Collection Set: 0.5ms 
[2020-12-25T22:35:27.845+0000][17629][gc,phases   ] GC(5572)   Merge Heap Roots: 0.4ms 
[2020-12-25T22:35:27.845+0000][17629][gc,phases   ] GC(5572)   Evacuate Collection Set: 34.4ms
[2020-12-25T22:35:27.845+0000][17629][gc,phases   ] GC(5572)   Post Evacuate Collection Set: 7.1ms 
[2020-12-25T22:35:27.845+0000][17629][gc,phases   ] GC(5572)   Other: 0.5ms 
[2020-12-25T22:35:27.845+0000][17629][gc,heap     ] GC(5572) Eden regions: 1217->0(1213)
[2020-12-25T22:35:27.845+0000][17629][gc,heap     ] GC(5572) Survivor regions: 11->15(154)
[2020-12-25T22:35:27.845+0000][17629][gc,heap     ] GC(5572) Old regions: 120->121
[2020-12-25T22:35:27.845+0000][17629][gc,heap     ] GC(5572) Archive regions: 2->2
[2020-12-25T22:35:27.845+0000][17629][gc,heap     ] GC(5572) Humongous regions: 8->8
[2020-12-25T22:35:27.845+0000][17629][gc,metaspace] GC(5572) Metaspace: 81817K(84304K)->81817K(84304K) NonClass: 72150K(73936K)->72150K(73936K) Class: 9667K(10368K)->9667K(10368K)
[2020-12-25T22:35:27.845+0000][17629][gc          ] GC(5572) Pause Young (Normal) (G1 Evacuation Pause) 10855M->1147M(16384M) 43.178ms
[2020-12-25T22:35:27.845+0000][17629][gc,cpu      ] GC(5572) User=0.09s Sys=0.00s Real=0.05s
[2020-12-25T22:35:27.845+0000][17629][safepoint   ] Safepoint "G1CollectForAllocation", Time since last: 187299620267 ns, Reaching safepoint: 368638 ns, At safepoint: 43323743 ns, Total: 43692381 ns
[2020-12-25T22:36:29.858+0000][17629][safepoint   ] Safepoint "Cleanup", Time since last: 62012215944 ns, Reaching safepoint: 181759 ns, At safepoint: 15241 ns, Total: 197000 ns
[2020-12-25T22:36:30.858+0000][17629][safepoint   ] Safepoint "Cleanup", Time since last: 1000309566 ns, Reaching safepoint: 179210 ns, At safepoint: 10574 ns, Total: 189784 ns
[2020-12-25T22:36:53.862+0000][17629][safepoint   ] Safepoint "Cleanup", Time since last: 23003486317 ns, Reaching safepoint: 199146 ns, At safepoint: 9706 ns, Total: 208852 ns
[2020-12-25T22:37:04.864+0000][17629][safepoint   ] Safepoint "Cleanup", Time since last: 11001632988 ns, Reaching safepoint: 329561 ns, At safepoint: 22201 ns, Total: 351762 ns
[2020-12-25T22:38:28.878+0000][17629][safepoint   ] Safepoint "Cleanup", Time since last: 84013263412 ns, Reaching safepoint: 352420 ns, At safepoint: 15756 ns, Total: 368176 ns
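
To pull the pause summary out of a log like the one above, a small parser can help. This is a sketch that assumes the line layout shown here (timestamp, pid, a plain "gc" tag, then the pause summary); the regular expression and function name are my own and only cover that one line type.

```python
import re

# Matches only the summary line tagged "[gc          ]", e.g.
# "... GC(5572) Pause Young (Normal) (G1 Evacuation Pause) 10855M->1147M(16384M) 43.178ms"
PAUSE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[\d+\]\[gc\s*\] GC\((?P<gc_id>\d+)\) (?P<kind>Pause .+?) "
    r"(?P<before>\d+)M->(?P<after>\d+)M\((?P<heap>\d+)M\) (?P<ms>[\d.]+)ms$"
)


def parse_pause(line: str):
    """Return a dict describing a GC pause summary line, or None for other lines."""
    match = PAUSE_RE.match(line.strip())
    if match is None:
        return None
    before, after = int(match["before"]), int(match["after"])
    return {
        "timestamp": match["ts"],
        "gc_id": int(match["gc_id"]),
        "kind": match["kind"],
        "heap_before_mb": before,
        "heap_after_mb": after,
        "heap_total_mb": int(match["heap"]),
        "freed_mb": before - after,
        "pause_ms": float(match["ms"]),
    }


line = (
    "[2020-12-25T22:35:27.845+0000][17629][gc          ] GC(5572) "
    "Pause Young (Normal) (G1 Evacuation Pause) 10855M->1147M(16384M) 43.178ms"
)
print(parse_pause(line))
```

Applied to the excerpt above, the only pause before the first error is GC(5572), about 43 ms, roughly a minute earlier, so the log shows no long GC pause at the time of the failure.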

I don't know, sorry, it's not an area of the code with which I'm familiar. You could try asking on the Github issue. I don't think it's likely that it's caused by querying stats.

Yeah, I meant to say that the query checking the cache memory size was most likely /_nodes/stats. I also meant to emphasize that what was using the heap was the bulk indexing (and perhaps internal shard movement after a delete or create), not other memory-intensive features like GUI aggregations.

Cool! I replicated my questions and comments there.

So far the most critical issues have been fixed with the upgrade. I will keep monitoring the upgraded cluster and add any relevant results here. Thanks again!
