Elasticsearch index went into read-only mode

Hello, we are hitting this exception on index operations. Our hosts are big machines with large SSDs.

One particular index went into "read-only" mode, and hence any further indexing operations failed. I would like to call out that before this index went into read-only mode, there were 150k index operations performed within 30 minutes. I am not sure whether that caused the problem, but I wanted to bring up this statistic.

```
Caused by: java.util.concurrent.ExecutionException: RemoteTransportException[[rqs-es-data-03302.node.ad3.us-ashburn-1.data1][10.194.46.130:9302][indices:data/write/index]]; nested: ClusterBlockException[blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];];
at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.getValue(BaseFuture.java:265)
at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:238)
at org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:69)
at com.oracle.pic.rqs.consumer.indexer.SingleClusterIndexer.lambda$doIndex$0(SingleClusterIndexer.java:122)
at net.jodah.failsafe.SyncFailsafe.call(SyncFailsafe.java:145)
at net.jodah.failsafe.SyncFailsafe.get(SyncFailsafe.java:56)
at com.oracle.pic.rqs.consumer.utils.FailSafeHelper.callWithRetry(FailSafeHelper.java:61)
at com.oracle.pic.rqs.consumer.indexer.SingleClusterIndexer.attemptFailsafeCall(SingleClusterIndexer.java:229)
at com.oracle.pic.rqs.consumer.indexer.SingleClusterIndexer.doIndex(SingleClusterIndexer.java:118)
... 12 common frames omitted
Caused by: org.elasticsearch.transport.RemoteTransportException: [rqs-es-data-03302.node.ad3.us-ashburn-1.data1][10.194.46.130:9302][indices:data/write/index]
Caused by: org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];
at org.elasticsearch.cluster.block.ClusterBlocks.indexBlockedException(ClusterBlocks.java:182)
at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.handleBlockExceptions(TransportReplicationAction.java:812)
at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.doRun(TransportReplicationAction.java:712)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at org.elasticsearch.action.support.replication.TransportReplicationAction.doExecute(TransportReplicationAction.java:169)
at org.elasticsearch.action.support.replication.TransportReplicationAction.doExecute(TransportReplicationAction.java:97)
at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:167)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:139)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:81)
at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation.doRun(TransportBulkAction.java:350)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at org.elasticsearch.action.bulk.TransportBulkAction.executeBulk(TransportBulkAction.java:462)
at org.elasticsearch.action.bulk.TransportBulkAction.doExecute(TransportBulkAction.java:209)
at org.elasticsearch.action.bulk.TransportBulkAction.doExecute(TransportBulkAction.java:86)
at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:167)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:139)
at org.elasticsearch.action.bulk.TransportSingleItemBulkWriteAction.doExecute(TransportSingleItemBulkWriteAction.java:69)
at org.elasticsearch.action.bulk.TransportSingleItemBulkWriteAction.doExecute(TransportSingleItemBulkWriteAction.java:44)
at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:167)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:139)
at org.elasticsearch.action.support.replication.TransportReplicationAction$OperationTransportHandler.messageReceived(TransportReplicationAction.java:251)
at org.elasticsearch.action.support.replication.TransportReplicationAction$OperationTransportHandler.messageReceived(TransportReplicationAction.java:243)
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66)
at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1554)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at org.elasticsearch.common.util.concurrent.EsExecutors$1.execute(EsExecutors.java:135)
at org.elasticsearch.transport.TcpTransport.handleRequest(TcpTransport.java:1511)
at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:1380)
at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:64)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:297)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:413)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)

```

You are most likely hitting the flood stage disk watermark. See the disk-based shard allocation section of the Elasticsearch reference documentation for details.

Elasticsearch enforces a read-only index block (index.blocks.read_only_allow_delete) on every index that has one or more shards allocated on the node that has at least one disk exceeding the flood stage. This is a last resort to prevent nodes from running out of disk space. The index block must be released manually once there is enough disk space available to allow indexing operations to continue.
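
For reference, here is a rough sketch of how the block can be inspected and then released once enough disk space has been freed. The index name `my-index` and the `localhost:9200` endpoint are placeholders for your own values:

```
# Check whether the read-only block is currently set on the affected index
curl -X GET "localhost:9200/my-index/_settings?flat_settings=true&pretty"

# After freeing disk space, clear the block manually
# (setting it to null removes the override and restores the default)
curl -X PUT "localhost:9200/my-index/_settings" -H 'Content-Type: application/json' -d'
{
  "index.blocks.read_only_allow_delete": null
}'
```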

Thanks for the quick response. As I said, these are big machines with large SSDs (2.9 TB).

How does ES calculate that it is running out of space?

It could also be something else, but the first thing you could check is your "cluster.routing.allocation.disk.watermark*" settings in your configuration, if any are set. You should also find other warnings in the logs indicating that you hit the flood stage, such as "flood stage disk watermark [...] exceeded". If there aren't any, this is probably something else; please check your logs for anything else out of the ordinary.
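
As an illustration, the effective watermark thresholds and the disk usage Elasticsearch itself sees can be checked with something like the following (`localhost:9200` is a placeholder):

```
# Show the effective disk watermark settings, including defaults
# (typically low: 85%, high: 90%, flood_stage: 95% if nothing is overridden)
curl -X GET "localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true&pretty" | grep watermark

# Show per-node disk usage and shard counts as Elasticsearch sees them
curl -X GET "localhost:9200/_cat/allocation?v"
```

Note that the thresholds are percentages of disk usage (or absolute free-space values), not totals, so even a 2.9 TB disk trips the flood stage once usage crosses the configured limit.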
