Multiple errors; "Failed to list store metadata for shard"


#1

Hi all,
I have a problem with my Elasticsearch cluster: it is logging a large number of errors.
At night, while an import is running, I see many errors like this:

[2019-01-15T03:28:54,372][WARN ][o.e.g.GatewayAllocator$InternalReplicaShardAllocator] [XXX-stg-util] [116-XXX-2019-01-14_23_59_02][0]: failed to list shard for shard_store on node [dU9OHwkgTIS8vR1JXLHEaQ]
org.elasticsearch.action.FailedNodeException: Failed node [dU9OHwkgTIS8vR1JXLHEaQ]
        at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.onFailure(TransportNodesAction.java:239) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction.access$200(TransportNodesAction.java:153) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1.handleException(TransportNodesAction.java:211) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1067) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.transport.TcpTransport.lambda$handleException$16(TcpTransport.java:1467) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.common.util.concurrent.EsExecutors$1.execute(EsExecutors.java:110) [elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.transport.TcpTransport.handleException(TcpTransport.java:1465) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.transport.TcpTransport.handlerResponseError(TcpTransport.java:1457) [elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:1401) [elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:74) [transport-netty4-5.5.0.jar:5.5.0]


Caused by: org.elasticsearch.transport.RemoteTransportException: [XXX-stg-app2][10.30.4.6:9300][internal:cluster/nodes/indices/shard/store[n]]
Caused by: org.elasticsearch.ElasticsearchException: Failed to list store metadata for shard [[116-XXX-2019-01-14_23_59_02][0]]
        at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:114) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.nodeOperation(TransportNodesListShardStoreMetaData.java:64) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.action.support.nodes.TransportNodesAction.nodeOperation(TransportNodesAction.java:140) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:262) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.action.support.nodes.TransportNodesAction$NodeTransportHandler.messageReceived(TransportNodesAction.java:258) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.transport.TcpTransport$RequestHandler.doRun(TcpTransport.java:1544) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-5.5.0.jar:5.5.0]

Caused by: java.io.FileNotFoundException: no segments* file found in store(mmapfs(/data/elastic/data/nodes/0/indices/I3ChJeMwRwevRF509mgYAw/0/index)): files: [recovery.AWhPVwK8S7hjY6TEdd7O._v.dii, recovery.AWhPVwK8S7hjY6TEdd7O._v.dim, recovery.AWhPVwK8S7hjY6TEdd7O._v.fdx, recovery.AWhPVwK8S7hjY6TEdd7O._v.fnm, recovery.AWhPVwK8S7hjY6TEdd7O._v.nvd, recovery.AWhPVwK8S7hjY6TEdd7O._v.nvm, recovery.AWhPVwK8S7hjY6TEdd7O._v.si, recovery.AWhPVwK8S7hjY6TEdd7O._v_Lucene50_0.doc, recovery.AWhPVwK8S7hjY6TEdd7O._v_Lucene50_0.pos, recovery.AWhPVwK8S7hjY6TEdd7O._v_Lucene50_0.tim, recovery.AWhPVwK8S7hjY6TEdd7O._v_Lucene50_0.tip, recovery.AWhPVwK8S7hjY6TEdd7O._v_Lucene54_0.dvd, recovery.AWhPVwK8S7hjY6TEdd7O._v_Lucene54_0.dvm, recovery.AWhPVwK8S7hjY6TEdd7O.segments_4, write.lock]
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:687) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:644) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
        at org.apache.lucene.index.SegmentInfos.readLatestCommit(SegmentInfos.java:450) ~[lucene-core-6.6.0.jar:6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46]
        at org.elasticsearch.common.lucene.Lucene.readSegmentInfos(Lucene.java:129) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.index.store.Store.readSegmentsInfo(Store.java:199) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.index.store.Store.access$200(Store.java:127) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.index.store.Store$MetadataSnapshot.loadMetadata(Store.java:818) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.index.store.Store$MetadataSnapshot.<init>(Store.java:751) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.index.store.Store.getMetadata(Store.java:273) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.index.shard.IndexShard.snapshotStoreMetadata(IndexShard.java:874) ~[elasticsearch-5.5.0.jar:5.5.0]
        at org.elasticsearch.indices.store.TransportNodesListShardStoreMetaData.listStoreMetaData(TransportNodesListShardStoreMetaData.java:128) ~[elasticsearch-5.5.0.jar:5.5.0]

Do you have any idea what the cause of this problem is?

ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.30.4.7           31          99   1    0.14    0.09     0.14 mdi       *      XXX-stg-util
10.30.4.6           49          99   1    0.01    0.05     0.05 mdi       -      XXX-stg-app2
10.30.4.4           69          99   0    0.00    0.03     0.05 mdi       -      XXX-stg-app1

XXX-cluster-stg-2019-01-10.log:77495
XXX-cluster-stg-2019-01-11.log:381894
XXX-cluster-stg-2019-01-12.log:95410
XXX-cluster-stg-2019-01-13.log:56012
XXX-cluster-stg-2019-01-14.log:256925

The logs above are truncated because the full message was too long. The numbers after each filename are the error counts per daily log.
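Per-file counts like the ones above can be produced with a simple `grep -c` loop. This is only a sketch: the filename pattern is taken from the listing in this thread, and the match string is assumed to be the WARN message from the stack trace; adjust both to your setup.

```shell
# Sketch: count matching error lines per daily log file.
# Filename pattern and match string are assumptions; adjust as needed.
for f in XXX-cluster-stg-*.log; do
  [ -e "$f" ] || continue   # skip if the glob matched nothing
  printf '%s:%s\n' "$f" "$(grep -c 'failed to list shard for shard_store' "$f")"
done
```

Running this from the Elasticsearch log directory prints one `filename:count` line per day, in the same shape as the listing above.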


(David Turner) #2

This suggests a node crashed while it was performing a recovery. It should stop being reported once all the shards have been allocated and your cluster health is green again.
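For anyone following along: recovery progress can be watched with the standard cluster APIs, which exist in 5.x. A sketch, assuming the cluster is reachable on `localhost:9200` (swap in your own host and index name):

```shell
# Sketch: check overall health; status should return to "green" once
# all shards are allocated (host/port are assumptions).
curl -s 'localhost:9200/_cluster/health?pretty'

# List shards that are still unassigned, with the reason code:
curl -s 'localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED

# Ongoing recoveries for the affected index from the log above:
curl -s 'localhost:9200/_cat/recovery/116-XXX-2019-01-14_23_59_02?v'
```

Once `_cat/shards` shows no UNASSIGNED entries, the warning should no longer appear.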


(system) closed #3

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.