Hi All,
A while ago I reported this issue, but back then I was running a single-node cluster and thought that may have been the cause. Yesterday I migrated to a three-node cluster configuration:
3 elasticsearch nodes (master and data nodes)
1 logstash
1 kibana
5 filebeats.
Everything is running version 7.0.0
Yesterday, after migrating the cluster, everything went well.
This morning I was faced with a red cluster. The issue is an unassigned shard for the newest index; after hitting the allocation explain endpoint I got this:
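For reference, this is roughly the request I used (host assumed to be localhost:9200; the index and shard are the ones from the output below):

```shell
# Ask Elasticsearch why this specific primary shard is unassigned
curl -s -X GET "localhost:9200/_cluster/allocation/explain?pretty" \
  -H 'Content-Type: application/json' \
  -d '{"index": "logstash-000051", "shard": 0, "primary": true}'
```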
```
{
  "index": "logstash-000051",
  "shard": 0,
  "primary": true,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "ALLOCATION_FAILED",
    "at": "2019-12-06T12:34:31.577Z",
    "failed_allocation_attempts": 5,
    "details": "failed shard on node [25AFdrnqTG6lRsDR4CxZpQ]: failed recovery, failure RecoveryFailedException[[logstash-000051][0]: Recovery failed on {elasticsearch2}{25AFdrnqTG6lRsDR4CxZpQ}{mPx8umy9QIWE7hR8vdNu3w}{44.128.0.11}{44.128.0.11:9301}{ml.machine_memory=6442450944, xpack.installed=true, ml.max_open_jobs=20}]; nested: IndexShardRecoveryException[failed to recover from gateway]; nested: EngineCreationFailureException[failed to create engine]; nested: NoSuchFileException[/usr/share/elasticsearch/data/nodes/0/indices/50zRNBhxT3qdJpUdS4B47Q/0/index/_1a4.fdt]; ",
    "last_allocation_status": "no"
  },
  "can_allocate": "no",
  "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes that hold an in-sync shard copy",
  "node_allocation_decisions": [
    {
      "node_id": "25AFdrnqTG6lRsDR4CxZpQ",
      "node_name": "elasticsearch2",
      "transport_address": "44.128.0.11:9301",
      "node_attributes": {
        "ml.machine_memory": "6442450944",
        "ml.max_open_jobs": "20",
        "xpack.installed": "true"
      },
      "node_decision": "no",
      "store": {
        "in_sync": true,
        "allocation_id": "fhFsHVG5RsW_kMxAYfYAcg"
      },
      "deciders": [
        {
          "decider": "max_retry",
          "decision": "NO",
          "explanation": "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2019-12-06T12:34:31.577Z], failed_attempts[5], delayed=false, details[failed shard on node [25AFdrnqTG6lRsDR4CxZpQ]: failed recovery, failure RecoveryFailedException[[logstash-000051][0]: Recovery failed on {elasticsearch2}{25AFdrnqTG6lRsDR4CxZpQ}{mPx8umy9QIWE7hR8vdNu3w}{44.128.0.11}{44.128.0.11:9301}{ml.machine_memory=6442450944, xpack.installed=true, ml.max_open_jobs=20}]; nested: IndexShardRecoveryException[failed to recover from gateway]; nested: EngineCreationFailureException[failed to create engine]; nested: NoSuchFileException[/usr/share/elasticsearch/data/nodes/0/indices/50zRNBhxT3qdJpUdS4B47Q/0/index/_1a4.fdt]; ], allocation_status[deciders_no]]]"
        },
        { "decider": "replica_after_primary_active", "decision": "YES", "explanation": "shard is primary and can be allocated" },
        { "decider": "enable", "decision": "YES", "explanation": "all allocations are allowed" },
        { "decider": "node_version", "decision": "YES", "explanation": "the primary shard is new or already existed on the node" },
        { "decider": "snapshot_in_progress", "decision": "YES", "explanation": "no snapshots are currently running" },
        { "decider": "restore_in_progress", "decision": "YES", "explanation": "ignored as shard is not being recovered from a snapshot" },
        { "decider": "filter", "decision": "YES", "explanation": "node passes include/exclude/require filters" },
        { "decider": "same_shard", "decision": "YES", "explanation": "the shard does not exist on the same node" },
        { "decider": "disk_threshold", "decision": "YES", "explanation": "enough disk for shard on node, free: [4.6tb], shard size: [0b], free after allocating shard: [4.6tb]" },
        { "decider": "throttling", "decision": "YES", "explanation": "below primary recovery limit of [4]" },
        { "decider": "shards_limit", "decision": "YES", "explanation": "total shard limits are disabled: [index: -1, cluster: -1] <= 0" },
        { "decider": "awareness", "decision": "YES", "explanation": "allocation awareness is not enabled, set cluster setting [cluster.routing.allocation.awareness.attributes] to enable it" }
      ]
    },
    {
      "node_id": "f4qB5RJ-QN-46B44ZRLcrQ",
      "node_name": "elasticsearch3",
      "transport_address": "44.128.0.11:9302",
      "node_attributes": {
        "ml.machine_memory": "6442450944",
        "ml.max_open_jobs": "20",
        "xpack.installed": "true"
      },
      "node_decision": "no",
      "store": { "found": false }
    },
    {
      "node_id": "gjegRSM1Rbi22HYJOFYINw",
      "node_name": "elasticsearch1",
      "transport_address": "44.128.0.11:9300",
      "node_attributes": {
        "ml.machine_memory": "6442450944",
        "ml.max_open_jobs": "20",
        "xpack.installed": "true"
      },
      "node_decision": "no",
      "store": { "found": false }
    }
  ]
}
```
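For completeness, the manual retry that the max_retry decider message points at would be something like this (host assumed to be localhost:9200), though I suspect it won't help here since the underlying failure is a missing segment file (NoSuchFileException) on the only in-sync copy:

```shell
# Re-attempt allocation of shards that exhausted their retry limit
curl -s -X POST "localhost:9200/_cluster/reroute?retry_failed=true&pretty"
```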
The original node was named elasticsearch1; I added two new nodes, elasticsearch2 and elasticsearch3. They all share the same file system, but each has its own data folder.
Any ideas?