I'm taking searchable snapshots as part of an ILM policy for a spin. The indices move through the stages fine until they reach the cold stage, where the snapshot is taken but then utterly fails to mount.
I'm running a hot-warm-cold setup on Elastic 7.10.1 on ECK 1.3.1, with a trial license applied and GCS as the snapshot backend.
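For context, the cold phase of the policy is along these lines (the policy name and min_age here are placeholders; the repository name default is the one that shows up in the failure messages below):

PUT _ilm/policy/auditbeat-v7
{
  "policy" : {
    "phases" : {
      "cold" : {
        "min_age" : "1d",
        "actions" : {
          "searchable_snapshot" : {
            "snapshot_repository" : "default"
          }
        }
      }
    }
  }
}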
The stack trace I'm seeing (for every index that's been attempted so far) is:
failing shard [failed shard, shard [restored-shrink-auditbeat-v7-000001][0], node[dQ8FF6kjTN6GXKAg9-36DQ], [P], recovery_source[snapshot recovery [_no_api_] from default:2021.01.23-shrink-auditbeat-v7-000001-pantheon-audit-0goifa_vqsynluhbnlktww/GVVNoA-SSUqyclwjIUMZLg], s[INITIALIZING], a[id=xluTACB4TfWtHUOO7O4lhQ], unassigned_info[[reason=ALLOCATION_FAILED], at[2021-01-23T17:50:26.368Z], failed_attempts[2], failed_nodes[[dQ8FF6kjTN6GXKAg9-36DQ]], delayed=false, details[failed shard on node [dQ8FF6kjTN6GXKAg9-36DQ]: failed to create index, failure IllegalStateException[multiple engine factories provided for [restored-shrink-auditbeat-v7-000001/IYliwBPhQaemofR4hVwMvA]: [org.elasticsearch.snapshots.SourceOnlySnapshotRepository$$Lambda$5291/0x0000000801869d30],[org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshots$$Lambda$5292/0x00000008012a5a30]]], allocation_status[fetching_shard_data]], expected_shard_size[14285895102], message [failed to create index], failure [IllegalStateException[multiple engine factories provided for [restored-shrink-auditbeat-v7-000001/IYliwBPhQaemofR4hVwMvA]: [org.elasticsearch.snapshots.SourceOnlySnapshotRepository$$Lambda$5291/0x0000000801869d30],[org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshots$$Lambda$5292/0x00000008012a5a30]]], markAsStale [true]]
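For reference, the default repository is a source-only wrapper around GCS (hence the SourceOnlySnapshotRepository in the trace above); it was registered roughly like this, with the bucket name as a placeholder:

PUT _snapshot/default
{
  "type" : "source",
  "settings" : {
    "delegate_type" : "gcs",
    "bucket" : "my-snapshot-bucket"
  }
}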
The output of GET _cluster/allocation/explain?pretty (trimmed down):
{
  "index" : "restored-shrink-auditbeat-v7-000004",
  "shard" : 0,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "ALLOCATION_FAILED",
    "at" : "2021-01-23T18:38:48.858Z",
    "failed_allocation_attempts" : 5,
    "details" : "failed shard on node [dQ8FF6kjTN6GXKAg9-36DQ]: failed to create index, failure IllegalStateException[multiple engine factories provided for [restored-shrink-auditbeat-v7-000004/8IaTkA4oQQS9o04-aHoaPA]: [org.elasticsearch.snapshots.SourceOnlySnapshotRepository$$Lambda$5291/0x0000000801869d30],[org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshots$$Lambda$5292/0x00000008012a5a30]]",
    "last_allocation_status" : "no"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "dQ8FF6kjTN6GXKAg9-36DQ",
      "node_name" : "oracle-es-cold-1",
      "transport_address" : "10.50.7.3:9300",
      "node_attributes" : {
        "k8s_node_name" : "gke-oracle-standard-9729ac57-414q",
        "xpack.installed" : "true",
        "transform.node" : "false"
      },
      "node_decision" : "no",
      "weight_ranking" : 1,
      "deciders" : [
        {
          "decider" : "max_retry",
          "decision" : "NO",
          "explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2021-01-23T18:38:48.858Z], failed_attempts[5], failed_nodes[[dQ8FF6kjTN6GXKAg9-36DQ]], delayed=false, details[failed shard on node [dQ8FF6kjTN6GXKAg9-36DQ]: failed to create index, failure IllegalStateException[multiple engine factories provided for [restored-shrink-auditbeat-v7-000004/8IaTkA4oQQS9o04-aHoaPA]: [org.elasticsearch.snapshots.SourceOnlySnapshotRepository$$Lambda$5291/0x0000000801869d30],[org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshots$$Lambda$5292/0x00000008012a5a30]]], allocation_status[deciders_no]]]"
        },
        {
          "decider" : "restore_in_progress",
          "decision" : "NO",
          "explanation" : "shard has failed to be restored from the snapshot [default:2021.01.23-shrink-auditbeat-v7-000004-pantheon-audit-zuekhgcbsrs9ngym0zfjxw/vSp3OhG8TPG5NjFSge3fLw] because of [failed shard on node [dQ8FF6kjTN6GXKAg9-36DQ]: failed to create index, failure IllegalStateException[multiple engine factories provided for [restored-shrink-auditbeat-v7-000004/8IaTkA4oQQS9o04-aHoaPA]: [org.elasticsearch.snapshots.SourceOnlySnapshotRepository$$Lambda$5291/0x0000000801869d30],[org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshots$$Lambda$5292/0x00000008012a5a30]]] - manually close or delete the index [restored-shrink-auditbeat-v7-000004] in order to retry to restore the snapshot again or use the reroute API to force the allocation of an empty primary shard"
        }
      ]
    }
  ]
}
As suggested in the response above, I also attempted to manually reroute the shards onto the nodes, but to no avail; the attempts looked roughly like the following.
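First retrying the failed allocations, then forcing an empty primary (index and node names taken from the explain output above; accept_data_loss is required for allocate_empty_primary):

POST _cluster/reroute?retry_failed=true

POST _cluster/reroute
{
  "commands" : [
    {
      "allocate_empty_primary" : {
        "index" : "restored-shrink-auditbeat-v7-000004",
        "shard" : 0,
        "node" : "oracle-es-cold-1",
        "accept_data_loss" : true
      }
    }
  ]
}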
I'm not overly sure whether I'm missing something in the docs about this or whether it's a bug (I know searchable snapshots are still quite experimental).
TIA,
-Dan Miles