Issue using searchable snapshots in an ILM policy

I've been taking searchable snapshots as part of an ILM policy for a spin. The indices move through the stages fine until they reach the cold stage, where the snapshot is taken but then the mount utterly fails.

Running a hot-warm-cold setup on Elasticsearch 7.10.1 on ECK 1.3.1, with a trial license applied and GCS as the snapshot backend.
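For context, the cold phase of the policy is just the searchable_snapshot action pointing at the repository. A minimal sketch of what I'm running (the policy name and thresholds here are illustrative, not the exact values in use; the repository is the one called "default" in the errors below):

# illustrative ILM policy sketch, not the exact one in use
PUT _ilm/policy/auditbeat-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_size": "50gb", "max_age": "30d" }
        }
      },
      "warm": {
        "actions": {
          "shrink": { "number_of_shards": 1 }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "searchable_snapshot": {
            "snapshot_repository": "default"
          }
        }
      }
    }
  }
}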

The failure I'm seeing (for every index that's been attempted so far) is:

failing shard [failed shard, shard [restored-shrink-auditbeat-v7-000001][0], node[dQ8FF6kjTN6GXKAg9-36DQ], [P], recovery_source[snapshot recovery [_no_api_] from default:2021.01.23-shrink-auditbeat-v7-000001-pantheon-audit-0goifa_vqsynluhbnlktww/GVVNoA-SSUqyclwjIUMZLg], s[INITIALIZING], a[id=xluTACB4TfWtHUOO7O4lhQ], unassigned_info[[reason=ALLOCATION_FAILED], at[2021-01-23T17:50:26.368Z], failed_attempts[2], failed_nodes[[dQ8FF6kjTN6GXKAg9-36DQ]], delayed=false, details[failed shard on node [dQ8FF6kjTN6GXKAg9-36DQ]: failed to create index, failure IllegalStateException[multiple engine factories provided for [restored-shrink-auditbeat-v7-000001/IYliwBPhQaemofR4hVwMvA]: [org.elasticsearch.snapshots.SourceOnlySnapshotRepository$$Lambda$5291/0x0000000801869d30],[org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshots$$Lambda$5292/0x00000008012a5a30]]], allocation_status[fetching_shard_data]], expected_shard_size[14285895102], message [failed to create index], failure [IllegalStateException[multiple engine factories provided for [restored-shrink-auditbeat-v7-000001/IYliwBPhQaemofR4hVwMvA]: [org.elasticsearch.snapshots.SourceOnlySnapshotRepository$$Lambda$5291/0x0000000801869d30],[org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshots$$Lambda$5292/0x00000008012a5a30]]], markAsStale [true]]

The output of GET _cluster/allocation/explain?pretty (reduced down):

{
  "index" : "restored-shrink-auditbeat-v7-000004",
  "shard" : 0,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "ALLOCATION_FAILED",
    "at" : "2021-01-23T18:38:48.858Z",
    "failed_allocation_attempts" : 5,
    "details" : "failed shard on node [dQ8FF6kjTN6GXKAg9-36DQ]: failed to create index, failure IllegalStateException[multiple engine factories provided for [restored-shrink-auditbeat-v7-000004/8IaTkA4oQQS9o04-aHoaPA]: [org.elasticsearch.snapshots.SourceOnlySnapshotRepository$$Lambda$5291/0x0000000801869d30],[org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshots$$Lambda$5292/0x00000008012a5a30]]",
    "last_allocation_status" : "no"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "dQ8FF6kjTN6GXKAg9-36DQ",
      "node_name" : "oracle-es-cold-1",
      "transport_address" : "10.50.7.3:9300",
      "node_attributes" : {
        "k8s_node_name" : "gke-oracle-standard-9729ac57-414q",
        "xpack.installed" : "true",
        "transform.node" : "false"
      },
      "node_decision" : "no",
      "weight_ranking" : 1,
      "deciders" : [
        {
          "decider" : "max_retry",
          "decision" : "NO",
          "explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2021-01-23T18:38:48.858Z], failed_attempts[5], failed_nodes[[dQ8FF6kjTN6GXKAg9-36DQ]], delayed=false, details[failed shard on node [dQ8FF6kjTN6GXKAg9-36DQ]: failed to create index, failure IllegalStateException[multiple engine factories provided for [restored-shrink-auditbeat-v7-000004/8IaTkA4oQQS9o04-aHoaPA]: [org.elasticsearch.snapshots.SourceOnlySnapshotRepository$$Lambda$5291/0x0000000801869d30],[org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshots$$Lambda$5292/0x00000008012a5a30]]], allocation_status[deciders_no]]]"
        },
        {
          "decider" : "restore_in_progress",
          "decision" : "NO",
          "explanation" : "shard has failed to be restored from the snapshot [default:2021.01.23-shrink-auditbeat-v7-000004-pantheon-audit-zuekhgcbsrs9ngym0zfjxw/vSp3OhG8TPG5NjFSge3fLw] because of [failed shard on node [dQ8FF6kjTN6GXKAg9-36DQ]: failed to create index, failure IllegalStateException[multiple engine factories provided for [restored-shrink-auditbeat-v7-000004/8IaTkA4oQQS9o04-aHoaPA]: [org.elasticsearch.snapshots.SourceOnlySnapshotRepository$$Lambda$5291/0x0000000801869d30],[org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshots$$Lambda$5292/0x00000008012a5a30]]] - manually close or delete the index [restored-shrink-auditbeat-v7-000004] in order to retry to restore the snapshot again or use the reroute API to force the allocation of an empty primary shard"
        }
      ]
    }
  ]
}

As mentioned in the response, I also attempted to manually reroute the shards onto nodes, but to no avail.
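For reference, the retry I tried is just the call from the max_retry decider, and the restore_in_progress decider additionally suggests deleting the half-restored index (index name copied from the explain output above):

# retry the failed allocations
POST _cluster/reroute?retry_failed=true

# alternative suggested by the restore_in_progress decider
DELETE restored-shrink-auditbeat-v7-000004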

I'm not overly sure if I'm missing something in the docs about this or if this is a bug (I know this feature is still quite experimental).

TIA,

-Dan Miles

Are you using a source-only repository? If so, that's not going to work with searchable snapshots -- the whole point of source-only repositories is that they drop all the data structures needed to support searching.
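If the repository was registered something like the first example below (the "source" type wrapping a GCS delegate), you'd need a plain "gcs" repository like the second one for searchable snapshots to mount from. Repository and bucket names here are just placeholders:

# source-only repository: stores only _source, drops the structures needed for search
PUT _snapshot/source_only_gcs
{
  "type": "source",
  "settings": {
    "delegate_type": "gcs",
    "bucket": "my-snapshot-bucket"
  }
}

# regular GCS repository: what searchable snapshots need to mount from
PUT _snapshot/full_gcs
{
  "type": "gcs",
  "settings": {
    "bucket": "my-snapshot-bucket"
  }
}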

I opened an issue to improve the error reporting in this case:


Ah, excellent spot. Yeah, it should have been apparent to me that they'd never play well together.

It's not awfully clear in the docs either that that combination won't end well. Hopefully this issue will serve for posterity.

Thanks, David

