Dec 18th, 2020: [EN] Set up searchable snapshots in ECK

Searchable snapshots, recently released as beta in Elasticsearch 7.10, let you reduce your operating costs by using snapshots for resiliency rather than maintaining replica shards within a cluster.

In this blog, we'll demonstrate how to create a hot-cold topology using Elastic Cloud on Kubernetes (ECK), where the cold tier mounts an index from a snapshot using the new searchable snapshots API. We will also demonstrate how data is recovered from a searchable snapshot upon a failure in the cold tier.
The demonstration is carried out on Google Kubernetes Engine (GKE) and can easily be adjusted to other Kubernetes environments.

Prerequisites:

  • A GKE cluster with ECK 1.3.0 installed
  • A GCS repository with a snapshot containing an index in the cold phase, which we will mount to the cold tier using the searchable snapshots API

You can control index lifecycle phases using ILM or Index-level data tier allocation filtering.
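As a hedged sketch of the second approach, index-level data tier allocation filtering can be applied to an existing index with the `_tier_preference` routing setting. The index name, endpoint, and credentials variable are examples; adjust them to your environment:

```shell
# Sketch: route an existing index to the cold tier via index-level
# allocation filtering. The index name and endpoint are placeholders,
# and $ES_CREDENTIALS is assumed to hold "user:password".
PAYLOAD='{
  "index.routing.allocation.include._tier_preference": "data_cold"
}'
curl -k -u "$ES_CREDENTIALS" -XPUT \
  "https://localhost:9200/sample-data-flights/_settings" \
  -H 'Content-Type: application/json' \
  -d "$PAYLOAD" || echo "note: no cluster reachable"
```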

1. Create an Elasticsearch cluster with a hot-cold topology

The following Kubernetes manifest describes an Elasticsearch cluster with two nodes:

  • hot-node with the master, ingest, data, and data_hot roles
  • cold-node with the data_cold role

We specify a command for installing the Elasticsearch GCS repository plugin in the podTemplate spec.

Create an es-blog.yaml file with the following content:

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: es-blog
spec:
  version: 7.10.0
  # We'll uncomment secureSettings after creating the gcs-credentials secret in the next step
  # secureSettings:
  # - secretName: gcs-credentials
  nodeSets:
  - name: hot-node
    count: 1
    config:
      node.store.allow_mmap: false
      node.roles: ["master", "ingest", "data", "data_hot"]
    podTemplate:
      spec:
        initContainers:
        - name: install-plugins
          command:
          - sh
          - -c
          - |
            bin/elasticsearch-plugin install --batch repository-gcs
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data-hot
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        storageClassName: standard
  - name: cold-node
    count: 1
    config:
      node.store.allow_mmap: false
      node.roles: ["data_cold"]
    podTemplate:
      spec:
        initContainers:
        - name: install-plugins
          command:
          - sh
          - -c
          - |
            bin/elasticsearch-plugin install --batch repository-gcs
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data-cold
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        storageClassName: standard

Next, let's apply the manifest and create our Elasticsearch cluster:

$ kubectl apply -f es-blog.yaml
elasticsearch.elasticsearch.k8s.elastic.co/es-blog created

# After a couple of minutes, health should be green:
$ kubectl get es
NAME      HEALTH   NODES   VERSION   PHASE   AGE
es-blog   green    2       7.10.0    Ready   109s

2. Set the GCS credentials in the Elasticsearch keystore

Assuming you already have a GCS repository containing an Elasticsearch snapshot, we will now add our GCS credentials to Elasticsearch's keystore by creating a secret that contains a service account JSON key file. Detailed information about how to obtain that file can be found in the Elasticsearch docs.
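If you still need to create the key file, a minimal sketch with the gcloud CLI might look like the following. The service account email is a placeholder, and the account is assumed to already have the storage permissions described in the Elasticsearch docs:

```shell
# Sketch: create a service account JSON key for the repository-gcs
# plugin. SA_EMAIL is a placeholder; the file name must match what
# the secret below expects.
SA_EMAIL="es-snapshots@my-project.iam.gserviceaccount.com"
KEY_FILE="gcs.client.default.credentials_file"
gcloud iam service-accounts keys create "$KEY_FILE" \
  --iam-account="$SA_EMAIL" || echo "note: gcloud not configured"
```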

Make sure to name the JSON key file gcs.client.default.credentials_file and create the secret as follows:

$ kubectl create secret generic gcs-credentials --from-file gcs.client.default.credentials_file
secret/gcs-credentials created

Next, for ECK to add the credentials to the Elasticsearch keystore, uncomment the secureSettings section in the es-blog.yaml file:

...
spec:
  version: 7.10.0
  secureSettings:
  - secretName: gcs-credentials
...

Then apply the change:

$ kubectl apply -f es-blog.yaml
elasticsearch.elasticsearch.k8s.elastic.co/es-blog configured

3. Register the GCS snapshot repository with Elasticsearch

Now that we have the GCS credentials in Elasticsearch's keystore, we register our snapshot repository with Elasticsearch using the Put snapshot repository API:

$ curl -k -u $ES_CREDENTIALS -XPUT https://localhost:9200/_snapshot/gcs_repository -H 'Content-Type: application/json; charset=utf-8' -d \
'{
  "type" : "gcs",
  "settings" : {
    "bucket" : "es-blog-snapshots",
    "client" : "default"
  }
}'

{"acknowledged":true}

Now let's validate that our snapshot is available by listing the available snapshots in the GCS repository:

$ curl -k -u $ES_CREDENTIALS "https://localhost:9200/_cat/snapshots/gcs_repository?v"
id               status  start_epoch start_time end_epoch  end_time duration indices successful_shards failed_shards total_shards
es-blog-snapshot SUCCESS 1607435193  13:46:33   1607435215 13:46:55    21.8s      10                10             0           10

4. Mount the snapshot using the Searchable snapshots mount API

Our snapshot contains a sample-data-flights index allocated to the cold tier.
When mounting the snapshot using the Searchable snapshots mount API, we need to specify the index whose data we would like to load, like so:

$ curl -k -u $ES_CREDENTIALS -XPOST https://localhost:9200/_snapshot/gcs_repository/es-blog-snapshot/_mount -H 'Content-Type: application/json; charset=utf-8' -d \
'{
  "index": "sample-data-flights"
}'

{"accepted":true}
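The mount API also accepts optional parameters, such as renaming the index on mount and overriding index settings. A hedged sketch follows; the renamed index, setting values, and the wait_for_completion flag are illustrative, not part of the demo above:

```shell
# Sketch: mount the snapshotted index under a new name and adjust
# index settings on the way in. All values here are examples.
PAYLOAD='{
  "index": "sample-data-flights",
  "renamed_index": "sample-data-flights-mounted",
  "index_settings": { "index.number_of_replicas": 0 },
  "ignore_index_settings": [ "index.refresh_interval" ]
}'
curl -k -u "$ES_CREDENTIALS" -XPOST \
  "https://localhost:9200/_snapshot/gcs_repository/es-blog-snapshot/_mount?wait_for_completion=true" \
  -H 'Content-Type: application/json' \
  -d "$PAYLOAD" || echo "note: no cluster reachable"
```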

Let's examine the loaded index and its shards:

$ curl -k -u $ES_CREDENTIALS "https://localhost:9200/_cat/shards/sample-data-flights?v"
index               shard prirep state    docs store ip          node
sample-data-flights 0     p      STARTED 13059 5.1mb 10.1.112.19 es-blog-es-cold-node-0

The loaded index has one shard, allocated to the cold node. Replica shards are not maintained for this index as our searchable snapshot will be used for resiliency upon node failure. Should the cold node fail, the shards from the searchable snapshot index will be automatically recovered from the GCS snapshot repository.

Now, let's make sure our data is queryable:

$ curl -k -u $ES_CREDENTIALS "https://localhost:9200/sample-data-flights/_search?pretty" -H "Content-Type: application/json; charset=utf-8" -d \
'{
  "size": 0,
  "aggs": {
    "destination_country": {
      "terms": {
        "field": "Carrier"
      }
    }
  }
}'

...

  "aggregations" : {
    "destination_country" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Logstash Airways",
          "doc_count" : 3331
        },
        {
          "key" : "JetBeats",
          "doc_count" : 3274
        },
        {
          "key" : "Kibana Airlines",
          "doc_count" : 3234
        },
        {
          "key" : "ES-Air",
          "doc_count" : 3220
        }
      ]
    }
  }


5. Let's test it!

It's time to see searchable snapshots in action and how they can be used to recover data from a snapshot after a "hardware failure," without the need to maintain replica shards.

How are we going to do this? Well, here's the plan:

  1. Delete the PVC associated with the cold node
  2. Delete the cold node pod
  3. ECK will then re-create the pod and the PVC.

Please note that we may have to delete the newly created pod a second time because of a rare race condition in which the new pod gets associated with the terminating PVC before a new one is actually created.

After applying this set of actions, we will lose the local copy of the sample-data-flights index.
When we query that index, our data will still be available, as it will be recovered from the searchable snapshot in the GCS repository.

Let's do it:

# Delete the cold node's PVC
$ kubectl delete pvc elasticsearch-data-cold-es-blog-es-cold-node-0
persistentvolumeclaim "elasticsearch-data-cold-es-blog-es-cold-node-0" deleted

# PVC should be in terminating status
$ kubectl get pvc elasticsearch-data-cold-es-blog-es-cold-node-0
NAME                                             STATUS        VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
elasticsearch-data-cold-es-blog-es-cold-node-0   Terminating   pvc-c77ea9c3-eb39-473a-8c02-201848e4c04c   10Gi       RWO            standard       32h

# Delete the cold node pod
$ kubectl delete pod es-blog-es-cold-node-0 --force --grace-period=0
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "es-blog-es-cold-node-0" force deleted

Now, check the cold node pod's status. If it's stuck in Pending, re-run the delete pod command:

$ kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
es-blog-es-cold-node-0   0/1     Pending   0          2s
es-blog-es-hot-node-0    1/1     Running   0          22h

# Cold node pod is stuck in Pending status. Re-run the delete pod command:
$ kubectl delete pod es-blog-es-cold-node-0 --force --grace-period=0
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "es-blog-es-cold-node-0" force deleted

We simulated a failure of the cold node, which has been re-created by ECK. Because the associated PVC was deleted, we lost the local copy of the sample-data-flights index, and Elasticsearch recovers the index data from the searchable snapshot.

Let's query the index to make sure our data is still available:

$ curl -k -u $ES_CREDENTIALS "https://localhost:9200/sample-data-flights/_search?pretty" -H "Content-Type: application/json; charset=utf-8" -d \
'{
  "size": 0,
  "aggs": {
    "destination_country": {
      "terms": {
        "field": "Carrier"
      }
    }
  }
}'

...

  "aggregations" : {
    "destination_country" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Logstash Airways",
          "doc_count" : 3331
        },
        {
          "key" : "JetBeats",
          "doc_count" : 3274
        },
        {
          "key" : "Kibana Airlines",
          "doc_count" : 3234
        },
        {
          "key" : "ES-Air",
          "doc_count" : 3220
        }
      ]
    }
  }


What just happened here?

The cold node in our deployment "failed" because we deleted its PVC and pod. Elasticsearch automatically restored the shard data from the GCS repository, and no replica shards were needed.

Please note that searchable snapshot shards are restored in the background, so you can search them even before they have been fully restored.
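To watch that background restore, one option is the cat recovery API. A hedged sketch; the column selection is an example, not required:

```shell
# Sketch: inspect recovery progress for the searchable snapshot index.
# A snapshot-based recovery shows up with its stage and the percentage
# of files and bytes recovered so far.
REC_URL="https://localhost:9200/_cat/recovery/sample-data-flights?v&h=index,shard,type,stage,files_percent,bytes_percent"
curl -k -u "$ES_CREDENTIALS" "$REC_URL" || echo "note: no cluster reachable"
```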

Hopefully, this blog got you familiar with Searchable snapshots with ECK. For more information about searchable snapshots please refer to its documentation.
