Searchable snapshots, released as a beta in Elasticsearch 7.10, let you reduce your operating costs by using snapshots for resiliency rather than maintaining replica shards within a cluster.
In this blog, we'll demonstrate how to create a hot-cold topology using Elastic Cloud on Kubernetes (ECK), mounting a snapshot to the cold tier with the new searchable snapshots API. We will also demonstrate how data in the cold tier is recovered from the searchable snapshot upon a failure.
The demonstration is carried out on Google Kubernetes Engine (GKE) and can easily be adjusted to other Kubernetes environments.
Prerequisites:
- GKE cluster with ECK 1.3.0 installed
- GCS repository with a snapshot containing an index in the cold phase, which we will mount to the cold tier using the searchable snapshots API
You can control index lifecycle phases using ILM or Index-level data tier allocation filtering.
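For example, you could pin an index to the cold tier with an index-level tier preference before snapshotting it. A minimal sketch (it assumes the cluster is reachable on localhost:9200 with credentials in $ES_CREDENTIALS, as set up in step 2 below, and uses the sample-data-flights index from later in this post):
$ curl -k -u $ES_CREDENTIALS -XPUT "https://localhost:9200/sample-data-flights/_settings" -H 'Content-Type: application/json' -d \
'{
  "index.routing.allocation.include._tier_preference": "data_cold"
}'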
1. Create an Elasticsearch cluster with a hot-cold topology
The following Kubernetes manifest describes an Elasticsearch cluster with two nodes:
- hot-node with the master, ingest, data, and data_hot roles
- cold-node with the data_cold role
We specify a command for installing the Elasticsearch GCS repository plugin in the podTemplate spec.
Create an es-blog.yaml file with the following content:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: es-blog
spec:
  version: 7.10.0
  # We'll uncomment secureSettings after creating the gcs-credentials secret in the next step
  # secureSettings:
  # - secretName: gcs-credentials
  nodeSets:
  - name: hot-node
    count: 1
    config:
      node.store.allow_mmap: false
      node.roles: ["master", "ingest", "data", "data_hot"]
    podTemplate:
      spec:
        initContainers:
        - name: install-plugins
          command:
          - sh
          - -c
          - |
            bin/elasticsearch-plugin install --batch repository-gcs
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data-hot
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        storageClassName: standard
  - name: cold-node
    count: 1
    config:
      node.store.allow_mmap: false
      node.roles: ["data_cold"]
    podTemplate:
      spec:
        initContainers:
        - name: install-plugins
          command:
          - sh
          - -c
          - |
            bin/elasticsearch-plugin install --batch repository-gcs
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data-cold
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        storageClassName: standard
Next, let's apply the manifest and create our Elasticsearch cluster:
$ kubectl apply -f es-blog.yaml
elasticsearch.elasticsearch.k8s.elastic.co/es-blog created
# After a couple of minutes, health should be green:
$ kubectl get es
NAME      HEALTH   NODES   VERSION   PHASE   AGE
es-blog   green    2       7.10.0    Ready   109s
2. Set the GCS credentials in the Elasticsearch keystore
Assuming you already have a GCS repository containing an Elasticsearch snapshot, we will now add our GCS credentials to Elasticsearch's keystore by creating a secret that contains a service account JSON key file. Detailed information about how to obtain that file can be found in the Elasticsearch docs.
Make sure to name the JSON key file gcs.client.default.credentials_file and create the secret as follows:
$ kubectl create secret generic gcs-credentials --from-file gcs.client.default.credentials_file
secret/gcs-credentials created
Next, for ECK to add the credentials to the Elasticsearch keystore, uncomment the secureSettings section in the es-blog.yaml file:
...
spec:
  version: 7.10.0
  secureSettings:
  - secretName: gcs-credentials
...
Then apply the change:
$ kubectl apply -f es-blog.yaml
elasticsearch.elasticsearch.k8s.elastic.co/es-blog configured
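Note: the curl commands in the following steps assume that $ES_CREDENTIALS holds the elastic user's credentials and that the cluster's HTTP service is forwarded to localhost:9200. With ECK's default naming, something along these lines should work (run the port-forward in a separate terminal):
$ ES_CREDENTIALS="elastic:$(kubectl get secret es-blog-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')"
$ kubectl port-forward service/es-blog-es-http 9200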
3. Register the GCS snapshot repository with Elasticsearch
Now that we have the GCS credentials in Elasticsearch's keystore, we register our snapshot repository with Elasticsearch using the put snapshot repository API:
$ curl -k -u $ES_CREDENTIALS -XPUT https://localhost:9200/_snapshot/gcs_repository -H 'Content-Type: application/json; charset=utf-8' -d \
'{
  "type": "gcs",
  "settings": {
    "bucket": "es-blog-snapshots",
    "client": "default"
  }
}'
{"acknowledged":true}
Now let's validate that our snapshot is available by listing the available snapshots in the GCS repository:
$ curl -k -u $ES_CREDENTIALS "https://localhost:9200/_cat/snapshots/gcs_repository?v"
id                 status    start_epoch   start_time   end_epoch    end_time   duration   indices   successful_shards   failed_shards   total_shards
es-blog-snapshot   SUCCESS   1607435193    13:46:33     1607435215   13:46:55   21.8s      10        10                  0               10
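If you are starting from scratch instead, a snapshot like this one could be created with the create snapshot API once the repository is registered (a hypothetical example; adjust the indices to your data):
$ curl -k -u $ES_CREDENTIALS -XPUT "https://localhost:9200/_snapshot/gcs_repository/es-blog-snapshot?wait_for_completion=true" -H 'Content-Type: application/json' -d \
'{
  "indices": "sample-data-flights"
}'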
4. Mount the snapshot using the Searchable snapshots mount API
Our snapshot contains a sample-data-flights index allocated to the cold tier. When mounting the snapshot using the searchable snapshots mount API, we need to specify the index whose data we would like to mount, like so:
$ curl -k -u $ES_CREDENTIALS -XPOST https://localhost:9200/_snapshot/gcs_repository/es-blog-snapshot/_mount -H 'Content-Type: application/json; charset=utf-8' -d \
'{
  "index": "sample-data-flights"
}'
{"accepted":true}
Let's examine the loaded index and its shards:
$ curl -k -u $ES_CREDENTIALS "https://localhost:9200/_cat/shards/sample-data-flights?v"
index shard prirep state docs store ip node
sample-data-flights 0 p STARTED 13059 5.1mb 10.1.112.19 es-blog-es-cold-node-0
The mounted index has one shard, allocated to the cold node. No replica shards are maintained for this index, as the searchable snapshot will provide resiliency upon node failure. Should the cold node fail, the shards of the searchable snapshot index will be automatically recovered from the GCS snapshot repository.
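As a quick sanity check, a mounted index should report a snapshot store type in its settings:
$ curl -k -u $ES_CREDENTIALS "https://localhost:9200/sample-data-flights/_settings/index.store.type?pretty"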
Now, let's make sure our data is queryable:
$ curl -k -u $ES_CREDENTIALS "https://localhost:9200/sample-data-flights/_search?pretty" -H "Content-Type: application/json; charset: utf-8" -d \
'{
"size": 0,
"aggs": {
"destination_country": {
"terms": {
"field": "Carrier"
}
}
}
}'
...
"aggregations" : {
"destination_country" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Logstash Airways",
"doc_count" : 3331
},
{
"key" : "JetBeats",
"doc_count" : 3274
},
{
"key" : "Kibana Airlines",
"doc_count" : 3234
},
{
"key" : "ES-Air",
"doc_count" : 3220
}
]
}
}
5. Let's test it!
It's time to see searchable snapshots in action and how they can be used to recover data from a snapshot after a "hardware failure", without maintaining replica shards.
How are we going to do this? Well, here's the plan:
- Delete the PVC associated with the cold node
- Delete the cold node pod
- ECK will then re-create the pod and the PVC.
Please note that we may have to delete the newly created pod again: in a rare race condition, the new pod can get bound to the terminating PVC before a new one is actually created.
After applying this set of actions, we will lose the local copy of the sample-data-flights index. When we query that index, the data will still be available because it is recovered from the searchable snapshot in the GCS repository.
Let's do it:
# Delete the cold node's PVC
$ kubectl delete pvc elasticsearch-data-cold-es-blog-es-cold-node-0
persistentvolumeclaim "elasticsearch-data-cold-es-blog-es-cold-node-0" deleted
# PVC should be in terminating status
$ kubectl get pvc elasticsearch-data-cold-es-blog-es-cold-node-0
NAME                                             STATUS        VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
elasticsearch-data-cold-es-blog-es-cold-node-0   Terminating   pvc-c77ea9c3-eb39-473a-8c02-201848e4c04c   10Gi       RWO            standard       32h
# Delete the cold node pod
$ kubectl delete pod es-blog-es-cold-node-0 --force --grace-period=0
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "es-blog-es-cold-node-0" force deleted
Now, check the cold node pod's status. If it is stuck in a Pending state, re-run the delete pod command:
$ kubectl get pods
NAME                     READY   STATUS    RESTARTS   AGE
es-blog-es-cold-node-0   0/1     Pending   0          2s
es-blog-es-hot-node-0    1/1     Running   0          22h
# Cold node pod is stuck in Pending status. Re-run the delete pod command:
$ kubectl delete pod es-blog-es-cold-node-0 --force --grace-period=0
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "es-blog-es-cold-node-0" force deleted
We simulated a failure of the cold node, which has been re-created by ECK. Because the associated PVC was deleted, we lost the local copy of the sample-data-flights index. Elasticsearch makes sure to recover the index data from the searchable snapshot.
Let's query the index to make sure our data is still available:
$ curl -k -u $ES_CREDENTIALS "https://localhost:9200/sample-data-flights/_search?pretty" -H "Content-Type: application/json; charset: utf-8" -d \
'{
"size": 0,
"aggs": {
"destination_country": {
"terms": {
"field": "Carrier"
}
}
}
}'
...
"aggregations" : {
"destination_country" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Logstash Airways",
"doc_count" : 3331
},
{
"key" : "JetBeats",
"doc_count" : 3274
},
{
"key" : "Kibana Airlines",
"doc_count" : 3234
},
{
"key" : "ES-Air",
"doc_count" : 3220
}
]
}
}
What just happened here?
The cold node in our deployment "failed" because we deleted its PVC and pod. Elasticsearch automatically restored the shard data from the GCS repository, and no replica shards were needed.
Please note that searchable snapshot shards are recovered in the background, so you can search them even before they have been fully restored.
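You can watch that background recovery with the cat recovery API, for example:
$ curl -k -u $ES_CREDENTIALS "https://localhost:9200/_cat/recovery/sample-data-flights?v"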
Hopefully, this blog got you familiar with searchable snapshots on ECK. For more information about searchable snapshots, please refer to the documentation.