Deploying ECK with Network-attached PersistentVolumes (e.g. AWSElasticBlockStore)

The ECK docs would greatly benefit from an MCVE showing how to deploy ECK with network-attached PersistentVolumes and a provisioner such as kubernetes.io/aws-ebs. The Best Practices in AWS section suggests using either Instance Store or EBS-based storage, but the Volume claim templates section does not provide a relevant example.

For example:


Provision EKS cluster (3 x m5.xlarge workers).

Install ECK custom resource definitions and operator:

ECK_VERSION=1.8.0

kubectl create \
    -f "https://download.elastic.co/downloads/eck/${ECK_VERSION}/crds.yaml"
kubectl apply \
    -f "https://download.elastic.co/downloads/eck/${ECK_VERSION}/operator.yaml"

kubectl rollout status \
    -n elastic-system \
    --watch --timeout=600s \
    statefulset.apps/elastic-operator

Create EBS volume (the Kubernetes docs indicate the volume must be created beforehand):

aws ec2 create-volume \
    --availability-zone=xx-xx-xx \
    --size=64 \
    --volume-type=gp2 \
    --tag-specifications \
        'ResourceType=volume,Tags=[{Key=Name,Value=ElasticData},{Key=Creator,Value=brsolomon}]'
aws ec2 wait volume-available \
    --filters 'Name=tag:Name,Values=ElasticData'

Get volume ID:

sudo yum install -y jq
set -o pipefail
vol_id="$(aws ec2 describe-volumes --filters 'Name=tag:Name,Values=ElasticData' | jq -Mr '.Volumes[0].VolumeId')"
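If you do want to use a pre-created volume instead of dynamic provisioning, a StorageClass on its own will not pick it up: Kubernetes also needs a PersistentVolume object that points at the volume ID and carries the same storageClassName as the claim. Below is a minimal sketch, assuming the in-tree awsElasticBlockStore volume source. ebs_pv_manifest is a hypothetical helper, not part of ECK or kubectl. Each pod in the StatefulSet gets its own claim, so a three-node nodeSet would need three volumes and three PVs. An EBS volume can also only be attached to an instance in its own availability zone.

```shell
# Hypothetical helper: print a statically provisioned PersistentVolume that
# wraps an existing EBS volume (for example the $vol_id looked up above).
# Arguments: PV name, EBS volume ID, capacity, storage class name.
ebs_pv_manifest() {
  local pv_name="$1" volume_id="$2" capacity="$3" class_name="$4"
  cat <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ${pv_name}
spec:
  capacity:
    storage: ${capacity}
  # The in-tree EBS volume plugin only supports ReadWriteOnce.
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  # A claim only binds to a PV whose storageClassName matches its own.
  storageClassName: ${class_name}
  awsElasticBlockStore:
    volumeID: ${volume_id}
    fsType: ext4
EOF
}
```

For example: ebs_pv_manifest es-data-0 "$vol_id" 64Gi ebs-sc | kubectl create -f -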

Create storage class:

cat <<EOF | kubectl create -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc
provisioner: kubernetes.io/aws-ebs
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain
parameters:
  type: io1
  fsType: ext4
EOF

(Attempt to) apply an Elasticsearch cluster specification with custom volumeClaimTemplates:

cat <<EOF | kubectl apply -f -
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 7.15.0
  nodeSets:
  - name: default
    count: 3
    config:
      node.store.allow_mmap: false
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteMany
        resources:
          requests:
            storage: 8Gi
        storageClassName: aws-ebs
EOF

This fails: the Elasticsearch resource stays in the ApplyingChanges phase with health unknown, and its pods never leave Pending.

What am I doing wrong here? Do I need to pre-create EBS volumes, or can I use dynamic provisioning?


Info on failure:

$ kubectl get elasticsearch
NAME         HEALTH    NODES   VERSION   PHASE             AGE
quickstart   unknown           7.15.0    ApplyingChanges   6m16s

$ kubectl get pods --selector='elasticsearch.k8s.elastic.co/cluster-name=quickstart'
NAME                      READY   STATUS    RESTARTS   AGE
quickstart-es-default-0   0/1     Pending   0          8m21s
quickstart-es-default-1   0/1     Pending   0          8m21s
quickstart-es-default-2   0/1     Pending   0          8m21s

$ kubectl -n elastic-system logs statefulset.apps/elastic-operator | tail -n25
{"log.level":"info","@timestamp":"2021-09-28T12:43:34.029Z","log.logger":"generic-reconciler","message":"Updating resource","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","kind":"Service","namespace":"default","name":"quickstart-es-http"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:34.039Z","log.logger":"transport","message":"Skipping pod because it has no IP yet","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","namespace":"default","pod_name":"quickstart-es-default-0"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:34.039Z","log.logger":"transport","message":"Skipping pod because it has no IP yet","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","namespace":"default","pod_name":"quickstart-es-default-1"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:34.039Z","log.logger":"transport","message":"Skipping pod because it has no IP yet","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","namespace":"default","pod_name":"quickstart-es-default-2"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:34.325Z","log.logger":"generic-reconciler","message":"Updating resource","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","kind":"Service","namespace":"default","name":"quickstart-es-default"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:34.360Z","log.logger":"driver","message":"ES cannot be reached yet, re-queuing","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","namespace":"default","es_name":"quickstart"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:34.360Z","log.logger":"elasticsearch-controller","message":"Ending reconciliation run","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","iteration":186,"namespace":"default","es_name":"quickstart","took":0.342183425}
{"log.level":"info","@timestamp":"2021-09-28T12:43:44.361Z","log.logger":"elasticsearch-controller","message":"Starting reconciliation run","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","iteration":187,"namespace":"default","es_name":"quickstart"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:44.361Z","log.logger":"generic-reconciler","message":"Updating resource","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","kind":"Service","namespace":"default","name":"quickstart-es-transport"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:44.377Z","log.logger":"generic-reconciler","message":"Updating resource","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","kind":"Service","namespace":"default","name":"quickstart-es-http"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:44.387Z","log.logger":"transport","message":"Skipping pod because it has no IP yet","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","namespace":"default","pod_name":"quickstart-es-default-0"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:44.387Z","log.logger":"transport","message":"Skipping pod because it has no IP yet","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","namespace":"default","pod_name":"quickstart-es-default-1"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:44.387Z","log.logger":"transport","message":"Skipping pod because it has no IP yet","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","namespace":"default","pod_name":"quickstart-es-default-2"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:44.671Z","log.logger":"generic-reconciler","message":"Updating resource","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","kind":"Service","namespace":"default","name":"quickstart-es-default"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:44.707Z","log.logger":"driver","message":"ES cannot be reached yet, re-queuing","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","namespace":"default","es_name":"quickstart"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:44.707Z","log.logger":"elasticsearch-controller","message":"Ending reconciliation run","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","iteration":187,"namespace":"default","es_name":"quickstart","took":0.346669437}
{"log.level":"info","@timestamp":"2021-09-28T12:43:54.708Z","log.logger":"elasticsearch-controller","message":"Starting reconciliation run","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","iteration":188,"namespace":"default","es_name":"quickstart"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:54.709Z","log.logger":"generic-reconciler","message":"Updating resource","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","kind":"Service","namespace":"default","name":"quickstart-es-transport"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:54.727Z","log.logger":"generic-reconciler","message":"Updating resource","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","kind":"Service","namespace":"default","name":"quickstart-es-http"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:54.736Z","log.logger":"transport","message":"Skipping pod because it has no IP yet","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","namespace":"default","pod_name":"quickstart-es-default-2"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:54.736Z","log.logger":"transport","message":"Skipping pod because it has no IP yet","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","namespace":"default","pod_name":"quickstart-es-default-0"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:54.736Z","log.logger":"transport","message":"Skipping pod because it has no IP yet","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","namespace":"default","pod_name":"quickstart-es-default-1"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:55.054Z","log.logger":"generic-reconciler","message":"Updating resource","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","kind":"Service","namespace":"default","name":"quickstart-es-default"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:55.084Z","log.logger":"driver","message":"ES cannot be reached yet, re-queuing","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","namespace":"default","es_name":"quickstart"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:55.084Z","log.logger":"elasticsearch-controller","message":"Ending reconciliation run","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","iteration":188,"namespace":"default","es_name":"quickstart","took":0.376270961}

From kubectl describe pods I notice this event:

FailedScheduling  0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
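That event means the pods' PersistentVolumeClaims were never bound, so the claims are the next thing to inspect. Their own events typically say why, for example a storage class that does not exist or an access mode the provisioner cannot satisfy. ECK manages each nodeSet as a StatefulSet named <cluster>-es-<nodeSet>, which is where the pod names above come from. A StatefulSet names each claim <template>-<pod>. The hypothetical helper below only prints those expected claim names. It is not an ECK or kubectl command.

```shell
# Hypothetical helper: print the PVC names the ECK-managed StatefulSet
# should create. Arguments: claim template name, Elasticsearch name,
# nodeSet name, node count.
eck_pvc_names() {
  local claim="$1" cluster="$2" nodeset="$3" count="$4"
  local i=0
  while [ "$i" -lt "$count" ]; do
    # StatefulSet claims are named <template>-<pod>, and ECK pods are
    # named <cluster>-es-<nodeSet>-<ordinal>.
    printf '%s-%s-es-%s-%d\n' "$claim" "$cluster" "$nodeset" "$i"
    i=$((i + 1))
  done
}

# Example:
#   for pvc in $(eck_pvc_names elasticsearch-data quickstart default 3); do
#     kubectl describe pvc "$pvc"
#   done
```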

Hi Brad,

If you are using EKS and want to use gp2 EBS volumes, you should not have to pre-create the volume manually.
Rather, you would create the right gp2 storage class (see Storage classes - Amazon EKS).
Then reference that storage class by name in your Elasticsearch volumeClaimTemplates. The corresponding volumes will be provisioned for you by EKS on the fly.
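Here is a sketch of that approach, assuming the in-tree kubernetes.io/aws-ebs provisioner used elsewhere in this thread. The two functions are hypothetical helpers that only print manifests. The class name is defined once and passed to both, so the claim template cannot reference a class that does not exist. The access mode is fixed to ReadWriteOnce, the only mode the EBS volume plugin supports. An EKS cluster may already include a gp2 class (check with kubectl get storageclass), in which case only the Elasticsearch manifest is needed. The example uses a separate name, ebs-gp2, so it does not try to modify an existing class.

```shell
# Hypothetical helper: print a StorageClass for dynamically provisioned EBS
# volumes. Arguments: class name, EBS volume type (e.g. gp2).
ebs_storage_class() {
  local name="$1" volume_type="$2"
  cat <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ${name}
provisioner: kubernetes.io/aws-ebs
# Delay provisioning until a pod is scheduled, so the volume is created
# in the availability zone of the chosen node.
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: ${volume_type}
  fsType: ext4
EOF
}

# Hypothetical helper: print an Elasticsearch resource whose claim template
# references the given class. Arguments: ES name, class name, storage size.
eck_elasticsearch() {
  local es_name="$1" class_name="$2" size="$3"
  cat <<EOF
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: ${es_name}
spec:
  version: 7.15.0
  nodeSets:
  - name: default
    count: 3
    config:
      node.store.allow_mmap: false
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce   # the only mode the EBS volume plugin supports
        resources:
          requests:
            storage: ${size}
        storageClassName: ${class_name}
EOF
}

# Example (class name defined once, used in both manifests):
#   class=ebs-gp2
#   ebs_storage_class "$class" gp2 | kubectl apply -f -
#   eck_elasticsearch quickstart "$class" 8Gi | kubectl apply -f -
```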

Thanks @sebgl, I've updated my original post with an attempt to use an io1 storage class, but the cluster doesn't start. Does anything look off with that config?

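Got this working. I think the main issue was the unnecessary use of ReadWriteMany in place of ReadWriteOnce: the EBS volume plugin only supports ReadWriteOnce, so a ReadWriteMany claim cannot be satisfied. My first attempt also referenced storageClassName: aws-ebs, while the class I had created was named ebs-sc. The updated steps use the same name (io1) in both places.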

Updated steps:

cat <<EOF | kubectl create -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: io1
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/aws-ebs
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Retain
parameters:
  type: io1
  fsType: ext4
EOF
cat <<EOF | kubectl apply -f -
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 7.15.0
  nodeSets:
  - name: default
    count: 3
    config:
      node.store.allow_mmap: false
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 8Gi
        storageClassName: io1
EOF
$ kubectl wait --for=condition=ready pod/quickstart-es-default-{0,1,2}
pod/quickstart-es-default-0 condition met
pod/quickstart-es-default-1 condition met
pod/quickstart-es-default-2 condition met