The ECK docs would greatly benefit from an MCVE on how to deploy ECK with Network-attached PersistentVolumes and a provisioner such as kubernetes.io/aws-ebs. The Best Practices in AWS section suggests using one of Instance Store or EBS-based storage, but the section on Volume claim templates does not provide any relevant example.
For example:
Provision EKS cluster (3 x m5.xlarge workers).
Install ECK custom resource definitions and operator:
ECK_VERSION=1.8.0
kubectl create \
  -f "https://download.elastic.co/downloads/eck/${ECK_VERSION}/crds.yaml"
kubectl apply \
  -f "https://download.elastic.co/downloads/eck/${ECK_VERSION}/operator.yaml"
kubectl rollout status \
  -n elastic-system \
  --watch --timeout=600s \
  statefulset.apps/elastic-operator
Create EBS volume (K8s docs hint the volume must be created):
aws ec2 create-volume \
  --availability-zone=xx-xx-xx \
  --size=64 \
  --volume-type=gp2 \
  --tag-specifications \
  'ResourceType=volume,Tags=[{Key=Name,Value=ElasticData},{Key=Creator,Value=brsolomon}]'
aws ec2 wait volume-available \
  --filters 'Name=tag:Name,Values=ElasticData'
Get volume ID:
sudo yum install -y jq
set -o pipefail
vol_id="$(aws ec2 describe-volumes --filters 'Name=tag:Name,Values=ElasticData' | jq -Mr '.Volumes[0].VolumeId')"
Create storage class:
cat <<EOF | kubectl create -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc
provisioner: kubernetes.io/aws-ebs
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain
parameters:
  type: io1
  fsType: ext4
EOF
(Attempt to) apply an Elasticsearch cluster specification with custom volumeClaimTemplates:
cat <<EOF | kubectl apply -f -
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 7.15.0
  nodeSets:
  - name: default
    count: 3
    config:
      node.store.allow_mmap: false
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteMany
        resources:
          requests:
            storage: 8Gi
        storageClassName: aws-ebs
EOF
This fails with the elasticsearch object stuck in unknown state.
What am I doing wrong here? Do I need to pre-create EBS volumes, or can I use dynamic provisioning?
Info on failure:
$ kubectl get elasticsearch
NAME HEALTH NODES VERSION PHASE AGE
quickstart unknown 7.15.0 ApplyingChanges 6m16s
$ kubectl get pods --selector='elasticsearch.k8s.elastic.co/cluster-name=quickstart'
NAME READY STATUS RESTARTS AGE
quickstart-es-default-0 0/1 Pending 0 8m21s
quickstart-es-default-1 0/1 Pending 0 8m21s
quickstart-es-default-2 0/1 Pending 0 8m21s
$ kubectl -n elastic-system logs statefulset.apps/elastic-operator | tail -n25
{"log.level":"info","@timestamp":"2021-09-28T12:43:34.029Z","log.logger":"generic-reconciler","message":"Updating resource","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","kind":"Service","namespace":"default","name":"quickstart-es-http"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:34.039Z","log.logger":"transport","message":"Skipping pod because it has no IP yet","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","namespace":"default","pod_name":"quickstart-es-default-0"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:34.039Z","log.logger":"transport","message":"Skipping pod because it has no IP yet","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","namespace":"default","pod_name":"quickstart-es-default-1"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:34.039Z","log.logger":"transport","message":"Skipping pod because it has no IP yet","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","namespace":"default","pod_name":"quickstart-es-default-2"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:34.325Z","log.logger":"generic-reconciler","message":"Updating resource","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","kind":"Service","namespace":"default","name":"quickstart-es-default"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:34.360Z","log.logger":"driver","message":"ES cannot be reached yet, re-queuing","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","namespace":"default","es_name":"quickstart"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:34.360Z","log.logger":"elasticsearch-controller","message":"Ending reconciliation run","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","iteration":186,"namespace":"default","es_name":"quickstart","took":0.342183425}
{"log.level":"info","@timestamp":"2021-09-28T12:43:44.361Z","log.logger":"elasticsearch-controller","message":"Starting reconciliation run","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","iteration":187,"namespace":"default","es_name":"quickstart"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:44.361Z","log.logger":"generic-reconciler","message":"Updating resource","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","kind":"Service","namespace":"default","name":"quickstart-es-transport"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:44.377Z","log.logger":"generic-reconciler","message":"Updating resource","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","kind":"Service","namespace":"default","name":"quickstart-es-http"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:44.387Z","log.logger":"transport","message":"Skipping pod because it has no IP yet","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","namespace":"default","pod_name":"quickstart-es-default-0"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:44.387Z","log.logger":"transport","message":"Skipping pod because it has no IP yet","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","namespace":"default","pod_name":"quickstart-es-default-1"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:44.387Z","log.logger":"transport","message":"Skipping pod because it has no IP yet","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","namespace":"default","pod_name":"quickstart-es-default-2"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:44.671Z","log.logger":"generic-reconciler","message":"Updating resource","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","kind":"Service","namespace":"default","name":"quickstart-es-default"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:44.707Z","log.logger":"driver","message":"ES cannot be reached yet, re-queuing","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","namespace":"default","es_name":"quickstart"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:44.707Z","log.logger":"elasticsearch-controller","message":"Ending reconciliation run","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","iteration":187,"namespace":"default","es_name":"quickstart","took":0.346669437}
{"log.level":"info","@timestamp":"2021-09-28T12:43:54.708Z","log.logger":"elasticsearch-controller","message":"Starting reconciliation run","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","iteration":188,"namespace":"default","es_name":"quickstart"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:54.709Z","log.logger":"generic-reconciler","message":"Updating resource","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","kind":"Service","namespace":"default","name":"quickstart-es-transport"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:54.727Z","log.logger":"generic-reconciler","message":"Updating resource","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","kind":"Service","namespace":"default","name":"quickstart-es-http"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:54.736Z","log.logger":"transport","message":"Skipping pod because it has no IP yet","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","namespace":"default","pod_name":"quickstart-es-default-2"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:54.736Z","log.logger":"transport","message":"Skipping pod because it has no IP yet","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","namespace":"default","pod_name":"quickstart-es-default-0"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:54.736Z","log.logger":"transport","message":"Skipping pod because it has no IP yet","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","namespace":"default","pod_name":"quickstart-es-default-1"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:55.054Z","log.logger":"generic-reconciler","message":"Updating resource","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","kind":"Service","namespace":"default","name":"quickstart-es-default"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:55.084Z","log.logger":"driver","message":"ES cannot be reached yet, re-queuing","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","namespace":"default","es_name":"quickstart"}
{"log.level":"info","@timestamp":"2021-09-28T12:43:55.084Z","log.logger":"elasticsearch-controller","message":"Ending reconciliation run","service.version":"1.8.0+4f367c38","service.type":"eck","ecs.version":"1.4.0","iteration":188,"namespace":"default","es_name":"quickstart","took":0.376270961}
From kubectl describe pods I notice:
FailedScheduling: 0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
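One detail I can check locally, and which may or may not be related to the unbound claims: the StorageClass above is created with the name ebs-sc, while the claim template requests aws-ebs. Below is a minimal sketch of that comparison. The helper names claim_storage_classes and storage_class_defined are hypothetical (they are not part of kubectl or ECK), and the manifest fragment is inlined only for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print every storageClassName value found in a
# manifest read from stdin (one name per line).
claim_storage_classes() {
  awk '$1 == "storageClassName:" { print $2 }'
}

# Hypothetical helper: succeed only if the name in $1 appears as a whole
# line in the newline-separated list passed as $2.
storage_class_defined() {
  printf '%s\n' "$2" | grep -qxF -- "$1"
}

# Names taken from the manifests in this post.
defined="ebs-sc"
requested="$(printf 'spec:\n  storageClassName: aws-ebs\n' | claim_storage_classes)"

if storage_class_defined "$requested" "$defined"; then
  echo "claim template references an existing StorageClass: $requested"
else
  echo "claim template references '$requested', but only '$defined' was created"
fi
```

Against a live cluster, kubectl get storageclass lists the classes that actually exist, and kubectl describe pvc shows the events recorded for each pending claim, which should say more about why the claims stay unbound.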