Hello Team,
I am running into issues with my Elasticsearch 7.3.0 cluster deployed on AWS EKS.
It is a 3-node cluster with three pods: ES Master-1, ES Data-1, and ES Ingest-1.
The PersistentVolume and PersistentVolumeClaim have been configured as follows:
PersistentVolume:
Name:              pvc-be7975f2-ce35-4e40-9c91-98e1d948362b
Labels:            failure-domain.beta.kubernetes.io/region=us-west-2
                   failure-domain.beta.kubernetes.io/zone=us-west-2a
Annotations:       kubernetes.io/createdby: aws-ebs-dynamic-provisioner
                   pv.kubernetes.io/bound-by-controller: yes
                   pv.kubernetes.io/provisioned-by: kubernetes.io/aws-ebs
Finalizers:        [kubernetes.io/pv-protection]
StorageClass:      gp2
Status:            Bound
Claim:             accurics-lmm/ebs-gp2-storage-elasticsearch-data-0
Reclaim Policy:    Delete
Access Modes:      RWO
VolumeMode:        Filesystem
Capacity:          30Gi
Node Affinity:
  Required Terms:
    Term 0:        failure-domain.beta.kubernetes.io/zone in [us-west-2a]
                   failure-domain.beta.kubernetes.io/region in [us-west-2]
Message:
Source:
    Type:       AWSElasticBlockStore (a Persistent Disk resource in AWS)
    VolumeID:   aws://us-west-2a/vol-08bdc3XXXXXXXXXX
    FSType:     ext4
    Partition:  0
    ReadOnly:   false
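Since the EBS-backed PV is zonal and pinned to us-west-2a by the node affinity above, one sanity check I do is confirm which zone the node hosting the data pod is in (this uses the pre-1.17 failure-domain labels shown above; <node-name> is a placeholder for whatever node the pod lands on):

# Find the node the data pod is scheduled on
kubectl get pod elasticsearch-data-0 -n accurics-lmm -o wide

# Check that node's availability-zone label
kubectl get node <node-name> -L failure-domain.beta.kubernetes.io/zone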
Below are the configurations for the data pod:
elasticsearch-data-configmap.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: accurics-lmm
  name: elasticsearch-data-config
  labels:
    app: elasticsearch
    role: data
data:
  elasticsearch.yml: |-
    cluster.name: ${CLUSTER_NAME}
    node.name: ${NODE_NAME}
    discovery.seed_hosts: ${NODE_LIST}
    cluster.initial_master_nodes: ${MASTER_NODES}
    network.host: 0.0.0.0
    node:
      master: false
      data: true
      ingest: false
    xpack.security.enabled: true
    xpack.monitoring.collection.enabled: true
    path.data: /usr/share/elasticsearch/data
---
elasticsearch-data-statefulset.yaml
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  namespace: accurics-lmm
  name: elasticsearch-data
  labels:
    app: elasticsearch
    role: data
spec:
  serviceName: "elasticsearch-data"
  replicas: 2
  selector:
    matchLabels:
      app: elasticsearch-data
  template:
    metadata:
      labels:
        app: elasticsearch-data
        role: data
    spec:
      containers:
      - name: elasticsearch-data
        image: docker.elastic.co/elasticsearch/elasticsearch:7.3.0
        env:
        - name: CLUSTER_NAME
          value: elasticsearch
        - name: NODE_NAME
          value: elasticsearch-data
        - name: NODE_LIST
          value: elasticsearch-master,elasticsearch-data,elasticsearch-client
        - name: MASTER_NODES
          value: elasticsearch-master
        - name: "ES_JAVA_OPTS"
          value: "-Xms300m -Xmx300m"
        ports:
        - containerPort: 9300
          name: transport
        volumeMounts:
        - name: config
          mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
          readOnly: true
          subPath: elasticsearch.yml
      volumes:
      - name: config
        configMap:
          name: elasticsearch-data-config
      initContainers:
      - name: increase-vm-max-map
        image: busybox:1.28
        command: ["sh", "-c", "sysctl -w vm.max_map_count=262144"]
        securityContext:
          privileged: true
      - name: resolve-permission
        image: busybox:1.28
        command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
        securityContext:
          privileged: true
        volumeMounts:
        - name: ebs-gp2-storage
          mountPath: /usr/share/elasticsearch/data
      - name: increase-fd-ulimit
        image: busybox:1.28
        command: ["sh", "-c", "ulimit -n 65536"]
        securityContext:
          privileged: true
  volumeClaimTemplates:
  - metadata:
      name: ebs-gp2-storage
      annotations:
        volume.beta.kubernetes.io/storage-class: "gp2"
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: gp2
      resources:
        requests:
          storage: 30Gi
---
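Both manifests are applied with plain kubectl (the namespace is set in the metadata, so no extra flags are needed):

kubectl apply -f elasticsearch-data-configmap.yaml
kubectl apply -f elasticsearch-data-statefulset.yaml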
Creating the cluster for the very first time works fine and the cluster state goes green. It lets me generate the built-in user passwords and log in through Kibana.
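For reference, I bootstrap the built-in user passwords from inside a running Elasticsearch pod, roughly like this (the pod name is an assumption based on the StatefulSet naming convention; adjust to whatever kubectl get pods shows):

# Assumption: the master pod is named elasticsearch-master-0
kubectl exec -it elasticsearch-master-0 -n accurics-lmm -- \
  bin/elasticsearch-setup-passwords auto -b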
Issue:
After deleting the data pod or rolling out an update, the cluster starts crashing with multiple errors.
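To be concrete, by "deleting the data pod" and "rolling out updates" I mean the usual operations, e.g.:

# Delete the data pod so the StatefulSet controller recreates it
kubectl delete pod elasticsearch-data-0 -n accurics-lmm

# Or roll out a change by re-applying the updated StatefulSet manifest
kubectl apply -f elasticsearch-data-statefulset.yaml

The errors below start appearing shortly after the pod comes back up.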
{"type": "server", "timestamp": "2020-03-07T11:59:37,754+0000", "level": "WARN", "component": "o.e.x.m.MonitoringService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-data", "cluster.uuid": "Mi1MiJAKSVy1HCxEpBSHeg", "node.id": "GoW0wiLDQr6YDQfXm_qMpA", "message": "monitoring execution failed" ,
"stacktrace": ["org.elasticsearch.xpack.monitoring.exporter.ExportException: failed to flush export bulks",
"Caused by: org.elasticsearch.action.UnavailableShardsException: [.monitoring-es-7-2020.03.07][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[.monitoring-es-7-2020.03.07][0]] containing [index {[.monitoring-es-7-2020.03.07][_doc][pVLctHABfG6veKc3jHmy]
After a few minutes, Kibana starts failing with the error below:
{"type":"log","@timestamp":"2020-03-07T12:18:17Z","tags":["error","task_manager"],"pid":1,"message":"Failed to poll for work: [security_exception] failed to authenticate user [kibana], with { header={ WWW-Authenticate="Basic realm=\"security\" charset=\"UTF-8\"" } }
Please let me know if you need any further details. Any help would be appreciated.