I am using ECK to deploy an Elasticsearch cluster. My setup has 1 master node and 3 data nodes. If the master node dies for some reason, it comes back up (thanks to StatefulSets in Kubernetes), but it associates itself with a different cluster ID. It then rejects the requests of the data nodes that try to join it.
But when I run an upgrade, or anything else where Kubernetes "safely" removes and brings back the master node, the cluster becomes healthy again after a while.
I tried with 3 master nodes as well, but when I kill one node it is never able to rejoin the existing cluster, and the cluster stays in the yellow state forever.
Now my question is:
If some issue happens and the master node is not restarted safely, how can I make sure the cluster still forms correctly?
Here's the YAML I use to deploy the Elasticsearch cluster with ECK on my Kubernetes cluster.
---
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: sifter-elastic-data-factory
spec:
  version: 7.14.0
  nodeSets:
  - name: master
    count: 1
    config:
      node.master: true
      node.data: false
      node.ingest: false
    podTemplate:
      spec:
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
        containers:
        - name: elasticsearch
          resources:
            requests:
              memory: 8Gi
              cpu: 3000m
            limits:
              memory: 8Gi
              cpu: 3000m
          env:
          - name: ES_JAVA_OPTS
            value: -Xms6g -Xmx6g
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 50Gi
        storageClassName: ssd
  - name: data
    count: 3
    config:
      node.master: false
      node.data: true
      node.ingest: true
    podTemplate:
      spec:
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
        containers:
        - name: elasticsearch
          resources:
            requests:
              memory: 8Gi
              cpu: 3000m
            limits:
              memory: 8Gi
              cpu: 3000m
          env:
          - name: ES_JAVA_OPTS
            value: -Xms6g -Xmx6g
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 60Gi
        storageClassName: ssd
  http:
    service:
      spec:
        type: ClusterIP
    tls:
      selfSignedCertificate:
        disabled: true
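
For the three-master attempt mentioned earlier, the relevant change would look roughly like the fragment below. This is only a sketch: it assumes the master nodeSet was otherwise identical to the manifest above and that only `count` was raised. The exact manifest used for that attempt is not shown here.

```yaml
# Sketch of the three-master variant (assumption: only `count` differs
# from the single-master manifest; podTemplate and volumeClaimTemplates
# stay as they are).
spec:
  nodeSets:
  - name: master
    count: 3   # three master-eligible nodes; a quorum of two remains if one is lost
    config:
      node.master: true
      node.data: false
      node.ingest: false
```

With three master-eligible nodes, losing one should still leave a voting majority, which is why the failed rejoin in that setup is surprising.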