How to deploy Elasticsearch on k8s and avoid the "join validation on cluster state with a different cluster uuid" error

I am trying to deploy the Elastic Stack on top of k8s with 2 client nodes, 3 master nodes, and 3 data nodes, and each of them is currently scheduled on a different k8s node.

For the 2 client nodes I am using a k8s Deployment, and for the master and data nodes I am using k8s StatefulSets. It worked the first time, when all nodes were fresh, but when I updated the master nodes' StatefulSet I got this error:

Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid gR1WEwUpRXynUdhOyF2axA than local cluster uuid 1gwMWNx0TPSd3j9-hxlgcA, rejecting

Here are the related k8s configs for each node type.

A common ConfigMap is shared by all 3 node types; the per-node settings are controlled through environment variables.

apiVersion: v1
kind: ConfigMap
metadata:
  name: elasticsearch
  namespace: elasticsearch
  labels:
    app: elasticsearch
data:
  elasticsearch.yml: |-
    cluster:
      name: ${CLUSTER_NAME}
      initial_master_nodes: "es-master-0,es-master-1,es-master-2"
    node:
      master: ${NODE_MASTER}
      data: ${NODE_DATA}
      name: ${NODE_NAME}
      ingest: ${NODE_INGEST}
      max_local_storage_nodes: 1
      attr.box_type: hot
    processors: ${PROCESSORS:1}
    network.host: ${NETWORK_HOST}
    path:
      data: /usr/share/elasticsearch/data
      logs: /usr/share/elasticsearch/logs
    http:
      compression: true
    discovery:
      seed_hosts: ${DISCOVERY_SERVICE}
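
For context, Elasticsearch substitutes ${VAR} references in elasticsearch.yml from the environment, which is why the same ConfigMap can serve all three node types. The ConfigMap still has to be mounted over the image's elasticsearch.yml in each pod; a minimal sketch of that mount is below, where the volume name "config" is an assumption rather than something taken from the original manifests.

# Sketch: container/volume fragment of a pod template that mounts the ConfigMap above.
# The volume name "config" is hypothetical; adapt it to the real manifests.
      containers:
        - name: elasticsearch
          volumeMounts:
            - name: config
              mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
              subPath: elasticsearch.yml
      volumes:
        - name: config
          configMap:
            name: elasticsearch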

Master node:

A headless service for the master nodes:

apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-discovery
  namespace: elasticsearch
  labels:
    component: elasticsearch
    role: master
spec:
  selector:
    component: elasticsearch
    role: master
  ports:
    - name: transport
      port: 9300
      protocol: TCP
  clusterIP: None
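
Because clusterIP is None, this Service is headless: resolving its name returns the addresses of the master pods directly, which is what discovery.seed_hosts: ${DISCOVERY_SERVICE} in the ConfigMap relies on. A small sketch of the effective setting after substitution; the fully qualified name in the comment is just the standard cluster DNS layout, not something quoted from this thread.

# Effective setting once DISCOVERY_SERVICE=elasticsearch-discovery is substituted.
discovery:
  seed_hosts: elasticsearch-discovery   # resolves via elasticsearch-discovery.elasticsearch.svc.cluster.local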

configs for master node

  - name: elasticsearch
    env:
    - name: CLUSTER_NAME
      value: logs001
    - name: NUMBER_OF_MASTERS
      value: "3"
    - name: NODE_MASTER
      value: "true"
    - name: NODE_INGEST
      value: "false"
    - name: NODE_DATA
      value: "false"
    - name: NETWORK_HOST
      value: "0.0.0.0"
    - name: NODE_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    - name: DISCOVERY_SERVICE
      value: elasticsearch-discovery
    - name: KUBERNETES_NAMESPACE
      valueFrom:
        fieldRef:
          fieldPath: metadata.namespace
    - name: PROCESSORS
      valueFrom:
        resourceFieldRef:
          resource: limits.cpu
    - name: ES_JAVA_OPTS
      value: -Xms48g -Xmx48g

configs for data nodes

  env:
    - name: CLUSTER_NAME
      value: logs001
    - name: NODE_MASTER
      value: "false"
    - name: NODE_INGEST
      value: "false"
    - name: NETWORK_HOST
      value: "_eth0_"
    - name: NUMBER_OF_MASTERS
      value: "3"
    - name: NODE_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    - name: DISCOVERY_SERVICE
      value: elasticsearch-discovery
    - name: KUBERNETES_NAMESPACE
      valueFrom:
        fieldRef:
          fieldPath: metadata.namespace
    - name: NODE_DATA
      value: "true"
    - name: PROCESSORS
      valueFrom:
        resourceFieldRef:
          resource: limits.cpu
    - name: ES_JAVA_OPTS
      value: -Xms48g -Xmx48g

configs for client node

   env:
    - name: CLUSTER_NAME
      value: logs001
    - name: NUMBER_OF_MASTERS
      value: "3"
    - name: NODE_MASTER
      value: "false"
    - name: NODE_INGEST
      value: "true"
    - name: NODE_DATA
      value: "false"
    - name: NETWORK_HOST
      value: "_eth0_"
    - name: NODE_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    - name: DISCOVERY_SERVICE
      value: elasticsearch-discovery
    - name: KUBERNETES_NAMESPACE
      valueFrom:
        fieldRef:
          fieldPath: metadata.namespace
    - name: PROCESSORS
      valueFrom:
        resourceFieldRef:
          resource: limits.cpu
    - name: ES_JAVA_OPTS
      value: -Xms6g -Xmx6g

I think this means that your master nodes all restarted at once and were not using persistent storage, so they lost the cluster metadata. Your master nodes must use storage that persists across restarts.

Thanks for the reply.
The master nodes are deployed as a k8s StatefulSet, so only one pod is restarted at a time. But I am not using persistent storage for the master nodes. Let me check whether the issue gets resolved if I add persistent disks for the master nodes. BTW, I think a master node only holds metadata, so is it OK for a master node to use only a small amount of disk?

I think this is not the case, or else the config you quote above is not the one that Elasticsearch is using.

Yes, they normally need less storage than data nodes.

Appreciate your insights here

Here is the sts for the master node.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    component: elasticsearch
    role: master
  name: es-master
  namespace: elasticsearch
spec:
  serviceName: elasticsearch-master
  replicas: 3 # Number of Elasticsearch master nodes to deploy
  selector:
    matchLabels:
      component: elasticsearch
      role: master
  template:
    metadata:
      labels:
        component: elasticsearch
        role: master
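
Given the suggestion above about persistent storage, the piece that appears to be missing from this StatefulSet is a persistent volume for the data path. A minimal sketch of what could be added under spec is shown below; the volume name and size are assumptions, and masters hold only cluster metadata, so a small disk is usually enough.

# Sketch only: additions under the StatefulSet's spec (names and sizes are assumptions).
  template:
    spec:
      containers:
        - name: elasticsearch
          volumeMounts:
            - name: data
              mountPath: /usr/share/elasticsearch/data   # matches path.data in the ConfigMap
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi   # metadata only, so a modest disk is typically sufficient for masters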

Could you also suggest whether using a k8s StatefulSet is sufficient? Or should I also wait until a restarted master node has fully rejoined the cluster before restarting the next one?

Sorry, I'm not the best person to help with K8s-specific questions. I believe it's possible to use a statefulset, yes, although you might prefer to use the Elasticsearch operator.

I don't really understand this question in the context of Kubernetes, but in general you should try to avoid restarting more than one node at once.
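
For completeness, since the Elasticsearch operator (ECK) was mentioned: a rough sketch of how a similar topology could be declared with it. The apiVersion and kind are ECK's, but the version number, storage sizes, and nodeSet names are assumptions, not taken from this thread.

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: logs001
  namespace: elasticsearch
spec:
  version: 7.17.0                        # assumption: use the version you actually run
  nodeSets:
    - name: master
      count: 3
      config:
        node.master: true
        node.data: false
        node.ingest: false
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data     # ECK expects this claim name for the data path
          spec:
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 10Gi
    - name: data
      count: 3
      config:
        node.master: false
        node.data: true
        node.ingest: false
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 100Gi

ECK manages discovery, cluster.initial_master_nodes, and rolling restarts itself, so the hand-written ConfigMap and headless discovery Service above would not be needed in that setup.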

Thanks

The issue is solved; what @DavidTurner suggested worked like a charm! Thanks

