3-node deployment with eck-operator logging "cluster.initial_master_nodes is set" WARN messages

This will be my first post here, I hope I get it right. Since I work for a bank I cannot publish config files, but I will try to give as much detail as I can.

We have deployed Elasticsearch clusters with the eck-operator using an ArgoCD ApplicationSet (Helm chart version 2.11.0).
We used the default Helm values for the operator.
We apply the Elasticsearch custom resources as YAML manifests to the namespaces via ArgoCD syncs.

Our initial cluster deployments always work, no problems at all.
We also have one of our teams testing a search app in another namespace, and they have no problems either.

Our NodeSet has 3 nodes with every role enabled; there are no master-only nodes.
We have a ceph-block storage class, which we use for the elasticsearch-data PV claims.
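For context, here is a minimal sketch of the kind of Elasticsearch resource we apply. The name and version are placeholders, not our real config, which I cannot share:

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: example-cluster          # placeholder name
spec:
  version: 8.12.0                # placeholder version
  nodeSets:
    - name: default
      count: 3                   # 3 nodes, all roles (no node.roles override)
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
            storageClassName: ceph-block
```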

On our clusters we see these log lines, sometimes constantly, sometimes just once before disappearing:

{"@timestamp":"2024-02-26T16:07:59.599Z", "log.level": "INFO", "message":"this node is locked into cluster UUID <UUID> and will not attempt further cluster bootstrapping", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.cluster.coordination.ClusterBootstrapService","elasticsearch.node.name":"<node-name>","elasticsearch.cluster.name":"<cluster-name>"}
{"@timestamp":"2024-02-26T16:09:21.686Z", "log.level": "INFO", "message":"this node is locked into cluster UUID <UUID> and will not attempt further cluster bootstrapping", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.cluster.coordination.ClusterBootstrapService","elasticsearch.node.name":"<node-name>","elasticsearch.cluster.name":"<cluster-name>"}

We previously built these clusters with plain Helm charts, and exactly the same thing happened there as well. I know for a fact that after the cluster has bootstrapped, Elasticsearch should ignore the cluster.initial_master_nodes setting.

What would be the cause for this?


I would like to give an update: I have been going into the pods and describing the StatefulSets we have on both the prod and test clusters.

The config file is generated and kept in the same namespace as the pods; its secret name is --es-config, and in there I can actually see the discovery settings pointing at a file called unicast-hosts, which is another ConfigMap in the same namespace.

But there is absolutely nothing related to cluster.initial_master_nodes configuration on any elasticsearch.yml file I can find.

    cluster:
      name: <cluster-name>
    node:
      attributes: <k8s-node-attribute>
    discovery:
      seed_hosts: []
      seed_providers: file   # unicast file listing the 3 pod IP:PORT pairs

These are the only relevant lines I can find. The elasticsearch.yml is copied into the /mnt/elastic-internal path and linked into place on restarts by prepare-fs.sh, as far as I can see:

echo "Linking /mnt/elastic-internal/elasticsearch-config/elasticsearch.yml to /mnt/elastic-internal/elasticsearch-config-local/elasticsearch.yml"
ln -sf /mnt/elastic-internal/elasticsearch-config/elasticsearch.yml /mnt/elastic-internal/elasticsearch-config-local/elasticsearch.yml

When I bash into the running pods that log these messages about the node being locked into a cluster, I see exactly the same elasticsearch.yml I see in k8s, which has nothing related to the cluster.initial_master_nodes setting.

Could there be another default file somewhere in the Docker image that I am missing? Or maybe this Docker image overwrites some configs on restarts?

Can anybody help with this? I am trying to remove the cluster.initial_master_nodes setting, but I cannot even find where it is set.
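In case it helps anyone retrace my search, these are roughly the commands I have been running (the pod and secret names below are placeholders, not our real ones). ECK can inject settings through environment variables as well as elasticsearch.yml, and the _nodes API reports which settings each node actually started with:

```shell
# Placeholder names; adjust for your own cluster.
POD=example-cluster-es-default-0
PASSWORD=$(kubectl get secret example-cluster-es-elastic-user \
  -o go-template='{{.data.elastic | base64decode}}')

# 1. Check whether the setting is injected via environment, not elasticsearch.yml:
kubectl exec "$POD" -- env | grep -i initial_master

# 2. Ask Elasticsearch which settings each node actually started with:
kubectl exec "$POD" -- curl -sk -u "elastic:$PASSWORD" \
  "https://localhost:9200/_nodes/settings?filter_path=**.initial_master_nodes"
```

These need a live cluster, so treat them as a sketch of the approach rather than copy-paste commands.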

One more edit: we have 4 clusters, 2 test and 2 production, with exactly the same k8s component configurations and Elasticsearch YAML configurations. On the test clusters we only see this INFO message once, but on the production clusters the "this node is locked into cluster" message gets logged every 12 hours, constantly.



Perhaps your cluster started with a volume already in use by a previous cluster? That could point to a bug in your volume provisioner that isn't properly recycling volumes.

See: This node is locked into cluster UUID and will not attempt further cluster bootstrapping - Common causes and quick fixes.
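One way to check for that (a sketch with placeholder names, not a definitive procedure) is to look at the reclaim policy of the storage class and at which PVs the Elasticsearch data claims are bound to:

```shell
# Does the storage class delete or retain released volumes? (placeholder name)
kubectl get storageclass ceph-block -o jsonpath='{.reclaimPolicy}'

# Which PVs back the elasticsearch-data claims, and what is their status?
kubectl get pvc -l elasticsearch.k8s.elastic.co/cluster-name=example-cluster
kubectl get pv | grep elasticsearch-data
```

A `Retain` policy, or `Released` PVs that later get re-bound, would be a sign of old data surviving into a new cluster.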

Thank you for the feedback.

We are using ceph-block volume claims; I will add the NodeSet configuration about the PVs here:

    volumeClaimTemplates:
      - metadata:
          name: elasticsearch-data
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 20Gi
          storageClassName: ceph-block

Wouldn't that cause the node to fail to join the cluster, though? Even with these messages, we can see all the nodes in _cat/nodes and the cluster is healthy. We have also tried indexing, searching, deleting, etc., and everything seems to be working.


We are still experiencing this. Has anyone successfully gotten rid of these messages?


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.