Wrongly named the container of the master node

Hi all. When I first created the Elasticsearch cluster in GKE with ECK (1 master node, 32 data nodes), I named the container inside the master node "elasticsearch-master". Six months and 14 TB later, I noticed "Readiness probe failed" in the logs and realised I should not have given the container a custom name. The master pod now has 2 containers, 'elasticsearch' and 'elasticsearch-master'. The 'elasticsearch' one fails the readiness probe. The other one has no readiness probe defined, has none of the scripts installed by the operator, and does not have ports 9200 and 9300 open on the container. If I try to curl http from either container, I get 'an empty reply'.

Another weird thing is that I have the same setup (with just 2 data nodes) in the non-production GKE cluster. There the readiness probe works in the container named 'elasticsearch', and I get a proper response from both containers if I curl :9200 using the elastic-internal-probe user and password.

How does it even work, and what can I do to fix it with minimal downtime?

This is how I defined the pod template of the master node:

      podTemplate:
        spec:
          initContainers:
            - name: sysctl
              securityContext:
                privileged: true
                runAsUser: 0
              command: [ 'sh', '-c', 'sysctl -w vm.max_map_count=262144' ]
          containers:
            - name: elasticsearch-master
              image: docker.elastic.co/elasticsearch/elasticsearch:8.3.2
              env:
                - name: ES_JAVA_OPTS
                  value: "-XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:-HeapDumpOnOutOfMemoryError"
              resources:
                requests:
                  cpu: ${var.elasticsearch_master_node_cpu}
                  memory: ${var.elasticsearch_master_node_memory}
                limits:
                  cpu: ${var.elasticsearch_master_node_cpu}
                  memory: ${var.elasticsearch_master_node_memory}
          nodeSelector:
            cloud.google.com/gke-nodepool: elasticsearch-nodepool
          tolerations:
          - key: "dedicated"
            operator: "Equal"
            value: "elasticsearch"
            effect: "NoExecute"
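
As far as I understand it, ECK merges the pod template into its own defaults by container name, so only a container called 'elasticsearch' picks up the operator's readiness probe, scripts and ports, and any other name is added as an extra container. For comparison, here is a sketch of what I assume the template should have looked like. It uses the same values as above with only the container name changed, and it is not a verified fix:

```yaml
# Sketch only: same settings as the template above, with the main container renamed.
podTemplate:
  spec:
    initContainers:
      - name: sysctl
        securityContext:
          privileged: true
          runAsUser: 0
        command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
    containers:
      # Assumption: the operator matches on this name, so these settings
      # are merged into its default container instead of creating a second one.
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:8.3.2
        env:
          - name: ES_JAVA_OPTS
            value: "-XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:-HeapDumpOnOutOfMemoryError"
        resources:
          requests:
            cpu: ${var.elasticsearch_master_node_cpu}
            memory: ${var.elasticsearch_master_node_memory}
          limits:
            cpu: ${var.elasticsearch_master_node_cpu}
            memory: ${var.elasticsearch_master_node_memory}
    nodeSelector:
      cloud.google.com/gke-nodepool: elasticsearch-nodepool
    tolerations:
      - key: "dedicated"
        operator: "Equal"
        value: "elasticsearch"
        effect: "NoExecute"
```

I would expect a change like this to make the operator recreate the master pod, which is why I am asking about downtime with only one master node.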
