I upgraded my Elasticsearch cluster from elasticsearch:7.10.0
to elasticsearch:7.11.0
and started noticing the following error:
{"type": "server", "timestamp": "2021-02-18T16:15:46,270Z", "level": "INFO", "component": "o.e.b.BootstrapChecks", "cluster.name": "dev-eks-logs", "node.name": "es-master-2.elasticsearch", "message": "bound or publishing to a non-loopback address, enforcing bootstrap checks" }
ERROR: [1] bootstrap checks failed
[1]: initial heap size [41943040] not equal to maximum heap size [1073741824]; this can cause resize pauses
ERROR: Elasticsearch did not exit normally - check the logs at /usr/share/elasticsearch/logs/dev-eks-logs.log
{"type": "server", "timestamp": "2021-02-18T16:15:46,315Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "dev-eks-logs", "node.name": "es-master-2.elasticsearch", "message": "stopping ..." }
{"type": "server", "timestamp": "2021-02-18T16:15:46,389Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "dev-eks-logs", "node.name": "es-master-2.elasticsearch", "message": "stopped" }
{"type": "server", "timestamp": "2021-02-18T16:15:46,389Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "dev-eks-logs", "node.name": "es-master-2.elasticsearch", "message": "closing ..." }
{"type": "server", "timestamp": "2021-02-18T16:15:46,402Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "dev-eks-logs", "node.name": "es-master-2.elasticsearch", "message": "closed" }
{"type": "server", "timestamp": "2021-02-18T16:15:46,404Z", "level": "INFO", "component": "o.e.x.m.p.NativeController", "cluster.name": "dev-eks-logs", "node.name": "es-master-2.elasticsearch", "message": "Native controller process has stopped - no new native processes can be started" }
The bootstrap check complains that the initial heap (41943040 bytes, i.e. 40 MB) does not match the maximum heap (1073741824 bytes, i.e. 1 GB), and then all nodes go into a crash loop.
Here is my deployment manifest (StatefulSet):
kind: StatefulSet
apiVersion: apps/v1
metadata:
  name: es-master
  namespace: kube-logging
spec:
  serviceName: elasticsearch
  replicas: 3
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
        - name: elasticsearch
          image: docker.elastic.co/elasticsearch/elasticsearch:7.11.0
          resources:
            limits:
              cpu: 1000m
              memory: 2.5G
            requests:
              cpu: 100m
          ports:
            - containerPort: 9200
              name: rest
              protocol: TCP
            - containerPort: 9300
              name: inter-node
              protocol: TCP
          # readinessProbe:
          #   httpGet:
          #     path: /_cluster/health?local=true
          #     port: 9200
          #   initialDelaySeconds: 5
          volumeMounts:
            - name: data
              mountPath: /usr/share/elasticsearch/data
          env:
            - name: cluster.name
              value: dev-eks-logs
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: node.name
              value: "$(NODE_NAME).elasticsearch"
            - name: discovery.zen.ping.unicast.hosts
              value: "es-master-0.elasticsearch,es-master-1.elasticsearch,es-master-2.elasticsearch"
            - name: cluster.initial_master_nodes
              value: "es-master-0.elasticsearch,es-master-1.elasticsearch,es-master-2.elasticsearch"
            - name: discovery.zen.minimum_master_nodes
              value: "2"
            - name: ES_JAVA_OPTS
              value: "-Xmx1g -Xmx1g"
      initContainers:
        - name: fix-permissions
          image: busybox
          command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
          securityContext:
            privileged: true
          volumeMounts:
            - name: data
              mountPath: /usr/share/elasticsearch/data
        - name: increase-vm-max-map
          image: busybox
          command: ["sysctl", "-w", "vm.max_map_count=262144"]
          securityContext:
            privileged: true
        - name: increase-fd-ulimit
          image: busybox
          command: ["sh", "-c", "ulimit -n 65536"]
          securityContext:
            privileged: true
  volumeClaimTemplates:
    - metadata:
        name: data
        labels:
          app: elasticsearch
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: aws-gp2
        resources:
          requests:
            storage: 200Gi
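Reading the bootstrap-check message, I suspect the ES_JAVA_OPTS entry is the culprit: as far as I understand, the check requires the initial heap (-Xms) to equal the maximum heap (-Xmx), and my value sets -Xmx twice and never sets -Xms. A sketch of what I assume the env entry should look like (1g is just the size I intend to use, not a verified value):

            - name: ES_JAVA_OPTS
              value: "-Xms1g -Xmx1g"

I have also read that 7.11 can size the heap automatically when no -Xms/-Xmx flags are given, so perhaps dropping the heap flags from ES_JAVA_OPTS altogether is an alternative, but I am not sure which approach is preferred.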
Any suggestion on how to fix this?