We are having issues with our Elasticsearch pods: twice in the last 4 months the pods have gone into a Pending state.
Sometimes the pods also restart while Kibana is loading a few dashboards. We added an ILM policy, but the frozen indices were still left open, so we created a cronjob to close them.
If anything in the details below stands out, please feel free to share your findings.
Thanks a lot,
Pedro
ECK Info:
Our current stack runs the ECK operator with Fluentd (logs), Heartbeat (uptime) and APM, deployed on an OpenShift cluster.
Stack versions:
ECK: 2.3.0
Elasticsearch: 7.13.3
Kibana: 7.13.3
Fluentd: 1.13.2
APM: 7.13.3
The Elasticsearch resource configuration is:
Storage: 2000GB (886GB used)
CPU: 1.7
Mem: 10GB
Java opts: -Xms8g -Xmx8g
Nodes: 4
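For reference, a quick sanity check of how the heap setting relates to the container memory limit. This is only a minimal sketch using the two numbers listed above. The function name is made up for illustration. The commonly cited Elasticsearch guidance, as far as I know, is to keep the heap at no more than about 50% of the available memory so the rest is left for off-heap use and the filesystem cache.

```python
# Values copied from the resource configuration above.
CONTAINER_MEMORY_GIB = 10  # container memory limit (10Gi)
HEAP_GIB = 8               # -Xms8g -Xmx8g


def heap_fraction(heap_gib: float, memory_gib: float) -> float:
    """Return the share of the container memory limit taken by the JVM heap."""
    return heap_gib / memory_gib


fraction = heap_fraction(HEAP_GIB, CONTAINER_MEMORY_GIB)
off_heap_gib = CONTAINER_MEMORY_GIB - HEAP_GIB

print(f"heap uses {fraction:.0%} of the memory limit, "
      f"leaving {off_heap_gib} GiB for everything else")
```

With the values above, the heap takes 80% of the limit and leaves 2 GiB for everything outside the heap.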
The ILM strategy for the indices is:
- For logs:
  - hot: 3 days
  - warm: + 3 days
  - cold: + 7 days (frozen)
Size by stage:
Hot/Warm data: 2.7GB
Frozen & closed data: 883.3GB
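To give an idea of what the cronjob that closes indices does, here is a rough Python sketch. It is not our actual job. It assumes the index list comes from `GET _cat/indices?format=json&h=index,status,creation.date`, and it uses index age as a stand-in for "has reached the cold/frozen phase". The index names, the 6-day threshold, the service URL and the helper names are all hypothetical. Authentication and TLS handling are left out, and the close requests are only built, not sent.

```python
from datetime import datetime, timedelta, timezone
from urllib.request import Request


def indices_to_close(rows, now, min_age_days):
    """Pick open indices whose creation date is older than min_age_days.

    rows: dicts shaped like the JSON output of
    GET _cat/indices?format=json&h=index,status,creation.date
    """
    cutoff = now - timedelta(days=min_age_days)
    selected = []
    for row in rows:
        # creation.date is reported in epoch milliseconds.
        created = datetime.fromtimestamp(
            int(row["creation.date"]) / 1000, tz=timezone.utc
        )
        if row["status"] == "open" and created < cutoff:
            selected.append(row["index"])
    return sorted(selected)


def build_close_request(base_url, index):
    """Build (but do not send) the close index call: POST /<index>/_close."""
    return Request(f"{base_url}/{index}/_close", method="POST")


def epoch_ms(dt):
    """Format a datetime the way _cat/indices reports creation.date."""
    return str(int(dt.timestamp() * 1000))


# Hypothetical sample data; real rows would come from the cluster.
now = datetime(2022, 9, 25, tzinfo=timezone.utc)
sample_rows = [
    {"index": "logs-2022.09.24", "status": "open",
     "creation.date": epoch_ms(datetime(2022, 9, 24, tzinfo=timezone.utc))},
    {"index": "logs-2022.09.10", "status": "open",
     "creation.date": epoch_ms(datetime(2022, 9, 10, tzinfo=timezone.utc))},
    {"index": "logs-2022.09.01", "status": "close",
     "creation.date": epoch_ms(datetime(2022, 9, 1, tzinfo=timezone.utc))},
]

to_close = indices_to_close(sample_rows, now, min_age_days=6)
requests_to_send = [
    build_close_request("https://elasticsearch-es-http:9200", name)
    for name in to_close
]
print(to_close)
```

Keeping the selection logic separate from the HTTP call makes it easy to dry-run the job and check which indices it would touch before anything is closed.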
Errors from logs:
Caused by: org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/2/no master];",
"stacktrace": ["org.elasticsearch.xpack.monitoring.exporter.ExportException:
ClusterBlockException[blocked by: [SERVICE_UNAVAILABLE/2/no master];]",
master not discovered or elected yet
org.elasticsearch.transport.NodeDisconnectedException
All shards failed
{"type": "server", "timestamp": "2022-09-24T23:53:13,151Z", "level": "INFO", "component": "o.e.m.j.JvmGcMonitorService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-es-default-0", "message": "[gc][4343174] overhead, spent [290ms] collecting in the last [1s]", "cluster.uuid": "oY7xZhUySiKbHHfr1t0pgQ", "node.id": "0YExrwJvRia_A6J-2aum4w" }
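To put the GC line above into perspective, here is a small Python sketch that parses that JSON log entry and works out what share of the reported window was spent collecting. The regular expression and the function name are my own and only cover messages where the time spent is given in milliseconds, as in this entry.

```python
import json
import re

# The JvmGcMonitorService entry quoted above, unchanged.
log_line = '{"type": "server", "timestamp": "2022-09-24T23:53:13,151Z", "level": "INFO", "component": "o.e.m.j.JvmGcMonitorService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-es-default-0", "message": "[gc][4343174] overhead, spent [290ms] collecting in the last [1s]", "cluster.uuid": "oY7xZhUySiKbHHfr1t0pgQ", "node.id": "0YExrwJvRia_A6J-2aum4w" }'

# Matches "overhead, spent [<N>ms] collecting in the last [<S>s]".
GC_PATTERN = re.compile(
    r"overhead, spent \[(\d+)ms\] collecting in the last \[(\d+(?:\.\d+)?)s\]"
)


def gc_overhead(line):
    """Return (node name, fraction of the window spent in GC), or None."""
    entry = json.loads(line)
    match = GC_PATTERN.search(entry["message"])
    if match is None:
        return None
    spent_ms = int(match.group(1))
    window_ms = float(match.group(2)) * 1000
    return entry["node.name"], spent_ms / window_ms


node, ratio = gc_overhead(log_line)
print(f"{node}: {ratio:.0%} of the last window spent in GC")
```

For this entry that is 290 ms out of 1 s, so about 29% of the window on elasticsearch-es-default-0.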
Elasticsearch YAML:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch
spec:
  version: 7.13.3
  volumeClaimDeletePolicy: DeleteOnScaledownOnly
  nodeSets:
  - name: default
    config:
      node.roles: ["master", "data", "ingest", "ml"]
      path.repo: ["/elastic-snapshot"]
    podTemplate:
      metadata:
        labels:
          elastic: elastic
      spec:
        serviceAccount: elastic
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
        containers:
        - name: elasticsearch
          securityContext:
            capabilities:
              add: ["SYS_CHROOT"]
          resources:
            limits:
              memory: 10Gi
              cpu: 1.7
          env:
          - name: INSTANCE_RAM
            value: 10G
          - name: ES_JAVA_OPTS
            value: "-Xms8g -Xmx8g"
    count: 4
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 2000Gi