Elasticsearch instability using the ECK operator on an OpenShift cluster

brpedromaia · December 2, 2022, 10:10am

We are having issues with our Elasticsearch pods: they have gone into a Pending state twice in the last 4 months.

Sometimes, while a few dashboards are loading in Kibana, the pods restart. We added an ILM policy, but the frozen indices were still left open, so we created a cronjob to close them.

If anything in the details below stands out, please feel free to share your findings.

Thanks a lot,
Pedro
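For illustration, the selection step of such a cronjob could look roughly like the sketch below. It only builds the `POST /<index>/_close` requests and does not send them. The input shape (a list of dicts with `index`, `status` and `frozen` keys), the function name `close_requests` and the sample index names are all hypothetical and not taken from our actual job.

```python
from urllib.parse import quote


def close_requests(indices):
    """Return (method, path) pairs that would close every open, frozen index.

    `indices` is a hypothetical list of dicts with the keys "index",
    "status" ("open" or "close") and "frozen" (bool), gathered beforehand.
    """
    return [
        ("POST", f"/{quote(row['index'], safe='')}/_close")
        for row in indices
        if row["frozen"] and row["status"] == "open"
    ]


# Made-up sample data: only the first index is both frozen and still open.
sample = [
    {"index": "logs-2022.11.01", "status": "open", "frozen": True},
    {"index": "logs-2022.11.02", "status": "close", "frozen": True},
    {"index": "logs-2022.12.01", "status": "open", "frozen": False},
]

for method, path in close_requests(sample):
    print(method, path)
```

Keeping the selection separate from the HTTP call makes it easy to dry-run the job before letting it close anything.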

ECK Info:

Our current stack runs on the ECK operator, with Fluentd (logs), Heartbeat (uptime) and APM, all deployed in an OpenShift cluster.

Stack versions:

ECK 2.3.0
Elasticsearch: 7.13.3
Kibana: 7.13.3
Fluentd: 1.13.2
APM: 7.13.3

The elasticsearch resource configuration is:

Storage: 2000 GB (886 GB used)
CPU: 1.7
Memory: 10 GB
Java opts: -Xms8g -Xmx8g
Nodes: 4
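As a quick sanity check on these numbers, the sketch below computes what share of the container memory limit the JVM heap takes. The general Elasticsearch heap-sizing guidance is to keep the heap at no more than 50% of the available memory. The helper names (`parse_size`, `heap_share`) are made up for this example, and all sizes are treated as binary units, which is a simplification.

```python
import re

# Binary multipliers, used for both JVM-style (8g) and Kubernetes-style (10Gi) sizes.
_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_size(text):
    """Parse sizes such as '8g', '10Gi' or '10GB' into bytes (binary units assumed)."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([kmgt])i?b?", text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    value, unit = match.groups()
    return int(float(value) * _UNITS[unit])


def heap_share(java_opts, memory_limit):
    """Return the -Xmx heap size as a fraction of the container memory limit."""
    match = re.search(r"-Xmx(\S+)", java_opts)
    if match is None:
        raise ValueError("no -Xmx flag found in the Java options")
    return parse_size(match.group(1)) / parse_size(memory_limit)


# Values taken from the configuration in this post.
share = heap_share("-Xms8g -Xmx8g", "10Gi")
print(f"heap takes {share:.0%} of the container memory limit")
```

Whatever is left outside the heap has to cover off-heap JVM memory and the filesystem cache inside the same container limit.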

The indices ILM strategy is:

For logs:
- hot: 3 days
- warm: + 3 days
- cold: + 7 days (frozen)

Size by stage:

Hot/warm data: 2.7 GB
Frozen & closed data: 883.3 GB

Errors from logs:

Caused by: org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/2/no master];",
"stacktrace": ["org.elasticsearch.xpack.monitoring.exporter.ExportException: ClusterBlockException[blocked by: [SERVICE_UNAVAILABLE/2/no master];]",

master not discovered or elected yet

org.elasticsearch.transport.NodeDisconnectedException

All shards failed

{"type": "server", "timestamp": "2022-09-24T23:53:13,151Z", "level": "INFO", "component": "o.e.m.j.JvmGcMonitorService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-es-default-0", "message": "[gc][4343174] overhead, spent [290ms] collecting in the last [1s]", "cluster.uuid": "oY7xZhUySiKbHHfr1t0pgQ", "node.id": "0YExrwJvRia_A6J-2aum4w"  }
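To put a number on that GC line, the sketch below parses a JSON log entry of this shape and returns the fraction of the reporting window spent collecting. The function name `gc_overhead_fraction` and the regular expression are written for this example only, and the sample entry is a trimmed copy of the line above.

```python
import json
import re

# Matches messages such as: "overhead, spent [290ms] collecting in the last [1s]"
_GC_PATTERN = re.compile(
    r"overhead, spent \[(?P<spent>[\d.]+)(?P<spent_unit>ms|s)\] "
    r"collecting in the last \[(?P<window>[\d.]+)(?P<window_unit>ms|s)\]"
)
_TO_MS = {"ms": 1.0, "s": 1000.0}


def gc_overhead_fraction(log_line):
    """Return GC time divided by window length, or None for other log lines."""
    entry = json.loads(log_line)
    match = _GC_PATTERN.search(entry.get("message", ""))
    if match is None:
        return None
    spent_ms = float(match["spent"]) * _TO_MS[match["spent_unit"]]
    window_ms = float(match["window"]) * _TO_MS[match["window_unit"]]
    return spent_ms / window_ms


# Trimmed copy of the log entry quoted in this post.
line = (
    '{"type": "server", "timestamp": "2022-09-24T23:53:13,151Z", "level": "INFO", '
    '"component": "o.e.m.j.JvmGcMonitorService", "node.name": "elasticsearch-es-default-0", '
    '"message": "[gc][4343174] overhead, spent [290ms] collecting in the last [1s]"}'
)

fraction = gc_overhead_fraction(line)
print(f"{fraction:.0%} of the window was spent in GC")
```

Run over a whole log file, this gives a rough picture of how GC pressure develops before the pods restart.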

Elastic yml:

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch
spec:
  version: 7.13.3
  volumeClaimDeletePolicy: DeleteOnScaledownOnly
  nodeSets:
  - name: default
    config:
      node.roles: ["master", "data", "ingest", "ml"]
      path.repo: ["/elastic-snapshot"]
    podTemplate:
      metadata:
        labels:
          elastic: elastic
      spec:
        serviceAccount: elastic
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
        containers:
        - name: elasticsearch
          securityContext:
            capabilities:
              add: ["SYS_CHROOT"]
          resources:
            limits:
              memory: 10Gi
              cpu: 1.7
          env:
          - name: INSTANCE_RAM
            value: 10G
          - name: ES_JAVA_OPTS
            value: "-Xms8g -Xmx8g"
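    count: 4
    # volumeClaimTemplates header restored; "elasticsearch-data" is the claim name ECK expects for the data volume
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 2000Gi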

system · December 30, 2022, 10:11am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
