Kibana keeps restarting with no error

Hi all.
I've deployed an EK stack on GKE with operator 1.9.1, but my Kibana pod keeps restarting.
The only pertinent log I've been able to extract from Kibana's pods is this:

rpc error: code = NotFound desc = an error occurred when try to find container XXXXXXX

After this message, the pod restarts.
These restarts happen every ~3-5 minutes.
Kibana has resource limits of 1 GB of memory and 1 CPU, and I have observed that it sometimes consumes more than 1 CPU (1.3-1.6). Could this be the reason for the continuous restarts?
Although it restarts, the service works normally.
Can anyone point me in the right direction to narrow the possible solutions?

Your error is very suspicious: it looks like a low-level infrastructure error, not a Kibana error.

Kibana has resource limits of 1 GB of memory and 1 CPU, and I have observed that it sometimes consumes more than 1 CPU (1.3-1.6). Could this be the reason for the continuous restarts?

No. CPU is a compressible resource, which means that once the container reaches its limit, it keeps running but the kernel throttles it and schedules it off the CPU rather than killing it.
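For contrast, the two limit types fail differently under pressure. A minimal sketch mirroring the limits discussed in this thread, with the behavioral difference as comments:

```yaml
# Sketch only: values mirror the Kibana limits mentioned above.
resources:
  limits:
    cpu: "1"      # compressible: exceeding this gets the container CFS-throttled, never restarted
    memory: 1Gi   # incompressible: exceeding this gets the container OOMKilled and restarted
```

So if the limits were the culprit, you would expect an OOMKilled termination on the memory side, not CPU throttling.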

Can anyone point me in the right direction to narrow the possible solutions?

It's hard without more information. Can you run eck-diagnostics and share the result archive?
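If you haven't used it before, eck-diagnostics is a standalone binary from the elastic/eck-diagnostics GitHub repository. A sketch of a typical invocation (flag names are assumptions based on the tool's README; verify with --help and adjust the namespaces to your install):

```shell
# Run against the namespace where the ECK operator lives and the
# namespace(s) holding the Elasticsearch/Kibana resources.
./eck-diagnostics \
  --operator-namespaces elastic-system \
  --resources-namespaces default
```

This produces a zip archive you can sanitize before sharing.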

Hey @Thibault_Richard
So I've run the eck-diagnostics tool and attached the (sanitized) results:
eck diagnostics wetransfer
Thank you for your help.

"config": {
    "elasticsearch.hosts": [
        "https://my_project-es-http:9200"
    ],
    "elasticsearch.requestTimeout": 120000,
    "elasticsearch.shardTimeout": 120000,
    "elasticsearch.ssl.certificateAuthorities": "/etc/certs/ca.crt",
    "elasticsearch.username": "elastic-internal",
    "xpack.security.sessionTimeout": 18000000
}

elastic-internal is an internal user; it is not designed to be used by Kibana to connect to Elasticsearch. I suspect this is the root cause of your problem. Any reason not to use elasticsearchRef to configure the connection between Kibana and Elasticsearch? Like this:

"elasticsearchRef": {
    "name": "my_project"
}

Also, there is no sign of OOMKiller activity in the events, so I don't think resources are the problem here.
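If you want to double-check the OOMKiller angle yourself, the container's last termination reason is recorded in the pod status. A sketch (the pod name is a placeholder; substitute your actual Kibana pod):

```shell
# Prints "OOMKilled" if the kibana container was last killed for exceeding
# its memory limit; prints nothing if it has never terminated that way.
kubectl get pod my_project-kb-xxxxx -n default \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
```

A CPU limit, by contrast, never shows up here, because throttling does not terminate the container.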

I vaguely remember that when I tried leaving only elasticsearchRef, Kibana would not connect to my Elasticsearch, so I had to set the settings explicitly.
Could you walk me through troubleshooting this and maybe trying elasticsearchRef instead of manually entering the settings? @michael.morello

So, I've patched the Kibana deployment, but I got this error:

Events:
  Type     Reason            Age                     From               Message
  ----     ------            ----                    ----               -------
  Warning  AssociationError  2m11s (x5878 over 19h)  kibana-controller  Association backend for elasticsearch is not configured

I have also defined the elasticsearchRef, to no avail. Below is my deployment config:

API Version:  kibana.k8s.elastic.co/v1
Kind:         Kibana
Metadata:
  Annotations:
    common.k8s.elastic.co/controller-version: 1.9.1
Spec:
  Elasticsearch Ref:
    Name:  my_project
  Enterprise Search Ref:
    Name:
  Monitoring:
    Logs:
    Metrics:
  Pod Template:
    Metadata:
      Creation Timestamp:  <nil>
      Labels:
        Kibana:  my_project
    Spec:
      Containers:
        Name:  kibana
        Resources:
          Limits:
            Cpu:     1
            Memory:  1Gi
          Requests:
            Cpu:     1
            Memory:  1Gi
        Volume Mounts:
          Mount Path:  /etc/certs
          Name:        elasticsearch-certs
          Read Only:   true
      Volumes:
        Name:  elasticsearch-certs
        Secret:
          Secret Name:  my_project-es-http-certs-public
  Version:              7.10.2

What else could I try? @Thibault_Richard @michael.morello

This is not expected. If Kibana is managed by ECK, using elasticsearchRef is the preferred way to manage the connection between Elasticsearch and Kibana; see the documentation here.

I've patched the Kibana deployment, but I got this error:

It's a K8S Event; I can't tell whether it is relevant or related to a previous state of your Kibana resource. Could you provide your new Elasticsearch and Kibana manifests (as YAML, using -o yaml) and the logs from the operator?
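For reference, something along these lines would capture what's needed (resource names and the operator namespace below are taken from this thread and a standard ECK install; adjust if yours differ):

```shell
# Live manifests as YAML rather than `kubectl describe` output.
kubectl get elasticsearch my_project -n default -o yaml > elasticsearch.yaml
kubectl get kibana my_project-kb -n default -o yaml > kibana.yaml

# Operator logs (a default install runs a StatefulSet named elastic-operator
# in the elastic-system namespace).
kubectl logs statefulset/elastic-operator -n elastic-system > operator.log
```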

Hey @michael.morello

Here's the Elasticsearch manifest:

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  annotations:
    common.k8s.elastic.co/controller-version: 1.9.1
    elasticsearch.k8s.elastic.co/cluster-uuid: BAjxmZg4TT-Ie-TyyeTJlQ
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"elasticsearch.k8s.elastic.co/v1","kind":"Elasticsearch","metadata":{"annotations":{},"name":"my_project","namespace":"default"},"spec":{"http":{"service":{"spec":{"ports":[{"port":9200,"targetPort":9200}],"type":"LoadBalancer"}}},"nodeSets":[{"config":{"indices.query.bool.max_clause_count":5000,"node.data":false,"node.ingest":false,"node.master":true,"node.ml":false,"node.store.allow_mmap":false,"xpack.security.authc.realms":{"native":{"native1":{"order":1}}}},"count":3,"name":"node-master","podTemplate":{"metadata":{"labels":{"es":"master-node"}},"spec":{"containers":[{"env":[{"name":"ES_JAVA_OPTS","value":"-Xms1g -Xmx1g"}],"name":"elasticsearch","resources":{"limits":{"cpu":1,"memory":"2Gi"},"requests":{"cpu":1,"memory":"2Gi"}}}],"initContainers":[{"command":["sh","-c","bin/elasticsearch-plugin install --batch repository-gcs\n"],"name":"install-plugins"}]}},"volumeClaimTemplates":[{"metadata":{"name":"elasticsearch-data"},"spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"3Gi"}},"storageClassName":"ssd"}}]},{"config":{"indices.query.bool.max_clause_count":5000,"node.data":true,"node.ingest":true,"node.master":false,"node.ml":true,"node.store.allow_mmap":false,"xpack.security.authc.realms":{"native":{"native1":{"order":1}}}},"count":5,"name":"node-data","podTemplate":{"metadata":{"labels":{"es":"data-node"}},"spec":{"containers":[{"env":[{"name":"ES_JAVA_OPTS","value":"-Xms4g -Xmx4g"}],"name":"elasticsearch","resources":{"limits":{"cpu":4,"memory":"6Gi"},"requests":{"cpu":4,"memory":"6Gi"}}}],"initContainers":[{"command":["sh","-c","bin/elasticsearch-plugin install --batch repository-gcs\n"],"name":"install-plugins"}]}},"volumeClaimTemplates":[{"metadata":{"name":"elasticsearch-data"},"spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"650Gi"}},"storageClassName":"ssd"}}]}],"secureSettings":[{"secretName":"gcs-credentials"}],"version":"7.10.2"}}
  creationTimestamp: "2022-01-07T12:07:12Z"
  generation: 3
  name: my_project
  namespace: default
  resourceVersion: "75169079"
  uid: db6d176f-6705-4ecc-bc8f-5ac7bea9b07c
spec:
  auth: {}
  http:
    service:
      metadata: {}
      spec:
        ports:
        - nodePort: 31170
          port: 9200
          protocol: TCP
          targetPort: 9200
        type: LoadBalancer
    tls:
      certificate: {}
  monitoring:
    logs: {}
    metrics: {}
  nodeSets:
  - config:
      indices.query.bool.max_clause_count: 5000
      node.data: false
      node.ingest: false
      node.master: true
      node.ml: false
      node.store.allow_mmap: false
      xpack.security.authc.realms:
        native:
          native1:
            order: 1
    count: 3
    name: node-master
    podTemplate:
      metadata:
        creationTimestamp: null
        labels:
          es: master-node
      spec:
        containers:
        - env:
          - name: ES_JAVA_OPTS
            value: -Xms1g -Xmx1g
          name: elasticsearch
          resources:
            limits:
              cpu: "1"
              memory: 2Gi
            requests:
              cpu: "1"
              memory: 2Gi
        initContainers:
        - command:
          - sh
          - -c
          - |
            bin/elasticsearch-plugin install --batch repository-gcs
          name: install-plugins
          resources: {}
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 3Gi
        storageClassName: ssd
      status: {}
  - config:
      indices.query.bool.max_clause_count: 5000
      node.data: true
      node.ingest: true
      node.master: false
      node.ml: true
      node.store.allow_mmap: false
      xpack.security.authc.realms:
        native:
          native1:
            order: 1
    count: 5
    name: node-data
    podTemplate:
      metadata:
        creationTimestamp: null
        labels:
          es: data-node
      spec:
        containers:
        - env:
          - name: ES_JAVA_OPTS
            value: -Xms4g -Xmx4g
          name: elasticsearch
          resources:
            limits:
              cpu: "4"
              memory: 6Gi
            requests:
              cpu: "4"
              memory: 6Gi
        initContainers:
        - command:
          - sh
          - -c
          - |
            bin/elasticsearch-plugin install --batch repository-gcs
          name: install-plugins
          resources: {}
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 650Gi
        storageClassName: ssd
      status: {}
  secureSettings:
  - secretName: gcs-credentials
  transport:
    service:
      metadata: {}
      spec: {}
    tls:
      certificate: {}
  updateStrategy:
    changeBudget: {}
  version: 7.10.2
status:
  availableNodes: 8
  health: green
  phase: Ready
  version: 7.10.2

Here's the kibana manifest:

apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  annotations:
    common.k8s.elastic.co/controller-version: 1.9.1
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"kibana.k8s.elastic.co/v1","kind":"Kibana","metadata":{"annotations":{},"name":"my_project-kb","namespace":"default"},"spec":{"config":{"elasticsearch.requestTimeout":120000,"elasticsearch.shardTimeout":120000,"xpack.security.sessionTimeout":18000000},"count":1,"elasticsearchRef":{"name":"my_project"},"podTemplate":{"metadata":{"labels":{"kibana":"my_project"}},"spec":{"containers":[{"name":"kibana","resources":{"limits":{"cpu":1,"memory":"1Gi"},"requests":{"cpu":1,"memory":"1Gi"}},"volumeMounts":[{"mountPath":"/etc/certs","name":"elasticsearch-certs","readOnly":true}]}],"volumes":[{"name":"elasticsearch-certs","secret":{"secretName":"my_project-es-http-certs-public"}}]}},"version":"7.10.2"}}
  creationTimestamp: "2022-01-07T13:22:12Z"
  generation: 4
  name: my_project-kb
  namespace: default
  resourceVersion: "76545739"
  uid: ff6b21e8-2443-4f54-8109-49ec409c556e
spec:
  config:
    elasticsearch.requestTimeout: 120000
    elasticsearch.shardTimeout: 120000
    xpack.security.sessionTimeout: 18000000
  count: 1
  elasticsearchRef:
    name: my_project
  enterpriseSearchRef:
    name: ""
  monitoring:
    logs: {}
    metrics: {}
  podTemplate:
    metadata:
      creationTimestamp: null
      labels:
        kibana: my_project
    spec:
      containers:
      - name: kibana
        resources:
          limits:
            cpu: 1
            memory: 1Gi
          requests:
            cpu: 1
            memory: 1Gi
        volumeMounts:
        - mountPath: /etc/certs
          name: elasticsearch-certs
          readOnly: true
      volumes:
      - name: elasticsearch-certs
        secret:
          secretName: my_project-es-http-certs-public
  version: 7.10.2
status:
  associationStatus: Pending
  availableNodes: 1
  count: 1
  elasticsearchAssociationStatus: Pending
  health: green
  selector: common.k8s.elastic.co/type=kibana,kibana.k8s.elastic.co/name=my_project-kb
  version: 7.10.2

Here are the operator logs, uploaded to wetransfer.

{
	"log.level": "error",
	"name": "my_project-kb",
	"namespace": "default",
	"error": "no port named [https] in service [default/my_project-es-http]",
	"errorCauses": [{
		"error": "no port named [https] in service [default/my_project-es-http]"
	}]
}

Kibana is expecting a port named https, but you override the ports spec of the Elasticsearch HTTP service without setting the name:

  http:
    service:
      metadata: {}
      spec:
        ports:
        - nodePort: 31170
          port: 9200
          protocol: TCP
          name: https ##### <--- Here
          targetPort: 9200
        type: LoadBalancer

Also, you don't need this in your Kibana manifest:

        volumeMounts:
        - mountPath: /etc/certs
          name: elasticsearch-certs
          readOnly: true
      volumes:
      - name: elasticsearch-certs
        secret:
secretName: my_project-es-http-certs-public

@michael.morello
So, I've updated the Elasticsearch deployment and added the port name, and I've also deleted those parts from the Kibana manifest, but now I get a 504 Gateway Timeout, without any other errors, when trying to access Kibana.
Any thoughts?

So basically, the exposed service for Kibana had, for some reason, changed its exposed port from 443 to 5601, and that was the reason for the 504 Gateway Timeout.
I consider the subject resolved and ready to be closed, since my issues have been solved.
Thank you for your help @michael.morello @Thibault_Richard
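In case anyone hits the same thing: the Kibana HTTP service can be pinned explicitly so the LoadBalancer keeps listening on 443 while forwarding to Kibana's internal port 5601. A sketch, assuming the resource names used in this thread:

```yaml
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: my_project-kb
spec:
  http:
    service:
      spec:
        type: LoadBalancer
        ports:
        - name: https
          port: 443        # port exposed by the LoadBalancer
          targetPort: 5601 # port Kibana listens on inside the pod
```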

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.