Kibana no longer working

getorca · November 18, 2019, 11:10pm

As of a few hours ago, Kibana is no longer working. It crashed suddenly, and now the Kibana pod is failing to reach a Ready state. The ES pods are still working fine.

The most obvious errors from the Kibana pods seem to relate to getting a licence from Elasticsearch for X-Pack:

{"type":"log","@timestamp":"2019-11-18T23:08:21Z","tags":["warning","task_manager"],"pid":1,"message":"PollError Request Timeout after 30000ms"}
{"type":"log","@timestamp":"2019-11-18T23:08:40Z","tags":["license","warning","xpack"],"pid":1,"message":"License information from the X-Pack plugin could not be obtained from Elasticsearch for the [data] cluster. Error: Request Timeout after 30000ms"}

ES version: 7.2
Kubernetes version: v1alpha1

I'm not sure why this would occur suddenly. Any ideas to diagnose?

michael.morello · November 19, 2019, 10:19am

Hi,

Could you check the connectivity to the Elasticsearch cluster? See https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-quickstart.html#k8s_request_elasticsearch_access for more information about how to do this.
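
One way to run that check from inside the cluster is a throwaway Job that curls the Elasticsearch HTTP service. This is only a sketch under assumptions: the cluster name `quickstart`, the Job name, and the `curlimages/curl` image are placeholders. ECK names the service `<cluster-name>-es-http` and stores the `elastic` user's password in the `<cluster-name>-es-elastic-user` secret.

```yaml
# Sketch only: assumes an Elasticsearch resource named "quickstart" in the same namespace.
apiVersion: batch/v1
kind: Job
metadata:
  name: es-connectivity-check   # hypothetical name
spec:
  backoffLimit: 0               # do not retry; one attempt is enough for a check
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: curl
        image: curlimages/curl  # assumption: any image that ships curl and sh works
        env:
        - name: ES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: quickstart-es-elastic-user   # secret created by ECK
              key: elastic
        command:
        - sh
        - -c
        - |
          # -k skips verification of ECK's self-signed certificate
          curl -sS -k -u "elastic:${ES_PASSWORD}" https://quickstart-es-http:9200
```

`kubectl logs job/es-connectivity-check` should then show either the Elasticsearch banner or the connection error, which tells you whether the internal service name is reachable from a pod.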

getorca · November 19, 2019, 6:22pm

No issues connecting to, or querying elasticsearch.

getorca · November 21, 2019, 6:13pm

I'm still unsure what happened or how to fix it. The last thing I did was create a new index. I took a break, came back 30-60 minutes later, and Kibana could no longer connect to ES.

Resolved by deploying a new ECK cluster and reindexing. I was due to update anyway.

getorca · November 22, 2019, 10:41pm

Another update on the issue of a load balancer breaking the connection from Kibana to Elasticsearch. It wasn't resolved. I can only assume 's were to blame for it working briefly on the beta for an hour or so, and for months on the alpha, without the connection breaking. But I have found a solution that works.

The issue seems to be that the internal connection / DNS breaks when you expose the ES service as a load balancer. The best solution I've found is to specify the following Kibana config settings in the kibana.yaml. This allows Kibana to connect through the load balancer, e.g.:

...
spec:
  version: 7.4.2
  count: 1
  elasticsearchRef:
    name: default
  config:
    elasticsearch.hosts: https://YOUR_DOMAIN_NAME
    elasticsearch.username: elastic
    elasticsearch.password: ELASTIC_PASSWORD|SECRET

getorca · November 23, 2019, 6:31am

What the fuck :s

I deployed a simple config change to ES, adding reindex.remote.whitelist: https://example:443, and Kibana now fails to connect.

Tried to recreate the cluster as before and it's not working. Kibana appears to be trying to connect on the internal DNS address, and ignoring the config settings above.

In addition, the ES cluster health never gets to 'green', only 'unknown'. Troubleshooting and looking at the logs for ES, there are no errors or suspicious logs on the ES pods; they look fine.

What the god damn fuck?

sebgl · November 25, 2019, 8:44am

Hi @getorca,

Can you share your entire yaml manifests?
If that helps, you can also create your own LoadBalancer service targeting the Elasticsearch Pods, and keep the default one managed by ECK "internal". So Kibana uses the internal one.
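
A minimal sketch of such an additional Service, assuming a cluster named `quickstart` (the Service name `es-external` is hypothetical). The selector uses the labels ECK puts on Elasticsearch Pods. The Elasticsearch resource itself would then leave `http.service.spec.type` unset, so the ECK-managed `<cluster-name>-es-http` service stays a ClusterIP service for internal traffic.

```yaml
# Sketch only: an extra Service next to the ECK-managed "<cluster-name>-es-http" one.
apiVersion: v1
kind: Service
metadata:
  name: es-external   # hypothetical name
spec:
  type: LoadBalancer
  selector:
    common.k8s.elastic.co/type: elasticsearch
    elasticsearch.k8s.elastic.co/cluster-name: quickstart   # assumption: your cluster name
  ports:
  - name: https
    protocol: TCP
    port: 9200
    targetPort: 9200   # Elasticsearch HTTP port on the Pods
```

Any cloud-specific load balancer annotations would go under this Service's `metadata.annotations`.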

getorca · November 25, 2019, 8:28pm

The YAMLs below use the DigitalOcean load balancer annotations. I'm attempting to run them on DigitalOcean managed Kubernetes, Kubernetes version 1.16.2.

Elasticsearch:

apiVersion: elasticsearch.k8s.elastic.co/v1beta1
kind: Elasticsearch
metadata:
  name: hugo
  annotations:
    service.beta.kubernetes.io/do-loadbalancer-protocol: "http"
    service.beta.kubernetes.io/do-loadbalancer-algorithm: "round_robin"
    service.beta.kubernetes.io/do-loadbalancer-tls-ports: "443"
    service.beta.kubernetes.io/do-loadbalancer-certificate-id: "MY_DO_CERT_ID"
    service.beta.kubernetes.io/do-loadbalancer-redirect-http-to-https: "true"
spec:
  version: 7.4.2
  nodeSets:
  - name: default
    count: 3
    podTemplate:
      spec:
        initContainers:
         - name: set-sysctl
           securityContext:
             privileged: true 
           command: 
           - sh
           - -c
           - |
             sysctl -w vm.max_map_count=262144
         - name: install-plugins
           command:
           - sh
           - -c
           - |
             bin/elasticsearch-plugin install --batch repository-s3
    config:
      node.master: true
      node.data: true
      node.ingest: true
      reindex.remote.whitelist:  example.com:443
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 250Gi
        storageClassName: do-block-storage
  updateStrategy:
    changeBudget:
      maxSurge: 3
      maxUnavailable: 1
  http:
    service:
      spec:
        type: LoadBalancer
        ports:
          - name: https
            protocol: TCP
            port: 443
            targetPort: 9200
          - name: http
            protocol: TCP
            port: 80
            targetPort: 9200

Kibana:

apiVersion: kibana.k8s.elastic.co/v1beta1
kind: Kibana
metadata:
  name: hugo
  annotations:
    service.beta.kubernetes.io/do-loadbalancer-protocol: "http"
    service.beta.kubernetes.io/do-loadbalancer-algorithm: "round_robin"
    service.beta.kubernetes.io/do-loadbalancer-tls-ports: "443"
    service.beta.kubernetes.io/do-loadbalancer-certificate-id: "MY_DO_CERT_ID"
    service.beta.kubernetes.io/do-loadbalancer-redirect-http-to-https: "true"
spec:
  version: 7.4.2
  count: 1
  elasticsearchRef:
    name: hugo
  config:
    elasticsearch.hosts: https://huge-es-01.example.com
    elasticsearch.username: elastic
    elasticsearch.password: MY_ES_PASSWORD
  http:
    service:
      spec:
        type: LoadBalancer
        ports:
          - name: https
            protocol: TCP
            port: 443
            targetPort: 5601

To recap the steps that led to the issue:

1 - Successfully deployed both ES and Kibana with the above YAMLs. Everything worked and health was green.
2 - Updated the value of reindex.remote.whitelist to es.example.com:443. ES health stays green but is stuck at 2 ES nodes, and Kibana can no longer connect to the ES cluster.
3 - Spun up a new DO Kubernetes cluster and tried to deploy with the above YAML files again. ES works but the health is "unknown"; all 3 pods look like they are working. Querying ES works. Kibana can't connect to the ES cluster.

sebgl · November 26, 2019, 9:23am

> Updated the value of reindex.remote.whitelist to es.example.com:443. ES health stay green, stuck at 2 ES nodes

Looks like the rolling upgrade did not go well. If 1 of the 3 Pods is not available (probably the one being upgraded), you can look at its logs (the Elasticsearch logs) to see if anything's wrong with its configuration. And maybe learn more about the failing reindex from cluster.

Do things work correctly if you unset the LoadBalancer type service?
We've seen other folks having issues with LoadBalancer services not being reachable using the internal DNS. A workaround is to create your own additional LoadBalancer service, see this example. Let me know what happens in your case!

getorca · November 26, 2019, 5:53pm

There were no errors or suspicious logs from any of the ES pods. I will try to recreate it one more time.

Shouldn't setting the Kibana config elasticsearch.hosts, username and password to use the full DNS name and auth bypass the internal DNS?

I had tried that, but was getting a 504 bad gateway with both Kibana and Elasticsearch, so I went back to using the loadbalancer, since it had worked for the past 3 months and deployed correctly the first time I set the host config value in Kibana. :s

getorca · November 26, 2019, 6:12pm

@sebgl

I tried to deploy again on a brand new cluster. All 3 Elasticsearch pods are running. There are no errors or unusual looking logs on any of the ES pods. The ES cluster health is still "unknown". I can connect to and query ES no problem.

The only error is from the ECK stateful set:

{"level":"error","@timestamp":"2019-11-26T18:12:11.759Z","logger":"controller-runtime.controller","message":"Reconciler error","ver":"1.0.0-beta1-84792e30","controller":"elasticsearch-controller","request":"default/hola","error":"unable to delete /_cluster/voting_config_exclusions: Delete https://hola-es-http.default.svc:9200/_cluster/voting_config_exclusions?wait_for_removal=false: dial tcp 10.245.223.51:9200: connect: connection timed out","errorCauses":[{"error":"unable to delete /_cluster/voting_config_exclusions: Delete https://hola-es-http.default.svc:9200/_cluster/voting_config_exclusions?wait_for_removal=false: dial tcp 10.245.223.51:9200: connect: connection timed out","errorVerbose":"Delete https://hola-es-http.default.svc:9200/_cluster/voting_config_exclusions?wait_for_removal=false: dial tcp 10.245.223.51:9200: connect: connection timed out\nunable to delete /_cluster/voting_config_exclusions\ngithub.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/client.(*clientV7).DeleteVotingConfigExclusions\n\t/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/client/v7.go:53\ngithub.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/version/zen2.ClearVotingConfigExclusions\n\t/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/version/zen2/voting_exclusions.go:78\ngithub.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/driver.(*defaultDriver).reconcileNodeSpecs\n\t/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/driver/nodes.go:92\ngithub.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/driver.(*defaultDriver).Reconcile\n\t/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/driver/driver.go:234\ngithub.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch.(*ReconcileElasticsearch).internalReconcile\n\t/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/elasticsearch_controller.go:284\ngithub.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch.(*ReconcileElasticsearch).Reconcile\n\t/go/src/github.com/elastic/cloud-on-k8s/pkg/controller/elasticsearch/elasticsearch_controller.go:219\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.2.1/pkg/internal/controller/controller.go:216\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.2.1/pkg/internal/controller/controller.go:192\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.2.1/pkg/internal/controller/controller.go:171\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190404173353-6a84e37a896d/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190404173353-6a84e37a896d/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190404173353-6a84e37a896d/pkg/util/wait/wait.go:88\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1357"}],"stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.2.1/pkg/internal/controller/controller.go:218\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.2.1/pkg/internal/controller/controller.go:192\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.2.1/pkg/internal/controller/controller.go:171\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190404173353-6a84e37a896d/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190404173353-6a84e37a896d/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20190404173353-6a84e37a896d/pkg/util/wait/wait.go:88"}

So it looks like ES needs to connect to itself on the internal DNS. Is there a way to set ES to use an external DNS?

sebgl · November 27, 2019, 8:46am

@getorca ECK does need to connect to ES using the internal DNS anyway. So this needs to work.
I'd suggest again you keep the internal DNS and related ES & Kibana configuration default, so ECK can also connect to it.
Adding an additional LoadBalancer service type should normally work as expected. Your 504 bad gateway probably comes from a wrong service configuration?
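
As a sketch of what "default" means for Kibana here, assuming the `hugo` names from the manifests earlier in the thread: with only `elasticsearchRef` set and no `elasticsearch.*` overrides under `config`, ECK sets up the connection (internal service URL, credentials and CA certificate) itself.

```yaml
# Sketch only: Kibana with the default ECK-managed connection to Elasticsearch.
apiVersion: kibana.k8s.elastic.co/v1beta1
kind: Kibana
metadata:
  name: hugo
spec:
  version: 7.4.2
  count: 1
  elasticsearchRef:
    name: hugo   # ECK configures hosts, credentials and CA for this cluster
  # no elasticsearch.hosts / elasticsearch.username / elasticsearch.password overrides
```

External access to Kibana or Elasticsearch would then be handled by separate Services rather than by changing these connection settings.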

getorca · December 3, 2019, 6:51pm

My mistake, it's a 502 Bad Gateway. My lb service yaml is as follows:

apiVersion: v1
kind: Service
metadata:
  name: es-loadbalancer
  annotations:
    service.beta.kubernetes.io/do-loadbalancer-protocol: "http"
    service.beta.kubernetes.io/do-loadbalancer-algorithm: "round_robin"
    service.beta.kubernetes.io/do-loadbalancer-certificate-id: "MY_CERT_ID"
    service.beta.kubernetes.io/do-loadbalancer-redirect-http-to-https: "true"
spec:
  externalTrafficPolicy: Local
  type: LoadBalancer
  ports:
  - name: https
    protocol: TCP
    port: 443
    targetPort: 9200
  selector:
    common.k8s.elastic.co/type: elasticsearch
    elasticsearch.k8s.elastic.co/cluster-name: sample

getorca · December 3, 2019, 8:09pm

It seems like I have everything working, with the above loadbalancer service and with TLS disabled in the Elasticsearch YAML, as @sebgl described in option 3 of "Public SSL'ed access with Ingress not working". I guess that option is now available (https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-accessing-elastic-services.html#k8s-disable-tls).
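
For reference, the TLS setting from the linked documentation looks roughly like the sketch below. The cluster name `sample` is an assumption taken from the `cluster-name` label in the Service selector above, and the node set is trimmed to the minimum. Presumably the 502 came from the load balancer speaking plain HTTP to Pods that were still serving HTTPS with the self-signed certificate, though that is not confirmed in this thread.

```yaml
# Sketch only: the part of the Elasticsearch manifest that turns off the self-signed HTTP certificate.
apiVersion: elasticsearch.k8s.elastic.co/v1beta1
kind: Elasticsearch
metadata:
  name: sample   # assumption: matches the cluster-name label in the Service selector
spec:
  version: 7.4.2
  http:
    tls:
      selfSignedCertificate:
        disabled: true   # Elasticsearch then serves plain HTTP on port 9200
  nodeSets:
  - name: default
    count: 3
```

With this setting, traffic between the load balancer and the Pods is unencrypted, and TLS is terminated at the load balancer using the certificate referenced in its annotations.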
