As of a few hours ago, Kibana is no longer working. It crashed suddenly and now the Kibana pod is failing to reach a Ready state. The ES pods are still working fine.
The most obvious errors from the Kibana pods seem to relate to getting a license from Elasticsearch for X-Pack:
{"type":"log","@timestamp":"2019-11-18T23:08:21Z","tags":["warning","task_manager"],"pid":1,"message":"PollError Request Timeout after 30000ms"}
{"type":"log","@timestamp":"2019-11-18T23:08:40Z","tags":["license","warning","xpack"],"pid":1,"message":"License information from the X-Pack plugin could not be obtained from Elasticsearch for the [data] cluster. Error: Request Timeout after 30000ms"}
ES version: 7.2
Kubernetes version: v1alpha1
I'm not sure why this would happen suddenly. Any ideas on how to diagnose it?
I'm still unsure what happened or how to fix it. The last thing I did was create a new index; I took a break, came back 30-60 minutes later, and Kibana could no longer connect to ES.
Resolved by deploying a new ECK cluster and reindexing. I was due to update anyway.
Another update on the issue of a load balancer breaking the connection from Kibana to Elasticsearch. It wasn't resolved, and I can't explain why it worked briefly on the beta for an hour or so, and for months on the alpha, without the connection breaking. But I have found a solution that works.
The issue seems to be that the internal connection / DNS breaks when you expose the ES service as a LoadBalancer. The best solution I've found is to specify the following Kibana config settings in the kibana.yaml, which allows Kibana to connect through the load balancer, e.g.:
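Something along these lines (just a sketch; the hostname and credentials are placeholders, and the elastic user's password comes from the <cluster-name>-es-elastic-user secret created by ECK):

```yaml
# Kibana settings (placeholders only; substitute your own load balancer hostname and credentials)
elasticsearch.hosts: ["https://es.example.com:443"]
elasticsearch.username: "elastic"
elasticsearch.password: "<elastic user password>"
# only needed if Kibana doesn't trust the certificate presented through the load balancer
elasticsearch.ssl.verificationMode: none
```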
I deployed a simple config change to ES, adding reindex.remote.whitelist: https://example:443, and Kibana then failed to connect.
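The change was along these lines (a sketch rather than my exact manifest; the apiVersion and node layout may differ between the alpha and beta CRDs):

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1alpha1
kind: Elasticsearch
metadata:
  name: elasticsearch
spec:
  version: 7.2.0
  nodes:
  - nodeCount: 3
    config:
      # the added setting; note the reference docs show host:port patterns here (e.g. example:443)
      reindex.remote.whitelist: "https://example:443"
```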
I tried to recreate the cluster as before and it's not working. Kibana appears to be trying to connect on the internal DNS address, ignoring the config settings above.
In addition, the ES cluster health never reaches 'green', only 'unsure'. Troubleshooting and looking at the ES logs, there are no errors or suspicious entries on the ES pods; they look fine.
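For reference, this is roughly how I'm checking things (resource, secret, and service names assume the cluster is called elasticsearch, so adjust to your own):

```sh
# health as reported by ECK (the HEALTH column)
kubectl get elasticsearch

# logs from the ES pods
kubectl logs -l elasticsearch.k8s.elastic.co/cluster-name=elasticsearch

# query the cluster directly with the elastic user's password
PASSWORD=$(kubectl get secret elasticsearch-es-elastic-user \
  -o go-template='{{.data.elastic | base64decode}}')
kubectl port-forward service/elasticsearch-es-http 9200 &
curl -u "elastic:$PASSWORD" -k "https://localhost:9200/_cluster/health?pretty"
```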
Can you share your entire yaml manifests?
If it helps, you can also create your own LoadBalancer Service targeting the Elasticsearch Pods, and keep the default one managed by ECK "internal", so that Kibana uses the internal one.
The YAMLs below use the DigitalOcean load balancer annotations. I'm attempting to run them on DigitalOcean managed Kubernetes, version 1.16.2.
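Trimmed down, the ES manifest looks roughly like this (I'm paraphrasing from memory, so the exact nesting of the http service section may differ between the alpha and beta CRDs):

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1alpha1
kind: Elasticsearch
metadata:
  name: elasticsearch
spec:
  version: 7.2.0
  nodes:
  - nodeCount: 3
  http:
    service:
      metadata:
        annotations:
          # DigitalOcean load balancer annotations
          service.beta.kubernetes.io/do-loadbalancer-protocol: "https"
          service.beta.kubernetes.io/do-loadbalancer-tls-passthrough: "true"
      spec:
        type: LoadBalancer
```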
1 - Successfully deployed both ES and Kibana with the above YAMLs; everything worked and health was green.
2 - Updated the value of reindex.remote.whitelist to es.example.com:443. ES health stays green but the rollout is stuck at 2 ES nodes, and Kibana can no longer connect to the ES cluster.
3 - Spun up a new DO Kubernetes cluster and tried to deploy with the above YAML files again. ES works but the health is "unsure"; all 3 pods look like they are working and querying ES works, but Kibana can't connect to the ES cluster.
Updated the value of reindex.remote.whitelist to es.example.com:443. ES health stays green but the rollout is stuck at 2 ES nodes
Looks like the rolling upgrade did not go well. If 1 of the 3 Pods is not available (probably the one being upgraded), you can look at its logs (the Elasticsearch logs) to see if anything's wrong with its configuration, and maybe learn more about the failing reindex from the remote cluster.
Do things work correctly if you unset the LoadBalancer service type?
We've seen other folks having issues with LoadBalancer services not being reachable using the internal DNS. A workaround is to create your own additional LoadBalancer service, see this example. Let me know what happens in your case!
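Roughly, such a Service could look like this (a sketch; it assumes the Elasticsearch resource is named elasticsearch, and the selector uses the labels ECK sets on the Elasticsearch Pods):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-es-http-external
  # add your DigitalOcean load balancer annotations here as needed
spec:
  type: LoadBalancer
  ports:
  - name: https
    port: 9200
    targetPort: 9200
  selector:
    common.k8s.elastic.co/type: elasticsearch
    elasticsearch.k8s.elastic.co/cluster-name: elasticsearch
```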
There were no errors or suspicious logs from any of the ES pods. I will try to recreate it one more time.
Shouldn't setting the Kibana config elasticsearch.hosts and the username & password to use the full DNS name and auth bypass the internal DNS?
I had tried that, but was getting a 504 bad gateway with both Kibana and Elasticsearch, so I went back to using the load balancer since it had worked for the past 3 months and deployed correctly the first time I set the host config value in Kibana. :s
I tried to deploy again on a brand new cluster. All 3 Elasticsearch pods are running. There are no errors or unusual-looking logs on any of the ES pods. The ES cluster health is still "unknown". I can connect to and query ES with no problem.
@getorca ECK does need to connect to ES using the internal DNS anyway, so this needs to work.
I'd suggest again that you keep the internal DNS and the related ES & Kibana configuration at their defaults, so ECK can also connect to it.
Adding an additional LoadBalancer-type Service should normally work as expected. Your 504 bad gateway probably comes from an incorrect Service configuration?