Quickstart "health" and "phase" are empty

I am trying ECK but got stuck right at the start:

$ kubectl get elasticsearch quickstart
NAME         HEALTH   NODES   VERSION   PHASE   AGE
quickstart                    7.2.0             24m

Kubernetes v1.15.3
CentOS 7 AWS instance (t3.large)

The operator logs show a timeout. It seems to be trying to pull non-existent GitHub resources under
github.com/elastic/cloud-on-k8s/operators/

I can't see an "operators" directory in the cloud-on-k8s repository.

Any guidance on troubleshooting would be appreciated!

Shirish

Hello Shirish,

The 'kubectl describe' command often provides more information about what's going on. Can you run it on your Elasticsearch resource (kubectl describe elasticsearch) and your pods (kubectl describe pods) and share the output?

Here is the documentation for troubleshooting your cluster: https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-troubleshooting.html.
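Empty HEALTH, NODES and PHASE columns mean that the resource has no status yet, i.e. the operator has not written one to it. A minimal bash sketch for listing such resources, assuming kubectl is configured for the cluster; `es_status`, `report_missing_status` and the `KUBECTL` override are hypothetical names used only for illustration:

```shell
#!/usr/bin/env bash
# Sketch only. KUBECTL is a hypothetical override (defaults to kubectl),
# handy for pointing at another binary or a stub.

# Print "name|health|phase" for each Elasticsearch resource. Extra args
# (e.g. -n NAMESPACE or --all-namespaces) are passed through to kubectl.
es_status() {
  ${KUBECTL:-kubectl} get elasticsearch "$@" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"|"}{.status.health}{"|"}{.status.phase}{"\n"}{end}'
}

# Read "name|health|phase" lines on stdin and report resources whose
# health or phase is still empty. "|" is used instead of a tab so that
# empty fields are not collapsed by read.
report_missing_status() {
  local name health phase
  while IFS='|' read -r name health phase; do
    [ -z "$name" ] && continue
    if [ -z "$health" ] || [ -z "$phase" ]; then
      echo "$name: no status yet (health='$health', phase='$phase')"
    fi
  done
}
```

For example, `es_status --all-namespaces | report_missing_status` checks every namespace at once.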

You can't see "operators" in the cloud-on-k8s GitHub repository because we moved the contents of this directory up a level (https://github.com/elastic/cloud-on-k8s/pull/1616).

Hello Richard,

Thanks for taking the time to look into this!
I went through the troubleshooting page and found a useful way to enable debug logging, but it didn't help me get any further.

I ran the following commands and uploaded their output to https://pastebin.com/UTJiRnUu

#kubectl get all -n elastic-system
#kubectl get events -n elastic-system
#kubectl describe pods -n elastic-system
#kubectl -n elastic-system logs statefulset.apps/elastic-operator
Then I set "--enable-debug-logs=true" and repeated:
#kubectl -n elastic-system logs statefulset.apps/elastic-operator

I suspect the change in the GitHub file structure hasn't been reflected in the code, but I might be wrong. Pardon my ignorance, as I am just an infra guy :slight_smile:
Thanks,
Shirish

Hi,

The ECK operator looks healthy, but I do not have enough information to debug further. By default, the operator is deployed in the 'elastic-system' namespace and manages Elasticsearch, Kibana and APM Server resources in the 'default' namespace.

Can you provide information about the Elasticsearch resource and its associated pods (without filtering on the 'elastic-system' namespace)?

kubectl get elasticsearch
kubectl describe elasticsearch
kubectl get pods
kubectl describe pods
kubectl get events
kubectl describe events
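To make sharing easier, the commands above can be run in one go and saved to a single file. A minimal bash sketch; `collect_diagnostics`, the default output file name and the `KUBECTL` override are hypothetical, and the commands run against your current namespace (`default` unless your context says otherwise):

```shell
#!/usr/bin/env bash
# Sketch only: run each diagnostic command and append its output, under a
# header, to one file. KUBECTL is a hypothetical override (defaults to kubectl).
collect_diagnostics() {
  local out=${1:-eck-diagnostics.txt}
  local cmd
  : > "$out"   # start with an empty file
  for cmd in "get elasticsearch" "describe elasticsearch" "get pods" \
             "describe pods" "get events" "describe events"; do
    echo "### kubectl $cmd" >> "$out"
    # $cmd is deliberately unquoted so it splits into verb and resource.
    ${KUBECTL:-kubectl} $cmd >> "$out" 2>&1
    echo >> "$out"
  done
  echo "wrote $out"
}
```

Running `collect_diagnostics` then gives one file to upload instead of six separate outputs.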

Hello Richard,

The output of the commands is uploaded at https://pastebin.com/HQ9LaK5a

Thanks,
Shirish

@shirishatideal Hmm, I am not sure whether this is the issue, but here is my guess:

  1. The Elasticsearch CR configures Version: 7.3.0, which differs from version: 7.2.0 in the quickstart guide. Can you check whether the same issue occurs after changing it to 7.2.0?

  2. The error in the operator log suggests (AFAIK) that validation of the CR is failing, possibly because the new field spec.nodes.name is missing from it. Can you try adding it and see if that helps?

Option (2) may not work if the operator is running an older version, or if older CRDs were installed in which this field does not exist.

Let me know if either of these options help in troubleshooting your issue.

Hello Sarjeet,

Thanks for your attention.

1. The 7.2.0 version gives exactly the same results.
2. I am unsure how to make those changes.

Shirish

"Timeout: request did not complete within requested timeout 30s" this seems to be the problem. I think this is an error returned by the apiserver to the operator.
I'm wondering if there might be some kind of firewall/network issue preventing the operator from reaching the apiserver.
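Since the timeout message is produced by the apiserver, one way to narrow this down is to count how often it appears in the operator logs and to check whether the apiserver answers a basic health check from your machine. A sketch, assuming the default 'elastic-system' namespace and the `statefulset.apps/elastic-operator` name used earlier in this thread; `check_apiserver` and `KUBECTL` are hypothetical. A healthy answer from your workstation does not rule out a network problem between the operator pod and the apiserver.

```shell
#!/usr/bin/env bash
# Sketch only. KUBECTL is a hypothetical override (defaults to kubectl).
check_apiserver() {
  local k=${KUBECTL:-kubectl}
  local n
  # Count log lines carrying the apiserver timeout message.
  n=$($k -n elastic-system logs statefulset.apps/elastic-operator 2>&1 \
        | grep -c 'request did not complete within requested timeout')
  echo "timeout errors in operator log: $n"
  # /healthz normally answers "ok" when the apiserver is healthy.
  echo "apiserver /healthz: $($k get --raw /healthz 2>&1)"
}
```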

@shirishatideal

Regarding (1), did you delete the previous CR completely and submit a new CR with version 7.2.0? Could you try deleting/cleaning up everything and retrying, following every quickstart instruction exactly, to see if that helps?
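Deleting and re-creating the resource could look like the sketch below. It assumes the resource is named quickstart in the current namespace, that its pods carry the `elasticsearch.k8s.elastic.co/cluster-name` label used in the quickstart guide, and that `quickstart.yaml` is a hypothetical local copy of the quickstart manifest with version 7.2.0; `recreate_quickstart` and `KUBECTL` are hypothetical too.

```shell
#!/usr/bin/env bash
# Sketch only. KUBECTL is a hypothetical override (defaults to kubectl).
recreate_quickstart() {
  local k=${KUBECTL:-kubectl}
  local manifest=${1:-quickstart.yaml}
  $k delete elasticsearch quickstart --ignore-not-found
  # Wait for the operator-managed pods to disappear. "no matching
  # resources" just means they are already gone, so errors are ignored.
  $k wait --for=delete pod \
    --selector='elasticsearch.k8s.elastic.co/cluster-name=quickstart' \
    --timeout=120s 2>/dev/null || true
  $k apply -f "$manifest"
}
```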

For (2), you'll need to add the name to spec.nodes. For example:

nodes:
    - nodeCount: 2
      name: testgroup1
      config:
        node.master: true
        node.data: true
        node.ingest: true
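In context, the named node group sits under spec in the Elasticsearch resource. A sketch applying the whole quickstart resource this way, assuming the `elasticsearch.k8s.elastic.co/v1alpha1` API that the quickstart used at the time and an operator whose CRDs already know about `spec.nodes[].name` (older ones may not); `apply_named_quickstart` and `KUBECTL` are hypothetical, and `nodeCount: 1` follows the quickstart rather than the example above:

```shell
#!/usr/bin/env bash
# Sketch only. KUBECTL is a hypothetical override (defaults to kubectl).
# The manifest is passed to "kubectl apply" on stdin.
apply_named_quickstart() {
  ${KUBECTL:-kubectl} apply -f - <<'EOF'
apiVersion: elasticsearch.k8s.elastic.co/v1alpha1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 7.2.0
  nodes:
  - name: testgroup1
    nodeCount: 1
    config:
      node.master: true
      node.data: true
      node.ingest: true
EOF
}
```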

If it still doesn't work after either change, it could be a setup or environment issue. You can try the same steps on minikube, and then debug the non-working setup step by step.