Elasticsearch cluster is not created

Hi,

I'm trying the GA version, but my cluster doesn't start and I don't understand what I'm missing.
Previously I started a cluster with the beta version.
I'm sharing my Elasticsearch manifest, the events I found on the Elasticsearch resource, and the error log of the operator:

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  creationTimestamp: "2020-02-21T15:45:18Z"
  generation: 1
  name: datawarehouse
  namespace: default
  resourceVersion: "55656287"
  selfLink: /apis/elasticsearch.k8s.elastic.co/v1/namespaces/default/elasticsearches/datawarehouse
  uid: 2967e3de-54c1-11ea-ae73-4201c0a8000a
spec:
  http:
    tls:
      selfSignedCertificate:
        disabled: true
  image: gcr.io/hivebrite/elasticsearch7:7cabc9a
  nodeSets:
  - config:
      node.data: false
      node.master: true
    count: 3
    name: master
    podTemplate:
      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: workloadType
                  operator: In
                  values:
                  - elasticsearch
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - podAffinityTerm:
                labelSelector:
                  matchLabels:
                    elasticsearch.k8s.elastic.co/cluster-name: datawarehouse
                topologyKey: failure-domain.beta.kubernetes.io/zone
              weight: 100
            requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  elasticsearch.k8s.elastic.co/cluster-name: datawarehouse
              topologyKey: kubernetes.io/hostname
        containers:
        - env:
          - name: ES_JAVA_OPTS
            value: -Xms4096m -Xmx4096m
          name: elasticsearch
          resources:
            limits:
              cpu: 4000m
              memory: 8Gi
            requests:
              cpu: 4000m
              memory: 8Gi
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
              - SETPCAP
              - MKNOD
              - AUDIT_WRITE
              - CHOWN
              - NET_RAW
              - DAC_OVERRIDE
              - FOWNER
              - FSETID
              - KILL
              - SETGID
              - SETUID
              - NET_BIND_SERVICE
              - SYS_CHROOT
              - SETFCAP
            runAsUser: 1000
        initContainers:
        - command:
          - sh
          - -c
          - sysctl -w vm.max_map_count=262144
          name: sysctl
          securityContext:
            privileged: true
      metadata:
        labels:
          clusterName: wip
          region: europe-west1
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        storageClassName: standard
  - config:
      node.data: true
      node.master: false
    count: 3
    name: data
    podTemplate:
      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: workloadType
                  operator: In
                  values:
                  - elasticsearch
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - podAffinityTerm:
                labelSelector:
                  matchLabels:
                    elasticsearch.k8s.elastic.co/cluster-name: datawarehouse
                topologyKey: failure-domain.beta.kubernetes.io/zone
              weight: 100
            requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  elasticsearch.k8s.elastic.co/cluster-name: datawarehouse
              topologyKey: kubernetes.io/hostname
        containers:
        - env:
          - name: ES_JAVA_OPTS
            value: -Xms4096m -Xmx4096m
          name: elasticsearch
          resources:
            limits:
              cpu: 4000m
              memory: 8Gi
            requests:
              cpu: 4000m
              memory: 8Gi
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
              - SETPCAP
              - MKNOD
              - AUDIT_WRITE
              - CHOWN
              - NET_RAW
              - DAC_OVERRIDE
              - FOWNER
              - FSETID
              - KILL
              - SETGID
              - SETUID
              - NET_BIND_SERVICE
              - SYS_CHROOT
              - SETFCAP
            runAsUser: 1000
        initContainers:
        - command:
          - sh
          - -c
          - sysctl -w vm.max_map_count=262144
          name: sysctl
          securityContext:
            privileged: true
      metadata:
        labels:
          clusterName: wip
          region: europe-west1
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        storageClassName: standard
  secureSettings:
  - secretName: datawarehouse-aws-credentials
  updateStrategy:
    changeBudget:
      maxSurge: 1
      maxUnavailable: 1
  version: 7.6.0
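
For context, the events below come from describing the Elasticsearch resource. A minimal way to check where creation is stuck (assuming the default namespace, as in the manifest above):

kubectl get elasticsearch datawarehouse        # HEALTH and PHASE stay empty while creation is stuck
kubectl describe elasticsearch datawarehouse   # the Events section at the end shows warnings like the one below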

events:

Events:
  Type     Reason                   Age                 From                      Message
  ----     ------                   ----                ----                      -------
  Warning  CompatibilityCheckError  11m (x18 over 30m)  elasticsearch-controller  Error during compatibility check: Timeout: request did not complete within requested timeout 30s

operator error log:
{
"level":"error",
"@timestamp":"2020-02-21T15:48:59.347Z",
"logger":"controller-runtime.controller",
"message":"Reconciler error",
"ver":"1.0.1-bcb74688",
"controller":"elasticsearch-controller",
"request":"default/datawarehouse",
"error":"Timeout: request did not complete within requested timeout 30s",
"stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:258\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:232\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.4.0/pkg/internal/controller/controller.go:211\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20191028221656-72ed19daf4bb/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20191028221656-72ed19daf4bb/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/apimachinery@v0.0.0-20191028221656-72ed19daf4bb/pkg/util/wait/wait.go:88"
}
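
Since the 30s timeout comes from the Kubernetes API server rather than from Elasticsearch itself, it is worth checking whether a validating webhook sits in the request path. A sketch of how to inspect it (the configuration name is the default one from the ECK install, visible in the operator logs further down; adjust if yours differs):

kubectl get validatingwebhookconfigurations
kubectl get validatingwebhookconfiguration elastic-webhook.k8s.elastic.co -o yaml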

I should also mention that we deploy with our own Helm chart, and the Helm deployment can't finish.
So maybe there is an issue with the operator, although with kubectl I can see the operator pod running.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elastic-operator
  namespace: elastic-system
  labels:
    control-plane: elastic-operator
spec:
  selector:
    matchLabels:
      control-plane: elastic-operator
  serviceName: elastic-operator
  template:
    metadata:
      labels:
        control-plane: elastic-operator
    spec:
      serviceAccountName: elastic-operator
      containers:
        - image: docker.elastic.co/eck/eck-operator:1.0.1
          name: manager
          args:
            - manager
            - --operator-roles
            - all
            - --log-verbosity={{.Values.operator.logVerbosity}}
            - --metrics-port
            - {{.Values.operator.ports.metric | quote}}
          env:
            - name: OPERATOR_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: WEBHOOK_SECRET
              value: elastic-webhook-server-cert
            - name: WEBHOOK_PODS_LABEL
              value: elastic-operator
            - name: OPERATOR_IMAGE
              value: docker.elastic.co/eck/eck-operator:1.0.1
          resources:
{{ toYaml .Values.operator.resources | indent 12}}
          ports:
            - containerPort: {{.Values.operator.ports.webhook }}
              name: webhook-server
              protocol: TCP
          volumeMounts:
            - mountPath: /tmp/k8s-webhook-server/serving-certs
              name: cert
              readOnly: true
      terminationGracePeriodSeconds: 10
      volumes:
        - name: cert
          secret:
            defaultMode: 420
            secretName: elastic-webhook-server-cert
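
To rule out a rendering problem in the chart itself, it can be checked locally before installing. A sketch (the chart path and output file are placeholders, not our real names):

# Render the chart and validate the resulting manifests against the API server
helm template ./charts/elastic-operator > rendered.yaml
kubectl apply --dry-run -f rendered.yaml

# Confirm the operator pod and its webhook serving certificate exist
kubectl -n elastic-system get pods
kubectl -n elastic-system get secret elastic-webhook-server-cert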

Does anyone have an idea why I'm getting the compatibility check error?

It seems the error is related to this:

// EventCompatCheckError describes an error during the check for compatibility between operator version and managed resources.
EventCompatCheckError = "CompatibilityCheckError"

But I don't understand if I'm deploying a bad version of my CRDs or if it's something else.

I just tried in a new, clean context, following the quickstart.
When I try to set up the Elasticsearch cluster, I get a timeout on the apply.

Could you provide us with the logs of the operator?

Regarding the timeout, it could be caused by the webhook; see https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-troubleshooting.html#k8s-webhook-troubleshooting
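
A quick way to confirm the webhook is the culprit is to remove its configuration temporarily and retry the apply; if the resource is then created, the API server cannot reach the webhook endpoint. A sketch (this disables validation until the install manifest is re-applied, so use it only as a test; the file name is a placeholder):

kubectl delete validatingwebhookconfiguration elastic-webhook.k8s.elastic.co
kubectl apply -f elasticsearch.yaml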

Following the quickstart:

{"level":"info","@timestamp":"2020-02-25T14:47:39.680Z","logger":"controller-runtime.controller","message":"Starting EventSource","ver":"1.0.1-bcb74688","controller":"apmserver-controller","source":"kind source: /, Kind="}
{"level":"info","@timestamp":"2020-02-25T14:47:39.681Z","logger":"controller-runtime.controller","message":"Starting Controller","ver":"1.0.1-bcb74688","controller":"apmserver-controller"}
{"level":"info","@timestamp":"2020-02-25T14:47:39.681Z","logger":"controller-runtime.controller","message":"Starting workers","ver":"1.0.1-bcb74688","controller":"apmserver-controller","worker count":1}
{"level":"info","@timestamp":"2020-02-25T14:47:39.681Z","logger":"controller-runtime.controller","message":"Starting EventSource","ver":"1.0.1-bcb74688","controller":"elasticsearch-controller","source":"channel source: 0xc000338320"}
{"level":"info","@timestamp":"2020-02-25T14:47:39.681Z","logger":"controller-runtime.controller","message":"Starting Controller","ver":"1.0.1-bcb74688","controller":"elasticsearch-controller"}
{"level":"info","@timestamp":"2020-02-25T14:47:39.681Z","logger":"controller-runtime.controller","message":"Starting workers","ver":"1.0.1-bcb74688","controller":"elasticsearch-controller","worker count":1}
{"level":"info","@timestamp":"2020-02-25T14:47:39.692Z","logger":"webhook-certificates-controller","message":"Ending reconciliation run","ver":"1.0.1-bcb74688","iteration":1,"namespace":"elastic-system","name":"elastic-webhook-server-cert","took":0.011608211}
{"level":"info","@timestamp":"2020-02-25T14:47:39.692Z","logger":"webhook-certificates-controller","message":"Starting reconciliation run","ver":"1.0.1-bcb74688","iteration":2,"namespace":"","name":"elastic-webhook.k8s.elastic.co"}
{"level":"info","@timestamp":"2020-02-25T14:47:39.703Z","logger":"webhook-certificates-controller","message":"Ending reconciliation run","ver":"1.0.1-bcb74688","iteration":2,"namespace":"","name":"elastic-webhook.k8s.elastic.co","took":0.011222177}

Creating my own chart:
the same "Reconciler error" with the 30s timeout as in the operator log above.

I'm running on GKE.

Are you running a private GKE cluster?
Is there any firewall rule/NetworkPolicy that would prevent a call to the control plane from the Pod that is running ECK?

Yes, it's a private GKE cluster.
There is no NetworkPolicy, and there are only the default firewall rules created automatically by GKE.

You can either:

- add a firewall rule that allows the GKE control plane to reach the webhook port (9443 by default) on the operator Pod, or
- disable the validating webhook, as described in the troubleshooting link above.
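
For the firewall option, a sketch of the usual rule for a private GKE cluster (the cluster, network, and node tag names are placeholders to replace with your own; 9443 is the default ECK webhook port):

# Find the control plane's CIDR block to use as the source range
gcloud container clusters describe my-cluster --format 'value(privateClusterConfig.masterIpv4CidrBlock)'

# Allow the control plane to reach the webhook port on the nodes
gcloud compute firewall-rules create allow-eck-webhook \
    --network my-network \
    --source-ranges <master-cidr> \
    --target-tags <node-tag> \
    --allow tcp:9443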

Ok, thank you, it was the firewall for the webhook :slight_smile: