Fleet Server and Elastic Agents not working when adding nodeSelectors

Hi, I'm trying to set up an Elasticsearch cluster with the ECK operator 1.8.0 on a GKE cluster that has dedicated node pools:

  • 1 node pool for Elasticsearch masters in europe-west1-b
  • 1 node pool spanning europe-west1-b and europe-west1-d for data nodes
  • 1 node pool in europe-west1-b for Kibana and Fleet Server
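
For context, the nodeSelectors and node affinities in the manifests below match on labels that GKE puts on every node: the node pool name (`cloud.google.com/gke-nodepool`) and the zone (`topology.kubernetes.io/zone`). An illustrative excerpt of a node from the data pool, assuming the pool is named `data` as in my manifests:

```yaml
# Illustrative excerpt of a GKE Node object from the "data" pool in europe-west1-d.
# Only the two labels used for scheduling are shown; real nodes carry many more.
apiVersion: v1
kind: Node
metadata:
  labels:
    cloud.google.com/gke-nodepool: data
    topology.kubernetes.io/zone: europe-west1-d
```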

My goal is a cluster with 3 masters plus, in each of the 2 zones, 2 data nodes and 1 cold data node (9 nodes in total).

The "System and Kubernetes integrations" recipe from the ECK GitHub repository works as is. Starting from it, I changed the version of Elasticsearch, Kibana, Fleet Server and Elastic Agents and exposed Kibana through a LoadBalancer service. I also configured requests and limits, which may still need some tuning. This configuration works too. Finally, I added nodeSelectors where required so that each pod is scheduled on the proper node pool. This configuration no longer works, even when the only change to the original recipe is adding the nodeSelectors.
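
As an example of that minimal change, adding a nodeSelector to the recipe's Fleet Server looks roughly like this (a fragment, not a complete resource; the pool name `kibana` is the one from my setup):

```yaml
# Fragment of the recipe's Fleet Server Agent resource (agent.k8s.elastic.co/v1alpha1).
# Only the nodeSelector lines are new; the rest of the recipe is left unchanged.
spec:
  deployment:
    podTemplate:
      spec:
        nodeSelector:
          cloud.google.com/gke-nodepool: kibana
```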

The final manifests look like this:

Elasticsearch:

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch
spec:
  version: 7.15.1
  nodeSets:
  - name: master-zone-b
    count: 3
    config:
      node.roles: [ "master" ]
      node.attr.zone: europe-west1-b
      cluster.routing.allocation.awareness.attributes: k8s_node_name,zone
    podTemplate:
      spec:
        nodeSelector:
          cloud.google.com/gke-nodepool: master
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: topology.kubernetes.io/zone
                  operator: In
                  values:
                  - europe-west1-b
        containers:
        - name: elasticsearch
          env:
          - name: ES_JAVA_OPTS
            value: "-Xms6g -Xmx6g"
          resources:
            requests:
              memory: 12Gi
              cpu: 2
            limits:
              memory: 12Gi
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: [ 'sh', '-c', 'sysctl -w vm.max_map_count=262144' ]
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 50Gi
        storageClassName: premium-rwo
  - name: data-zone-b
    count: 2
    config:
      node.roles: [ "data", "ingest" ]
      node.attr.zone: europe-west1-b
      cluster.routing.allocation.awareness.attributes: k8s_node_name,zone
    podTemplate:
      spec:
        nodeSelector:
          cloud.google.com/gke-nodepool: data
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: topology.kubernetes.io/zone
                  operator: In
                  values:
                  - europe-west1-b
        containers:
        - name: elasticsearch
          env:
          - name: ES_JAVA_OPTS
            value: "-Xms12g -Xmx12g"
          resources:
            requests:
              memory: 24Gi
              cpu: 6
            limits:
              memory: 24Gi
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: [ 'sh', '-c', 'sysctl -w vm.max_map_count=262144' ]
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 500Gi
        storageClassName: premium-rwo
  - name: data-zone-d
    count: 2
    config:
      node.roles: [ "data", "ingest" ]
      node.attr.zone: europe-west1-d
      cluster.routing.allocation.awareness.attributes: k8s_node_name,zone
    podTemplate:
      spec:
        nodeSelector:
          cloud.google.com/gke-nodepool: data
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: topology.kubernetes.io/zone
                  operator: In
                  values:
                  - europe-west1-d
        containers:
        - name: elasticsearch
          env:
          - name: ES_JAVA_OPTS
            value: "-Xms12g -Xmx12g"
          resources:
            requests:
              memory: 24Gi
              cpu: 6
            limits:
              memory: 24Gi
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: [ 'sh', '-c', 'sysctl -w vm.max_map_count=262144' ]
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 500Gi
        storageClassName: premium-rwo
  - name: cold-zone-b
    count: 1
    config:
      node.roles: [ "data_cold" ]
      node.attr.zone: europe-west1-b
      cluster.routing.allocation.awareness.attributes: k8s_node_name,zone
    podTemplate:
      spec:
        nodeSelector:
          cloud.google.com/gke-nodepool: data
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: topology.kubernetes.io/zone
                  operator: In
                  values:
                  - europe-west1-b
        containers:
        - name: elasticsearch
          env:
          - name: ES_JAVA_OPTS
            value: "-Xms12g -Xmx12g"
          resources:
            requests:
              memory: 24Gi
              cpu: 6
            limits:
              memory: 24Gi
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: [ 'sh', '-c', 'sysctl -w vm.max_map_count=262144' ]
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1000Gi
        storageClassName: standard-rwo
  - name: cold-zone-d
    count: 1
    config:
      node.roles: [ "data_cold" ]
      node.attr.zone: europe-west1-d
      cluster.routing.allocation.awareness.attributes: k8s_node_name,zone
    podTemplate:
      spec:
        nodeSelector:
          cloud.google.com/gke-nodepool: data
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: topology.kubernetes.io/zone
                  operator: In
                  values:
                  - europe-west1-d
        containers:
        - name: elasticsearch
          env:
          - name: ES_JAVA_OPTS
            value: "-Xms12g -Xmx12g"
          resources:
            requests:
              memory: 24Gi
              cpu: 6
            limits:
              memory: 24Gi
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: [ 'sh', '-c', 'sysctl -w vm.max_map_count=262144' ]
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1000Gi
        storageClassName: standard-rwo
  updateStrategy:
    changeBudget:
      maxSurge: 1
      maxUnavailable: 1
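
Note that when a pod specifies both a nodeSelector and a required node affinity, Kubernetes only schedules it on nodes that satisfy both. So each nodeSet above can only land on nodes that belong to the given pool and sit in the given zone. As a sketch for clarity (not a change I applied), the data-zone-d constraint is equivalent to this single nodeAffinity term:

```yaml
# Same placement as nodeSelector + nodeAffinity in the data-zone-d nodeSet.
# Expressions within one matchExpressions list must all match (logical AND).
podTemplate:
  spec:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: cloud.google.com/gke-nodepool
              operator: In
              values:
              - data
            - key: topology.kubernetes.io/zone
              operator: In
              values:
              - europe-west1-d
```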

Kibana:

apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana
spec:
  version: 7.15.1
  count: 1
  elasticsearchRef:
    name: elasticsearch
  config:
    xpack.fleet.agents.elasticsearch.host: "https://elasticsearch-es-http.default.svc:9200"
    xpack.fleet.agents.fleet_server.hosts: [ "https://fleet-server-agent-http.default.svc:8220" ]
    xpack.fleet.packages:
    - name: kubernetes
      # pinning this version as the next one introduced a kube-proxy host setting default that breaks this recipe,
      # see https://github.com/elastic/integrations/pull/1565 for more details
      version: 0.14.0
    xpack.fleet.agentPolicies:
    - name: Default Fleet Server on ECK policy
      is_default_fleet_server: true
      package_policies:
      - package:
          name: fleet_server
        name: fleet_server-1
    - name: Default Elastic Agent on ECK policy
      is_default: true
      unenroll_timeout: 900
      package_policies:
      - package:
          name: system
        name: system-1
      - package:
          name: kubernetes
        name: kubernetes-1
  http:
    service:
      spec:
        type: LoadBalancer
  podTemplate:
    spec:
      containers:
      - name: kibana
        resources:
          requests:
            memory: 1Gi
            cpu: 1
          limits:
            memory: 1Gi
      nodeSelector:
        cloud.google.com/gke-nodepool: kibana
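
The URLs used in the Kibana config, and the one in the Fleet Server logs further down, follow ECK's service naming (`<name>-es-http`, `<name>-kb-http`, `<name>-agent-http`). An illustrative summary for the resources in this post, assuming the `default` namespace:

```yaml
# Illustrative mapping (not a Kubernetes resource): ECK resource -> HTTP service URL.
elasticsearch: https://elasticsearch-es-http.default.svc:9200   # Elasticsearch "elasticsearch"
kibana: https://kibana-kb-http.default.svc:5601                 # Kibana "kibana"
fleet-server: https://fleet-server-agent-http.default.svc:8220  # Agent "fleet-server"
```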

ServiceAccounts, ClusterRoles and ClusterRoleBindings:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fleet-server
rules:
- apiGroups: [""]
  resources:
  - pods
  - nodes
  verbs:
  - get
  - watch
  - list
- apiGroups: ["coordination.k8s.io"]
  resources:
  - leases
  verbs:
  - get
  - create
  - update
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fleet-server
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fleet-server
subjects:
- kind: ServiceAccount
  name: fleet-server
  namespace: default
roleRef:
  kind: ClusterRole
  name: fleet-server
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: elastic-agent
rules:
- apiGroups: [""]
  resources:
  - pods
  - nodes
  - namespaces
  - events
  - services
  - configmaps
  verbs:
  - get
  - watch
  - list
- apiGroups: ["coordination.k8s.io"]
  resources:
  - leases
  verbs:
  - get
  - create
  - update
- nonResourceURLs:
  - "/metrics"
  verbs:
  - get
- apiGroups: ["extensions"]
  resources:
  - replicasets
  verbs:
  - "get"
  - "list"
  - "watch"
- apiGroups:
  - "apps"
  resources:
  - statefulsets
  - deployments
  - replicasets
  verbs:
  - "get"
  - "list"
  - "watch"
- apiGroups:
  - ""
  resources:
  - nodes/stats
  verbs:
  - get
- apiGroups:
  - "batch"
  resources:
  - jobs
  verbs:
  - "get"
  - "list"
  - "watch"
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: elastic-agent
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: elastic-agent
subjects:
- kind: ServiceAccount
  name: elastic-agent
  namespace: default
roleRef:
  kind: ClusterRole
  name: elastic-agent
  apiGroup: rbac.authorization.k8s.io

Fleet Server:

apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: fleet-server
spec:
  version: 7.15.1
  kibanaRef:
    name: kibana
  elasticsearchRefs:
  - name: elasticsearch
  mode: fleet
  fleetServerEnabled: true
  deployment:
    replicas: 1
    podTemplate:
      spec:
        serviceAccountName: fleet-server
        automountServiceAccountToken: true
        securityContext:
          runAsUser: 0
        containers:
        - name: agent
          resources:
            requests:
              memory: 0.5Gi
              cpu: 0.75
            limits:
              memory: 0.5Gi
        nodeSelector:
          cloud.google.com/gke-nodepool: kibana

Elastic Agents:

apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: elastic-agent
spec:
  version: 7.15.1
  kibanaRef:
    name: kibana
  fleetServerRef:
    name: fleet-server
  mode: fleet
  daemonSet:
    podTemplate:
      spec:
        serviceAccountName: elastic-agent
        hostNetwork: true
        dnsPolicy: ClusterFirstWithHostNet
        automountServiceAccountToken: true
        securityContext:
          runAsUser: 0
        containers:
        - name: agent
          resources:
            requests:
              memory: 0.5Gi
              cpu: 0.75
            limits:
              memory: 0.5Gi

Here are the logs from the Fleet Server pod:

Performing setup of Fleet in Kibana

Kibana Fleet setup failed: http POST request to https://kibana-kb-http.default.svc:5601/api/fleet/setup fails: fail to execute the HTTP POST request: Post "https://kibana-kb-http.default.svc:5601/api/fleet/setup": context deadline exceeded (Client.Timeout exceeded while awaiting headers). Response: 
Kibana Fleet setup failed: http POST request to https://kibana-kb-http.default.svc:5601/api/fleet/setup fails: fail to execute the HTTP POST request: Post "https://kibana-kb-http.default.svc:5601/api/fleet/setup": context deadline exceeded (Client.Timeout exceeded while awaiting headers). Response:

Am I missing something? Any suggestions on how to make it work?

Thank you so much!