Hi, I'm trying to set up a cluster using the ECK operator 1.8.0 on a GKE cluster with dedicated node pools:
- 1 node pool for Elasticsearch masters in europe-west1-b
- 1 node pool for data nodes, with nodes in europe-west1-b and europe-west1-d
- 1 node pool in europe-west1-b for Kibana and Fleet Server
My goal is a cluster with 3 masters in one zone, plus 2 data nodes and 1 cold data node in each of the 2 zones (9 nodes in total).
The "System and Kubernetes integrations" recipe from the GitHub repository works. Starting from it, I change the version of Elasticsearch, Kibana, Fleet Server and the Elastic Agents, expose Kibana with a LoadBalancer service, and configure requests and limits (these may still need some tuning). This configuration also works. Finally, I add nodeSelectors where required so that each pod is scheduled on an appropriate node. This configuration no longer works, even if all I change in the recipe is adding the nodeSelector.
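For reference, this is roughly how I check the scheduling state after adding the nodeSelectors (the pod name below is the one ECK generates for the first master nodeSet, assuming default naming):

```shell
# List nodes with their GKE node pool and zone labels, to confirm the
# values used in the nodeSelectors and nodeAffinity actually exist
kubectl get nodes -L cloud.google.com/gke-nodepool -L topology.kubernetes.io/zone

# Check whether any pods are stuck in Pending
kubectl get pods -o wide

# For a Pending pod, the Events section usually explains the scheduling
# failure (e.g. no node matched the nodeSelector or affinity rules)
kubectl describe pod elasticsearch-es-master-zone-b-0
```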
The final manifests should look like the following:
Elasticsearch:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch
spec:
  version: 7.15.1
  nodeSets:
  - name: master-zone-b
    count: 3
    config:
      node.roles: [ "master" ]
      node.attr.zone: europe-west1-b
      cluster.routing.allocation.awareness.attributes: k8s_node_name,zone
    podTemplate:
      spec:
        nodeSelector:
          cloud.google.com/gke-nodepool: master
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: topology.kubernetes.io/zone
                  operator: In
                  values:
                  - europe-west1-b
        containers:
        - name: elasticsearch
          env:
          - name: ES_JAVA_OPTS
            value: "-Xms6g -Xmx6g"
          resources:
            requests:
              memory: 12Gi
              cpu: 2
            limits:
              memory: 12Gi
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: [ 'sh', '-c', 'sysctl -w vm.max_map_count=262144' ]
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 50Gi
        storageClassName: premium-rwo
  - name: data-zone-b
    count: 2
    config:
      node.roles: [ "data", "ingest" ]
      node.attr.zone: europe-west1-b
      cluster.routing.allocation.awareness.attributes: k8s_node_name,zone
    podTemplate:
      spec:
        nodeSelector:
          cloud.google.com/gke-nodepool: data
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: topology.kubernetes.io/zone
                  operator: In
                  values:
                  - europe-west1-b
        containers:
        - name: elasticsearch
          env:
          - name: ES_JAVA_OPTS
            value: "-Xms12g -Xmx12g"
          resources:
            requests:
              memory: 24Gi
              cpu: 6
            limits:
              memory: 24Gi
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: [ 'sh', '-c', 'sysctl -w vm.max_map_count=262144' ]
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 500Gi
        storageClassName: premium-rwo
  - name: data-zone-d
    count: 2
    config:
      node.roles: [ "data", "ingest" ]
      node.attr.zone: europe-west1-d
      cluster.routing.allocation.awareness.attributes: k8s_node_name,zone
    podTemplate:
      spec:
        nodeSelector:
          cloud.google.com/gke-nodepool: data
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: topology.kubernetes.io/zone
                  operator: In
                  values:
                  - europe-west1-d
        containers:
        - name: elasticsearch
          env:
          - name: ES_JAVA_OPTS
            value: "-Xms12g -Xmx12g"
          resources:
            requests:
              memory: 24Gi
              cpu: 6
            limits:
              memory: 24Gi
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: [ 'sh', '-c', 'sysctl -w vm.max_map_count=262144' ]
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 500Gi
        storageClassName: premium-rwo
  - name: cold-zone-b
    count: 1
    config:
      node.roles: [ "data_cold" ]
      node.attr.zone: europe-west1-b
      cluster.routing.allocation.awareness.attributes: k8s_node_name,zone
    podTemplate:
      spec:
        nodeSelector:
          cloud.google.com/gke-nodepool: data
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: topology.kubernetes.io/zone
                  operator: In
                  values:
                  - europe-west1-b
        containers:
        - name: elasticsearch
          env:
          - name: ES_JAVA_OPTS
            value: "-Xms12g -Xmx12g"
          resources:
            requests:
              memory: 24Gi
              cpu: 6
            limits:
              memory: 24Gi
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: [ 'sh', '-c', 'sysctl -w vm.max_map_count=262144' ]
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1000Gi
        storageClassName: standard-rwo
  - name: cold-zone-d
    count: 1
    config:
      node.roles: [ "data_cold" ]
      node.attr.zone: europe-west1-d
      cluster.routing.allocation.awareness.attributes: k8s_node_name,zone
    podTemplate:
      spec:
        nodeSelector:
          cloud.google.com/gke-nodepool: data
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: topology.kubernetes.io/zone
                  operator: In
                  values:
                  - europe-west1-d
        containers:
        - name: elasticsearch
          env:
          - name: ES_JAVA_OPTS
            value: "-Xms12g -Xmx12g"
          resources:
            requests:
              memory: 24Gi
              cpu: 6
            limits:
              memory: 24Gi
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: [ 'sh', '-c', 'sysctl -w vm.max_map_count=262144' ]
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1000Gi
        storageClassName: standard-rwo
  updateStrategy:
    changeBudget:
      maxSurge: 1
      maxUnavailable: 1
Kibana:
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana
spec:
  version: 7.15.1
  count: 1
  elasticsearchRef:
    name: elasticsearch
  config:
    xpack.fleet.agents.elasticsearch.host: "https://elasticsearch-es-http.default.svc:9200"
    xpack.fleet.agents.fleet_server.hosts: [ "https://fleet-server-agent-http.default.svc:8220" ]
    xpack.fleet.packages:
    - name: kubernetes
      # pinning this version as the next one introduced a kube-proxy host setting default that breaks this recipe,
      # see https://github.com/elastic/integrations/pull/1565 for more details
      version: 0.14.0
    xpack.fleet.agentPolicies:
    - name: Default Fleet Server on ECK policy
      is_default_fleet_server: true
      package_policies:
      - package:
          name: fleet_server
        name: fleet_server-1
    - name: Default Elastic Agent on ECK policy
      is_default: true
      unenroll_timeout: 900
      package_policies:
      - package:
          name: system
        name: system-1
      - package:
          name: kubernetes
        name: kubernetes-1
  http:
    service:
      spec:
        type: LoadBalancer
  podTemplate:
    spec:
      containers:
      - name: kibana
        resources:
          requests:
            memory: 1Gi
            cpu: 1
          limits:
            memory: 1Gi
      nodeSelector:
        cloud.google.com/gke-nodepool: kibana
ServiceAccounts, ClusterRoles and ClusterRoleBindings:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fleet-server
rules:
- apiGroups: [""]
  resources:
  - pods
  - nodes
  verbs:
  - get
  - watch
  - list
- apiGroups: ["coordination.k8s.io"]
  resources:
  - leases
  verbs:
  - get
  - create
  - update
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fleet-server
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fleet-server
subjects:
- kind: ServiceAccount
  name: fleet-server
  namespace: default
roleRef:
  kind: ClusterRole
  name: fleet-server
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: elastic-agent
rules:
- apiGroups: [""]
  resources:
  - pods
  - nodes
  - namespaces
  - events
  - services
  - configmaps
  verbs:
  - get
  - watch
  - list
- apiGroups: ["coordination.k8s.io"]
  resources:
  - leases
  verbs:
  - get
  - create
  - update
- nonResourceURLs:
  - "/metrics"
  verbs:
  - get
- apiGroups: ["extensions"]
  resources:
  - replicasets
  verbs:
  - "get"
  - "list"
  - "watch"
- apiGroups:
  - "apps"
  resources:
  - statefulsets
  - deployments
  - replicasets
  verbs:
  - "get"
  - "list"
  - "watch"
- apiGroups:
  - ""
  resources:
  - nodes/stats
  verbs:
  - get
- apiGroups:
  - "batch"
  resources:
  - jobs
  verbs:
  - "get"
  - "list"
  - "watch"
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: elastic-agent
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: elastic-agent
subjects:
- kind: ServiceAccount
  name: elastic-agent
  namespace: default
roleRef:
  kind: ClusterRole
  name: elastic-agent
  apiGroup: rbac.authorization.k8s.io
Fleet Server:
apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: fleet-server
spec:
  version: 7.15.1
  kibanaRef:
    name: kibana
  elasticsearchRefs:
  - name: elasticsearch
  mode: fleet
  fleetServerEnabled: true
  deployment:
    replicas: 1
    podTemplate:
      spec:
        serviceAccountName: fleet-server
        automountServiceAccountToken: true
        securityContext:
          runAsUser: 0
        containers:
        - name: agent
          resources:
            requests:
              memory: 0.5Gi
              cpu: 0.75
            limits:
              memory: 0.5Gi
        nodeSelector:
          cloud.google.com/gke-nodepool: kibana
Elastic Agents:
apiVersion: agent.k8s.elastic.co/v1alpha1
kind: Agent
metadata:
  name: elastic-agent
spec:
  version: 7.15.1
  kibanaRef:
    name: kibana
  fleetServerRef:
    name: fleet-server
  mode: fleet
  daemonSet:
    podTemplate:
      spec:
        serviceAccountName: elastic-agent
        hostNetwork: true
        dnsPolicy: ClusterFirstWithHostNet
        automountServiceAccountToken: true
        securityContext:
          runAsUser: 0
        containers:
        - name: agent
          resources:
            requests:
              memory: 0.5Gi
              cpu: 0.75
            limits:
              memory: 0.5Gi
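Since both Kibana and the Fleet Server are pinned to the kibana node pool, I also check that Kibana itself came up healthy there (the label selector and service name below are the ones ECK sets by default, assuming I read the docs right):

```shell
# Verify the Kibana pod is Running/Ready and landed on the kibana pool
kubectl get pods -l kibana.k8s.elastic.co/name=kibana -o wide

# Verify the Kibana HTTP service actually has endpoints behind it
kubectl get endpoints kibana-kb-http
```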
Here are the logs from the Fleet Server pod:
Performing setup of Fleet in Kibana
Kibana Fleet setup failed: http POST request to https://kibana-kb-http.default.svc:5601/api/fleet/setup fails: fail to execute the HTTP POST request: Post "https://kibana-kb-http.default.svc:5601/api/fleet/setup": context deadline exceeded (Client.Timeout exceeded while awaiting headers). Response:
Kibana Fleet setup failed: http POST request to https://kibana-kb-http.default.svc:5601/api/fleet/setup fails: fail to execute the HTTP POST request: Post "https://kibana-kb-http.default.svc:5601/api/fleet/setup": context deadline exceeded (Client.Timeout exceeded while awaiting headers). Response:
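To rule out a plain network/DNS problem between the Fleet Server pod and Kibana, I can hit the same endpoint from a throwaway pod (the debug pod name and image are arbitrary; -k skips TLS verification since the certificate is issued by the ECK-managed CA):

```shell
# Run a temporary curl pod and call the URL the Fleet Server is timing out on;
# any HTTP status code back means the service is at least reachable
kubectl run curl-debug --rm -it --restart=Never --image=curlimages/curl -- \
  curl -k -s -o /dev/null -w '%{http_code}\n' \
  https://kibana-kb-http.default.svc:5601/api/fleet/setup
```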
Am I missing something? Any suggestion on how to make it work?
Thank you so much!