Hi,
currently I am having a lot of trouble keeping the ELK stack alive. First it failed after 11 days. Then I redeployed everything from scratch, since there was no important production data on it. Now it has failed with the same errors after only 2 days.
I am using Elastic Cloud on Kubernetes (ECK) 1.8.0, running on an OpenShift 4.6 cluster with Azure Files storage. All images (Elasticsearch/Kibana/Filebeat/Metricbeat) are on version 7.15.0.
I am completely new to Elastic, so I really appreciate your help. I have tried to extract the relevant parts from the pod logs below.
Elasticsearch Pod Log
{"type": "server", "timestamp": "2021-10-07T07:08:42,729Z", "level": "ERROR", "component": "o.e.i.g.DatabaseRegistry", "cluster.name": "elasticsearch", "node.name": "elasticsearch-es-elastic-0", "message": "failed to download database [GeoLite2-ASN.mmdb]",
"stacktrace": ["org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];",
[...]
"Caused by: org.elasticsearch.action.search.SearchPhaseExecutionException: Search rejected due to missing shards [[.kibana_task_manager_7.15.0_001][0]]. Consider using
allow_partial_search_results
setting to bypass this error.",
[...]
{"type":"retryable_es_client_error","message":"search_phase_execution_exception: ","error":{"name":"ResponseError","meta":{"body":{"error":{"root_cause":,"type":"search_phase_execution_exception","reason":"","phase":"open_search_context","grouped":true,"failed_shards":,"caused_by":{"type":"search_phase_execution_exception","reason":"Search rejected due to missing shards [[.kibana_task_manager_7.15.0_001][0]]. Consider using
allow_partial_search_results
setting to bypass this error.","phase":"open_search_context","grouped":true,"failed_shards":
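If it is relevant: as far as I can tell, the GeoLite2 download error is only a symptom, since the block says the cluster state was not recovered yet. While debugging I could probably silence it by turning off the GeoIP downloader in the Elasticsearch resource. This is only a sketch, I have not applied it:

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch
spec:
  version: 7.15.0
  nodeSets:
    - name: elastic
      count: 3
      config:
        # Sketch only (not applied): disable the automatic GeoLite2 download
        # while the cluster is unstable; this setting exists since 7.14.
        ingest.geoip.downloader.enabled: false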
Kibana Pod Log
{"type":"retryable_es_client_error","message":"search_phase_execution_exception: ","error":{"name":"ResponseError","meta":{"body":{"error":{"root_cause":,"type":"search_phase_execution_exception","reason":"","phase":"open_search_context","grouped":true,"failed_shards":,"caused_by":{"type":"search_phase_execution_exception","reason":"Search rejected due to missing shards [[.kibana_task_manager_7.15.0_001][0]]. Consider using
allow_partial_search_results
setting to bypass this error.","phase":"open_search_context","grouped":true,"failed_shards":
{"type":"log","@timestamp":"2021-10-07T07:46:41+00:00","tags":["fatal","root"],"pid":1215,"message":"Error: Unable to complete saved object migrations for the [.kibana_task_manager] index: Unable to complete the OUTDATED_DOCUMENTS_SEARCH_OPEN_PIT step after 15 attempts, terminating.\n at migrationStateActionMachine
FATAL Error: Unable to complete saved object migrations for the [.kibana_task_manager] index: Unable to complete the OUTDATED_DOCUMENTS_SEARCH_OPEN_PIT step after 15 attempts, terminating.
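From what I read, the 15 attempts correspond to Kibana's migrations.retryAttempts setting (default 15). I could raise it via the Kibana resource, but I assume that would only postpone the failure as long as the .kibana_task_manager shard stays missing. Sketch only, not applied:

apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana
spec:
  version: 7.15.0
  count: 1
  elasticsearchRef:
    name: elasticsearch
  config:
    # Sketch only (not applied): give the saved object migration more retries.
    # The default of 15 matches the "after 15 attempts" in the log above.
    migrations.retryAttempts: 30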
Deployment YAML
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: filebeat
spec:
  type: filebeat
  version: 7.15.0
  elasticsearchRef:
    name: elasticsearch
  kibanaRef:
    name: kibana
  config:
    output.elasticsearch:
      index: "filebeat-%{[agent.version]}-%{+xxxx.ww}"
    setup.ilm.enabled: "false"
    setup.template.name: "filebeat"
    setup.template.pattern: "filebeat-*"
    filebeat.autodiscover.providers:
      - node: ${NODE_NAME}
        type: kubernetes
        hints.default_config.enabled: "false"
        templates:
          - condition.equals.kubernetes.namespace: aro-crs-dev-01
            config:
              - paths: ["/var/log/containers/*${data.kubernetes.container.id}.log"]
                type: container
                processors:
                  - decode_json_fields:
                      fields: "message"
                      process_array: false
                      max_depth: 1
                      target: "logMessage"
                      overwrite_keys: false
                      add_error_key: true
                      expand_keys: true
          [...]
  daemonSet:
    podTemplate:
      spec:
        serviceAccountName: filebeat
        automountServiceAccountToken: true
        terminationGracePeriodSeconds: 30
        dnsPolicy: ClusterFirstWithHostNet
        hostNetwork: true # Allows providing richer host metadata
        containers:
          - name: filebeat
            securityContext:
              runAsUser: 0
              # Required when running on Red Hat OpenShift:
              privileged: true
            volumeMounts:
              - name: varlogcontainers
                mountPath: /var/log/containers
              - name: varlogpods
                mountPath: /var/log/pods
              - name: varlibdockercontainers
                mountPath: /var/lib/docker/containers
            env:
              - name: NODE_NAME
                valueFrom:
                  fieldRef:
                    fieldPath: spec.nodeName
        volumes:
          - name: varlogcontainers
            hostPath:
              path: /var/log/containers
          - name: varlogpods
            hostPath:
              path: /var/log/pods
          - name: varlibdockercontainers
            hostPath:
              path: /var/lib/docker/containers
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: filebeat
rules:
  - apiGroups: [""] # "" indicates the core API group
    resources:
      - namespaces
      - pods
      - nodes
    verbs:
      - get
      - watch
      - list
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: filebeat
  namespace: elastic
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: filebeat
subjects:
  - kind: ServiceAccount
    name: filebeat
    namespace: elastic
roleRef:
  kind: ClusterRole
  name: filebeat
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: heartbeat
spec:
  type: heartbeat
  version: 7.15.0
  elasticsearchRef:
    name: elasticsearch
  config:
    heartbeat.monitors:
      - type: tcp
        schedule: '@every 5s'
        hosts: ["elasticsearch-es-http.elastic.svc:9200"]
      - type: tcp
        schedule: '@every 5s'
        hosts: ["kibana-kb-http.elastic.svc:5601"]
  deployment:
    replicas: 1
    podTemplate:
      spec:
        serviceAccountName: heartbeat
        securityContext:
          runAsUser: 0
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: heartbeat
  labels:
    k8s-app: heartbeat
rules:
  - apiGroups: [""]
    resources:
      - nodes
      - namespaces
      - pods
      - services
    verbs: ["get", "list", "watch"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: heartbeat
  namespace: elastic
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: heartbeat
subjects:
  - kind: ServiceAccount
    name: heartbeat
    namespace: elastic
roleRef:
  kind: ClusterRole
  name: heartbeat
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: metricbeat
spec:
  type: metricbeat
  version: 7.15.0
  elasticsearchRef:
    name: elasticsearch
  kibanaRef:
    name: kibana
  config:
    metricbeat:
      autodiscover:
        providers:
          - hints:
              default_config: {}
              enabled: "true"
            host: ${NODE_NAME}
            type: kubernetes
      modules:
        - module: prometheus
          period: 10s
          metricsets: ["collector"]
          hosts: ["https://aro-crscs-dev-01.ngdalabor.de"]
          metrics_path: /q/metrics
        - module: prometheus
          period: 10s
          metricsets: ["collector"]
          hosts: ["https://aro-crsias-dev-01.ngdalabor.de"]
          metrics_path: /q/metrics
        - module: prometheus
          period: 10s
          metricsets: ["collector"]
          hosts: ["https://aro-crssps-dev-01.ngdalabor.de"]
          metrics_path: /q/metrics
        - module: prometheus
          period: 10s
          metricsets: ["collector"]
          hosts: ["https://aro-crscs-int-01.ngdalabor.de"]
          metrics_path: /q/metrics
        - module: prometheus
          period: 10s
          metricsets: ["collector"]
          hosts: ["https://aro-crsias-int-01.ngdalabor.de"]
          metrics_path: /q/metrics
        - module: prometheus
          period: 10s
          metricsets: ["collector"]
          hosts: ["https://aro-crssps-int-01.ngdalabor.de"]
          metrics_path: /q/metrics
        - module: system
          period: 30s
          metricsets:
            - cpu
            - load
            - memory
            - network
            - process
            - process_summary
          process:
            include_top_n:
              by_cpu: 5
              by_memory: 5
          processes:
            - .*
        - module: system
          period: 1m
          metricsets:
            - filesystem
            - fsstat
          processors:
            - drop_event:
                when:
                  regexp:
                    system:
                      filesystem:
                        mount_point: ^/(sys|cgroup|proc|dev|etc|host|lib)($|/)
        - module: kubernetes
          period: 10s
          host: ${NODE_NAME}
          hosts:
            - https://${NODE_NAME}:10250
          bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
          ssl:
            verification_mode: none
          metricsets:
            - node
            - system
            - pod
            - container
            - volume
    processors:
      - add_cloud_metadata: {}
      - add_host_metadata: {}
  daemonSet:
    podTemplate:
      spec:
        serviceAccountName: metricbeat
        automountServiceAccountToken: true # some older Beat versions depend on this setting being present in a Kubernetes context
        containers:
          - args:
              - -e
              - -c
              - /etc/beat.yml
              - -system.hostfs=/hostfs
            name: metricbeat
            volumeMounts:
              - mountPath: /hostfs/sys/fs/cgroup
                name: cgroup
              - mountPath: /var/run/docker.sock
                name: dockersock
              - mountPath: /hostfs/proc
                name: proc
            env:
              - name: NODE_NAME
                valueFrom:
                  fieldRef:
                    fieldPath: spec.nodeName
        dnsPolicy: ClusterFirstWithHostNet
        hostNetwork: true # Allows providing richer host metadata
        securityContext:
          runAsUser: 0
        terminationGracePeriodSeconds: 30
        volumes:
          - hostPath:
              path: /sys/fs/cgroup
            name: cgroup
          - hostPath:
              path: /var/run/docker.sock
            name: dockersock
          - hostPath:
              path: /proc
            name: proc
---
# permissions needed for metricbeat
# source: https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-module-kubernetes.html
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: metricbeat
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
      - namespaces
      - events
      - pods
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - "extensions"
    resources:
      - replicasets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - apps
    resources:
      - statefulsets
      - deployments
      - replicasets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - nodes/stats
    verbs:
      - get
  - nonResourceURLs:
      - /metrics
    verbs:
      - get
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metricbeat
  namespace: elastic
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: metricbeat
subjects:
  - kind: ServiceAccount
    name: metricbeat
    namespace: elastic
roleRef:
  kind: ClusterRole
  name: metricbeat
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch
spec:
  version: 7.15.0
  nodeSets:
    - name: elastic
      count: 3
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 200Gi
            storageClassName: azurefile-premiumstorageclass-prod-01
---
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana
spec:
  version: 7.15.0
  count: 1
  elasticsearchRef:
    name: elasticsearch
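For context, the azurefile-premiumstorageclass-prod-01 storage class used for the Elasticsearch data volumes is backed by Azure Files (premium tier). I do not have the exact manifest at hand; a typical definition of such a class looks roughly like this (assumption, the provisioner and parameters on our cluster may differ):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile-premiumstorageclass-prod-01
# Assumption: in-tree Azure Files provisioner; a CSI-based class would use
# file.csi.azure.com instead.
provisioner: kubernetes.io/azure-file
parameters:
  skuName: Premium_LRS
reclaimPolicy: Delete
allowVolumeExpansion: true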