Hi Team,
We are currently running Elasticsearch 8.6 and various Beats (Filebeat, Metricbeat), also at version 8.6, on Oracle Container Engine for Kubernetes (OKE) nodes. As part of the implementation, we are seeing the error messages below printed in the SELinux message logs. Due to the extensive logging, we are noticing disk pressure on various nodes and a number of pod evictions in both our Non-Prod and Prod environments.
setroubleshoot[48371]: SELinux is preventing /usr/share/metricbeat/metricbeat from read access on the lnk_file exe. For complete SELinux messages run: sealert -l dc3813e7-1901-4f6d-8031-021fe0cd6b8c
setroubleshoot[48371]: SELinux is preventing /usr/share/metricbeat/metricbeat from read access on the lnk_file exe.

***** Plugin catchall (100. confidence) suggests **************************

If you believe that metricbeat should be allowed read access on the exe lnk_file by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do
allow this access for now by executing:
# ausearch -c 'metricbeat' --raw | audit2allow -M my-metricbeat
# semodule -X 300 -i my-metricbeat.pp
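For completeness, the raw AVC events behind these alerts can be pulled directly on an affected node (ausearch ships with the standard audit package):

# Show the recent denials recorded for metricbeat:
ausearch -m AVC -ts recent -c 'metricbeat'

Our current guess is that the denials come from the system process metricsets following the /hostfs/proc/<pid>/exe symlinks of host processes, but we have not confirmed this.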
kubelet[45585]: I0607 08:07:39.640422 45585 kubelet_getters.go:300] "Path does not exist" path="/var/lib/kubelet/pods/7684b8a1-adeb-4a84-9e2f-7d5b64f30856/volumes"
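The sheer volume of these messages is what drives the disk pressure. For reference, the log growth can be checked on a node with something like the following (default journald/rsyslog locations assumed):

# Space used by the journal and the syslog files:
journalctl --disk-usage
du -sh /var/log/messages*
# How many of the current messages come from setroubleshoot:
grep -c setroubleshoot /var/log/messages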
Below is the Metricbeat YAML that has been applied. Since it is deployed as a DaemonSet, we are seeing the issue on all OKE nodes.
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: metricbeat-daemonset-config
  namespace: elasticsearch
  labels:
    k8s-app: metricbeat
data:
  metricbeat.yml: |-
    metricbeat.config.modules:
      # Mounted `metricbeat-daemonset-modules` configmap:
      path: ${path.config}/modules.d/*.yml
      # Reload module configs as they change:
      reload.enabled: false

    metricbeat.autodiscover:
      providers:
        - type: kubernetes
          scope: cluster
          node: ${NODE_NAME}
          unique: true
          templates:
            - config:
                - module: kubernetes
                  hosts: ["kube-state-metrics:8080"]
                  period: 10s
                  add_metadata: true
                  metricsets:
                    #- state_node
                    #- state_deployment
                    #- state_daemonset
                    #- state_replicaset
                    #- state_pod
                    #- state_container
                    #- state_cronjob
                    #- state_resourcequota
                    #- state_statefulset
                    #- state_service
                - module: kubernetes
                  metricsets:
                    #- apiserver
                  hosts: ["https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}"]
                  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
                  ssl.certificate_authorities:
                    - /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
                  period: 30s
                # Uncomment this to get k8s events:
                - module: kubernetes
                  metricsets:
                    - event
        # To enable hints based autodiscover uncomment this:
        - type: kubernetes
          node: ${NODE_NAME}
          hints.enabled: true

    metricbeat.modules:
      - module: docker
        metricsets:
          - "container"
          - "cpu"
          - "diskio"
          - "event"
          - "healthcheck"
          - "info"
          - "image"
          - "memory"
          - "network"
        hosts: ["unix:///var/run/docker.sock"]
        period: 10s
        enabled: false

    processors:
      - add_fields:
          target: fields
          fields:
            env: ${ENVIRONMENT_NAME}
      - add_cloud_metadata:
      - add_host_metadata:
          netinfo.enabled: true

    cloud.id: ${ELASTIC_CLOUD_ID}
    cloud.auth: ${ELASTIC_CLOUD_AUTH}

    output.elasticsearch:
      hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']
      username: ${ELASTICSEARCH_USERNAME}
      password: ${ELASTICSEARCH_PASSWORD}
      ssl.certificate_authorities:
        - /home/admin/certificate/ca.crt
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: metricbeat-daemonset-modules
  namespace: elasticsearch
  labels:
    k8s-app: metricbeat
data:
  activemq.yml: |-
    - module: activemq
      metricsets:
        - "broker"
        - "queue"
        - "topic"
      period: 10s
      hosts: ['amq-service.default.svc.cluster.local:8161','amqin-service.default.svc.cluster.local:8161','amqout-service.default.svc.cluster.local:8161','amqcrw-service.default.svc.cluster.local:8161','amqcrwin-service.default.svc.cluster.local:8161','amqcrwout-service.default.svc.cluster.local:8161']
      path: '/api/jolokia/?ignoreErrors=true&canonicalNaming=false'
      username: ${ACTIVEMQ_USERNAME}
      password: ${ACTIVEMQ_PASSWORD}
      headers:
        Origin: localhost
  system.yml: |-
    - module: system
      period: 10s
      metricsets:
        - cpu
        - load
        - memory
        - network
        - process
        - process_summary
        - uptime
        #- core
        #- diskio
        #- socket
      processes: ['.*']
      process.include_top_n:
        by_cpu: 5      # include top 5 processes by CPU
        by_memory: 5   # include top 5 processes by memory
    - module: system
      period: 1m
      metricsets:
        - filesystem
        - fsstat
      processors:
        - drop_event.when.regexp:
            system.filesystem.mount_point: '^/(sys|cgroup|proc|dev|etc|host|lib|snap)($|/)'
  docker.yml: |-
    - module: docker
      metricsets:
        - "container"
        - "cpu"
        - "diskio"
        - "event"
        - "healthcheck"
        - "info"
        - "image"
        - "memory"
        - "network"
      hosts: ["unix:///var/run/docker.sock"]
      period: 10s
      enabled: false
  kubernetes.yml: |-
    - module: kubernetes
      metricsets:
        - node
        - system
        - pod
        - container
        - volume
        #- state_node
        #- state_daemonset
        #- state_deployment
        #- state_replicaset
        #- state_statefulset
        #- state_pod
        #- state_container
        #- state_cronjob
        #- state_resourcequota
        #- state_service
        #- state_persistentvolume
        #- state_persistentvolumeclaim
        #- state_storageclass
      period: 10s
      host: ${NODE_NAME}
      hosts: ["https://${NODE_NAME}:10250"]
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      ssl.verification_mode: "none"
      # If there is a CA bundle that contains the issuer of the certificate used in the Kubelet API,
      # remove ssl.verification_mode entry and use the CA, for instance:
      #ssl.certificate_authorities:
      #  - /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt
    # Currently `proxy` metricset is not supported on Openshift, comment out section
    - module: kubernetes
      metricsets:
        #- proxy
      period: 10s
      host: ${NODE_NAME}
      hosts: ["localhost:10249"]
---
# Deploy a Metricbeat instance per node for node metrics retrieval
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: metricbeat
  namespace: elasticsearch
  labels:
    k8s-app: metricbeat
spec:
  selector:
    matchLabels:
      k8s-app: metricbeat
  template:
    metadata:
      labels:
        k8s-app: metricbeat
    spec:
      serviceAccountName: metricbeat
      terminationGracePeriodSeconds: 30
      hostNetwork: false
      dnsPolicy: ClusterFirstWithHostNet
      containers:
        - name: metricbeat
          image: docker.elastic.co/beats/metricbeat:8.6.0
          args: [
            "-c", "/etc/metricbeat.yml",
            "-e",
            "-system.hostfs=/hostfs",
          ]
          env:
            - name: ENVIRONMENT_NAME
              value: "PRODUCTION"
            - name: ELASTICSEARCH_HOST
              value: https://prodes.mycompany.com
            - name: ELASTICSEARCH_PORT
              value: "9200"
            - name: ELASTICSEARCH_USERNAME
              valueFrom:
                secretKeyRef:
                  name: es-creds
                  key: username
            - name: ELASTICSEARCH_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: es-creds
                  key: password
            - name: ACTIVEMQ_USERNAME
              valueFrom:
                secretKeyRef:
                  name: activemq-creds
                  key: username
            - name: ACTIVEMQ_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: activemq-creds
                  key: password
            - name: ELASTIC_CLOUD_ID
              value:
            - name: ELASTIC_CLOUD_AUTH
              value:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          securityContext:
            allowPrivilegeEscalation: false
            privileged: false
            runAsGroup: 1000
            runAsNonRoot: true
            runAsUser: 1000
            # If using Red Hat OpenShift uncomment this:
            #privileged: true
          resources:
            limits:
              memory: 200Mi
            requests:
              cpu: 100m
              memory: 100Mi
          volumeMounts:
            - name: config
              mountPath: /etc/metricbeat.yml
              readOnly: true
              subPath: metricbeat.yml
            - name: certificate
              mountPath: /home/admin/certificate/ca.crt
              readOnly: true
              subPath: ca.crt
            - name: data
              mountPath: /usr/share/metricbeat/data
            - name: modules
              mountPath: /usr/share/metricbeat/modules.d
              readOnly: true
            - name: proc
              mountPath: /hostfs/proc
              readOnly: true
            - name: cgroup
              mountPath: /hostfs/sys/fs/cgroup
              readOnly: true
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: cgroup
          hostPath:
            path: /sys/fs/cgroup
        - name: certificate
          secret:
            secretName: monitoring-es-http-certs-public
        - name: config
          configMap:
            defaultMode: 0640
            name: metricbeat-daemonset-config
        - name: modules
          configMap:
            defaultMode: 0640
            name: metricbeat-daemonset-modules
        - name: data
          emptyDir: {}
          # hostPath:
          #   # When metricbeat runs as non-root user, this directory needs to be writable by group (g+w)
          #   path: /var/lib/metricbeat-data
          #   type: DirectoryOrCreate
      securityContext:
        fsGroup: 1000
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: metricbeat
subjects:
  - kind: ServiceAccount
    name: metricbeat
    namespace: elasticsearch
roleRef:
  kind: ClusterRole
  name: metricbeat
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: metricbeat
  namespace: elasticsearch
subjects:
  - kind: ServiceAccount
    name: metricbeat
    namespace: elasticsearch
roleRef:
  kind: Role
  name: metricbeat
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: metricbeat
  labels:
    k8s-app: metricbeat
rules:
  - apiGroups: [""]
    resources:
      - nodes
      - namespaces
      - events
      - pods
      - services
    verbs: ["get", "list", "watch"]
  # Enable this rule only if planning to use the Kubernetes keystore
  #- apiGroups: [""]
  #  resources:
  #    - secrets
  #  verbs: ["get"]
  - apiGroups: ["extensions"]
    resources:
      - replicasets
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources:
      - statefulsets
      - deployments
      - replicasets
    verbs: ["get", "list", "watch"]
  - apiGroups:
      - ""
    resources:
      - nodes/stats
    verbs:
      - get
  - nonResourceURLs:
      - "/metrics"
    verbs:
      - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: metricbeat
  namespace: elasticsearch
  labels:
    k8s-app: metricbeat
rules:
  - apiGroups:
      - coordination.k8s.io
    resources:
      - leases
    verbs: ["get", "create", "update"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: metricbeat
  namespace: elasticsearch
  labels:
    k8s-app: metricbeat
---
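In case it helps with the diagnosis, this is how the SELinux domain of the Metricbeat processes can be confirmed on a node. Since the pod is unprivileged we would expect container_t under the default container-selinux policy, but we have not verified this:

# List processes with their SELinux context and filter for metricbeat:
ps -eZ | grep metricbeat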
We are trying to understand what might be causing SELinux to generate all of these messages. Additionally, if there is a way to suppress them, it would help relieve the disk pressure created by the extra logging.
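If suppressing the denials is the right approach, would a dontaudit module along these lines be the recommended way? This is only a sketch; we have not applied it, and the module name is our own:

# Generate dontaudit rules (instead of allow rules) from the recorded denials:
ausearch -c 'metricbeat' --raw | audit2allow -D -M metricbeat-dontaudit
semodule -i metricbeat-dontaudit.pp

Alternatively, is it safer to stop setroubleshoot from feeding syslog altogether, e.g. by setting active = no in the sedispatch plugin config (/etc/audit/plugins.d/sedispatch.conf on EL8) and restarting auditd?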
Any help would be really appreciated. Many thanks in advance.