Hi all,
We are having quite a strange issue: running Filebeat in any version higher than 8.0.0 causes the Filebeat agent's memory consumption to keep increasing until the pod is OOM-killed. As mentioned, versions 8.0.0 and below work fine.
The configuration of the Beat is as follows:
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: mynamespace-filebeat
  namespace: mynamespace
spec:
  configRef:
    secretName: mynamespace-filebeat-config
  daemonSet:
    podTemplate:
      metadata:
        creationTimestamp: null
      spec:
        automountServiceAccountToken: true
        containers:
        - env:
          - name: NODE_NAME
            valueFrom:
              fieldRef:
                fieldPath: spec.nodeName
          name: filebeat
          resources:
            limits:
              cpu: 1000m
              memory: 2000Mi
            requests:
              cpu: 100m
              memory: 100Mi
          volumeMounts:
          - mountPath: /var/log/containers
            name: varlogcontainers
          - mountPath: /var/log/pods
            name: varlogpods
          - mountPath: /var/lib/docker/containers
            name: varlibdockercontainers
        dnsPolicy: ClusterFirstWithHostNet
        hostNetwork: true
        securityContext:
          runAsUser: 0
        serviceAccount: mynamespace-elastic-beat-filebeat
        volumes:
        - hostPath:
            path: /var/log/containers
          name: varlogcontainers
        - hostPath:
            path: /var/log/pods
          name: varlogpods
        - hostPath:
            path: /var/lib/docker/containers
          name: varlibdockercontainers
    updateStrategy: {}
  elasticsearchRef:
    name: mynamespace-elastic
  kibanaRef:
    name: mynamespace-kibana
  monitoring:
    logs: {}
    metrics: {}
  type: filebeat
  # any version higher than 8.0.0 starts fine, but memory consumption keeps increasing until the pod is killed
  version: 8.0.0
Data handling configuration: we are filtering based on namespace/container name, and analysing, extracting and enriching fields.
apiVersion: v1
kind: Secret
metadata:
  name: mynamespace-filebeat-config
  namespace: mynamespace
stringData:
  beat.yml: |
    filebeat.autodiscover:
      providers:
        - type: kubernetes
          templates:
            - condition:
                and:
                  - contains.kubernetes.container.name: "containerName"
                  - or:
                      - contains.kubernetes.namespace: "namespace1"
                      - contains.kubernetes.namespace: "namespace2"
                      - contains.kubernetes.namespace: "namespace3"
                      - contains.kubernetes.namespace: "namespace4"
              config:
                - type: container
                  paths:
                    - /var/log/containers/*${data.kubernetes.container.id}*.log
                  processors:
                    - add_kubernetes_metadata:
                        host: ${NODE_NAME}
                        matchers:
                          - logs_path:
                              logs_path: "/var/log/containers/"
                    - drop_event.when:
                        not:
                          contains:
                            message: "Bearer"
                    - dissect:
                        when:
                          and:
                            - contains:
                                kubernetes.container.name: "containerName"
                            - contains:
                                message: "Bearer"
                        tokenizer: '%{potential_space}request:"%{request}" response_code:%{response_code} authorization:"Bearer %{encoded_jwt_header}.%{encoded_jwt_payload}.%{encoded_jwt_signature}" authority:"%{authority}"'
                        field: "message"
                        target_prefix: ""
                    - copy_fields:
                        when:
                          and:
                            - contains:
                                kubernetes.container.name: "containerName"
                            - has_fields: ['request']
                        fields:
                          - from: request
                            to: endpoint
                        fail_on_error: true
                        ignore_missing: false
                    - script:
                        when:
                          and:
                            - contains:
                                kubernetes.container.name: "containerName"
                            - has_fields: ['endpoint']
                        lang: javascript
                        id: strip_endpoint_value
                        source: >
                          function process(event) {
                            // Extract endpoint without parameters
                            event.Put('endpoint', event.Get('endpoint').replace(/^\S* ([^?]*).* .*/,'$1'))
                          }
                    - script:
                        when:
                          and:
                            - contains:
                                kubernetes.container.name: "containerName"
                            - has_fields: ['encoded_jwt_payload']
                        lang: javascript
                        id: prepare_base64_decoding
                        source: >
                          function process(event) {
                            // Pad the payload to a multiple of 4 characters so base64 decoding succeeds
                            event.Put('encoded_jwt_payload', event.Get('encoded_jwt_payload') + Array((4 - event.Get('encoded_jwt_payload').length % 4) % 4 + 1).join('='))
                          }
                    - decode_base64_field:
                        when:
                          and:
                            - contains:
                                kubernetes.container.name: "containerName"
                            - has_fields: ['encoded_jwt_payload']
                        field:
                          from: "encoded_jwt_payload"
                          to: "decoded_jwt_payload"
                        ignore_missing: false
                        fail_on_error: true
                    - decode_json_fields:
                        when:
                          and:
                            - contains:
                                kubernetes.container.name: "containerName"
                            - has_fields: ['decoded_jwt_payload']
                        fields: ["decoded_jwt_payload"]
                        process_array: false
                        max_depth: 1
                        target: ""
                        overwrite_keys: false
                        add_error_key: true
                    - include_fields:
                        when:
                          and:
                            - contains:
                                kubernetes.container.name: "containerName"
                            - has_fields: ['decoded_jwt_payload']
                        fields: ["field1", "field2", "field3", "field4", "field5", "field6", "field7"]
            - condition:
                and:
                  - not.contains.kubernetes.container.name: "containerName"
                  - or:
                      - contains.kubernetes.namespace: "namespace1"
                      - contains.kubernetes.namespace: "namespace2"
                      - contains.kubernetes.namespace: "namespace3"
                      - contains.kubernetes.namespace: "namespace4"
                      - contains.kubernetes.namespace: "otherNamespace"
              config:
                - type: container
                  paths:
                    - /var/log/containers/*${data.kubernetes.container.id}*.log
                  add_kubernetes_metadata:
                    host: ${NODE_NAME}
                    matchers:
                      - logs_path:
                          logs_path: "/var/log/containers/"
                  processors:
                    - dissect:
                        tokenizer: '%{datetime} [%{thread}] %{loglevel->} %{logger} %{msg}'
                        field: "message"
                        target_prefix: ""
                  multiline:
                    pattern: '^([0-9]{4}-[0-9]{2}-[0-9]{2})'
                    negate: true
                    match: after
            - condition:
                and:
                  - not.contains.kubernetes.container.name: "containerName"
                  - or:
                      - contains.kubernetes.namespace: "namespace5"
                      - contains.kubernetes.namespace: "namespace6"
                      - contains.kubernetes.namespace: "namespace7"
              config:
                - type: container
                  paths:
                    - /var/log/containers/*${data.kubernetes.container.id}*.log
                  add_kubernetes_metadata:
                    host: ${NODE_NAME}
                    matchers:
                      - logs_path:
                          logs_path: "/var/log/containers/"
    setup.template.settings:
      index.number_of_shards: 20
      index.number_of_replicas: 1
Here is a snapshot of a pod running on 8.0.0:
Here is a snapshot of a pod running on 8.6.1:
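In case raw numbers are more useful than the screenshots, the beat's own memory and queue counters can also be pulled from its HTTP stats endpoint. A minimal sketch of the extra beat.yml settings, assuming the default port 5066 and keeping the endpoint on localhost only:

http.enabled: true
http.host: localhost
http.port: 5066
# from inside the pod: curl -s 'http://localhost:5066/stats?pretty'
# compare beat.memstats and the libbeat pipeline/queue counters between 8.0.0 and 8.6.1 over time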
Things we tried:
- Adding queue.mem settings with decreased values:
  queue.mem:
    events: 2048
    flush.min_events: 256
    flush.timeout: 5s
- Decreasing the number of index shards from 20 to 3 (see the snippet below).
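For reference, the shard change amounted to adjusting the template settings along these lines (replica count assumed unchanged):

setup.template.settings:
  index.number_of_shards: 3
  index.number_of_replicas: 1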
If anybody has any pointers on what we could check or change in the configuration to make this work on current versions, it would be highly appreciated.
Note: we have about 19-21 GiB of data per week, retained for 90 days (roughly 245-270 GiB in total), in case it's relevant.
Thanks,
Andre