Hi,
We are experiencing high CPU usage on our Filebeat instances on Windows machines. Affected instances show between 15% and 50% CPU usage, while other active nodes show no CPU usage even though they are handling activity (~100 events/s).
I've managed to reproduce the problem on a machine with the following configuration:
CPU
Model: Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz
Maximum speed: 2,10 GHz
Sockets: 8
Virtual processors: 8
Virtual machine: Yes
L1 cache: N/A
Utilization: 50%
Speed: 2,10 GHz
Up time: 31:09:27:09
Processes: 288
Threads: 7043
Handles: 213881
Memory
Total: 20,0 GB
Slots used: N/A
Hardware reserved: 0,5 MB
Available: 5,9 GB
Cached: 4,9 GB
Committed: 17,1/26,0 GB
Paged pool: 986 MB
Non-paged pool: 472 MB
In use: 14,0 GB
Filebeat.yml
#=========================== Filebeat prospectors =============================
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - e:\software\app\log\application_log-*.log
  tags: ['application', 'logs']
  exclude_files: ['application_log-infra-db*.log']
  json.message_key: message
  json.overwrite_keys: true
  json.keys_under_root: true
  fields:
    index_name: 'active-logs-application'
    pipeline_name: 'application_pipeline'
  ignore_older: 24h
  close_removed: true
  clean_removed: true
  # close_timeout: 5m
- type: log
  enabled: true
  paths:
    - e:\software\app\log\trace\application_log-*.log
  tags: ['application', 'trace']
  json.message_key: message
  json.overwrite_keys: true
  json.keys_under_root: true
  fields:
    index_name: 'active-trace-application'
    pipeline_name: 'application_pipeline'
  ignore_older: 24h
  close_removed: true
  clean_removed: true
  # close_timeout: 5m
- type: log
  encoding: 'latin1'
  enabled: true
  tags: ['gateway-logs']
  paths:
    - e:\software\gateway\log\gatewayvsc53.???.log.????.txt
  fields:
    index_name: 'active-logs-gateway'
    pipeline_name: 'gateway-trace-pipeline'
  ignore_older: 24h
  close_removed: true
  clean_removed: true
  # close_timeout: 5m
- type: log
  encoding: 'latin1'
  enabled: true
  tags: ['auth-logs']
  include_lines: ['MMTraceId[[:blank:]]\[\w{16}\]$']
  multiline:
    negate: true
    match: 'after'
    pattern: '^$\n^\[\d{6}[[:blank:]]\d{6}\][[:blank:]]\[\d+\][[:blank:]].+$'
    flush_pattern: '^\[\d{6}[[:blank:]]\d{6}\][[:blank:]]\[\d+\][[:blank:]]ped[[:blank:]]\[\d+\].+$'
  paths:
    - e:\software\auth\log\*_operacao.log
  fields:
    index_name: 'active-logs-auth'
    pipeline_name: 'auth-input-trace'
  ignore_older: 24h
  close_removed: true
  clean_removed: true
  # close_timeout: 5m
setup.template.enabled: false
setup.ilm.enabled: false
#registry.flush: 10s
#max_procs: 1
logging.files.redirect_stderr: true
logging.to_files: true
output.elasticsearch:
  enabled: true
  hosts: ['???']
  username: '???'
  password: '???'
  index: '%{[fields.index_name]}'
  pipeline: '%{[fields.pipeline_name]}'
tags: ['windows']
About the filebeat.yml:
I tried to stay as close as possible to the production config. On the test machine the inputs are as follows:
The application-logs* folder contains about 2k files, 1k of which match the glob, in JSON format.
The application-trace* folder contains 10 files, all of which match the glob, in JSON format.
The gateway-* folder contains about 3k files, 2k of which match the glob, in plain text.
The auth-* folder contains about 40k files, 35k of which match the glob, in multiline plain text.
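For anyone who wants to check similar counts on their own machine, here is a rough Python sketch of how the totals above could be measured. It is not how Filebeat itself evaluates the patterns (Filebeat uses Go's glob matching, and `fnmatch` is only an approximation of it), and the helper name `count_glob_matches` is made up for this example.

```python
import fnmatch
import os
import tempfile


def count_glob_matches(folder, pattern):
    """Return (total_files, matched_files) for one folder and one filename glob.

    Only the file name is compared, not the full path, and the comparison is
    case-sensitive. This approximates, but is not identical to, Go's matching.
    """
    names = [entry.name for entry in os.scandir(folder) if entry.is_file()]
    matched = [name for name in names if fnmatch.fnmatchcase(name, pattern)]
    return len(names), len(matched)


if __name__ == "__main__":
    # Small self-contained demo using made-up file names in a temp folder.
    with tempfile.TemporaryDirectory() as folder:
        for name in ("application_log-a.log", "application_log-b.log", "other.txt"):
            open(os.path.join(folder, name), "w").close()
        total, matched = count_glob_matches(folder, "application_log-*.log")
        print(f"{matched} of {total} files match")
```

On the real machine the call would look something like `count_glob_matches(r"e:\software\auth\log", "*_operacao.log")`, using the folder and file pattern from the corresponding input.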
I started the process with the Go profiler enabled, let it run for ~5m and then captured a 30s CPU profile (http://localhost:9094/debug/pprof/profile?seconds=30).
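In case someone wants to capture the same kind of profile, this is a minimal Python sketch of the download step. It assumes the pprof HTTP endpoint is already listening on localhost:9094 as in the URL above (for example, Filebeat started with its `--httpprof` option, which is my assumption about the setup). The function names and the output file name are made up for this example.

```python
import urllib.request
from urllib.parse import urlencode


def profile_url(host="localhost", port=9094, seconds=30):
    """Build the pprof CPU profile URL for the given sampling window."""
    query = urlencode({"seconds": seconds})
    return f"http://{host}:{port}/debug/pprof/profile?{query}"


def fetch_cpu_profile(out_path, host="localhost", port=9094, seconds=30):
    """Download a CPU profile to out_path and return its size in bytes.

    The request blocks for roughly `seconds` while the server samples,
    so the timeout is set somewhat longer than the sampling window.
    """
    url = profile_url(host, port, seconds)
    with urllib.request.urlopen(url, timeout=seconds + 30) as response:
        data = response.read()
    with open(out_path, "wb") as out_file:
        out_file.write(data)
    return len(data)


if __name__ == "__main__":
    print(profile_url())
    # Requires a running Filebeat with the pprof endpoint enabled:
    # fetch_cpu_profile("filebeat-cpu.pprof")
```

The saved file can then be opened with `go tool pprof filebeat-cpu.pprof`.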
Sadly I'm not allowed to upload the profiling data to a hosting service, but I could send it by email if an address is provided.
This problem happens with Filebeat 7.0.1 and 7.2.0. We did not notice any problems when using Filebeat on the 6.x line, but it has been a long time since we upgraded to 7.x. The Elasticsearch cluster is running 7.0.1.