Testing performance of Filebeat queue.disk
Environment: AWS EC2
OS: CentOS 7.x
Machine parameters (Filebeat and Logstash nodes): 8 vCPU cores, 16GB RAM
Filebeat version: filebeat-8.10.3-1.x86_64
Logstash version: logstash-8.10.3-1.x86_64
1st test:
Took existing queue.disk segment files from a production machine (24GB in total, segment size 1MB, 24973 segment files) and tried to consume all of them on a standalone Filebeat node with output to the console:
filebeat.yml:

filebeat.inputs:
  - type: filestream
    id: my-filestream-id
    paths:
      - /tmp/*.log

output.console:
  enabled: true

queue.disk:
  max_size: 25GB
  path: /var/lib/filebeat/data/queue
  segment_size: 1MB

http.enabled: true
http.host: 127.0.0.1
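Side note: with 1MB segments, the 24GB queue is spread across roughly 25,000 files, so every MB read pays a file open/close. A variant I plan to re-test with larger segments, shown here as an untested sketch with illustrative values:

```yaml
# Untested sketch: 100MB segments cut the file count from ~25,000 to ~250,
# so far fewer per-segment open/close cycles for the same 24GB of data.
queue.disk:
  max_size: 25GB
  path: /var/lib/filebeat/data/queue
  segment_size: 100MB
```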
Got the following results:
- Filebeat queue file consumption rate: 60-64 queue.disk segment files/sec, i.e. 60-64 MB/sec
- Filebeat CPU consumption (max 800% across 8 vCPU cores): ~230%
- No noticeable I/O wait
2nd test:
Four Filebeat machines, each consuming the same set of queue.disk segment files, with output to a standalone Logstash node. Logstash then writes everything to /dev/null.
Filebeat
filebeat.yml:

filebeat.inputs:
  - type: filestream
    id: my-filestream-id
    paths:
      - /tmp/*.log

output.logstash:
  hosts: ["logs-ls4:5044"]
  loadbalance: true
  bulk_max_size: 8192
  workers: 8

queue.disk:
  max_size: 25GB
  path: /var/lib/filebeat/data/queue
  segment_size: 1MB
  # read_ahead: 1024

http.enabled: true
http.host: 127.0.0.1
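The commented-out read_ahead above is a knob I haven't settled on yet: it controls how many events the disk queue pre-loads into memory for the output. A sketch of the read-tuning variant I intend to try; values are illustrative and unmeasured, and write_ahead is the matching write-side buffer per the Filebeat disk-queue docs:

```yaml
# Illustrative, untested values: read_ahead/write_ahead size the in-memory
# event buffers around the disk queue's reader and writer.
queue.disk:
  max_size: 25GB
  path: /var/lib/filebeat/data/queue
  segment_size: 1MB
  read_ahead: 1024   # events pre-loaded from disk for the output
  write_ahead: 4096  # events buffered in memory before hitting disk
```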
Logstash
logstash.yml:
path.data: /var/lib/logstash
pipeline.workers: 6
pipeline.batch.size: 128
path.config: /etc/logstash/conf.d
queue.type: persisted
queue.max_bytes: 310gb
dead_letter_queue.enable: true
dead_letter_queue.max_bytes: 2048mb
path.dead_letter_queue: /var/lib/logstash/dead_letter_queue
path.logs: /var/log/logstash
log.level: info
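One mismatch I noticed after the fact: Filebeat ships batches of up to 8192 events (bulk_max_size), while Logstash processes them in batches of 128, and, as far as I understand, the persisted queue has to write events to disk before the beats input acknowledges them. An untested variant I want to A/B against the settings above (illustrative values):

```yaml
# Untested A/B variant: memory queue and a larger batch size, to check
# whether the persisted queue or the small batch size causes back-pressure.
pipeline.workers: 8
pipeline.batch.size: 2048
queue.type: memory
```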
conf.d/10-io.conf:

input {
  beats {
    port => 5044
    ssl => false
  }
}

input {
  beats {
    port => 5045
    ssl => false
  }
}

output {
  file {
    path => "/dev/null"
  }
}
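Also worth noting: the file output still serializes each event through a codec (json_lines by default, as far as I know) before writing to /dev/null, so the sink itself burns some CPU. A cheaper sink I'm considering for the next run, sketched here:

```
# Sketch: the dots codec emits a single "." per event instead of
# serializing the whole document, making the sink nearly free.
output {
  stdout { codec => dots }
}
```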
Got the following results:
- Filebeat queue file consumption rate (per Filebeat node):
  - 1 Filebeat node active: 16-18 queue.disk segment files/sec, i.e. 16-18 MB/sec
  - 2 Filebeat nodes active: 12 segment files/sec, i.e. 12 MB/sec
  - 3 Filebeat nodes active: 7-10 segment files/sec, i.e. 7-10 MB/sec
  - 4 Filebeat nodes active: 7-10 segment files/sec, i.e. 7-10 MB/sec
-
Filebeat CPU consumtion (max 800%: 8 vcores)
1 Filebeat nodes active:50-70%
2 Filebeat nodes active:30-50%
3 Filebeat nodes active:20-35%
4 Filebeat nodes active:20-35%
- Logstash CPU consumption (max 800% across 8 vCPU cores):
  - 1 Filebeat node active: ~130%
  - 2 Filebeat nodes active: ~190%
  - 3 Filebeat nodes active: ~240%
  - 4 Filebeat nodes active: ~280%
- No noticeable I/O wait on either the Filebeat or the Logstash side
I'm very concerned by the results of these tests; no matter how powerful the Filebeat/Logstash machines are:
- All Filebeats read queue.disk segment files extremely slowly: 16-18 MB/sec at most when shipping to Logstash, and 60-64 MB/sec with standalone Filebeat.
- Filebeat doesn't use all of the machine's resources (such as CPU), even when the output is just the console and there is no network interaction with Logstash.
- Logstash also uses just a fraction of the machine's CPU.
- The more Filebeats connect to the same Logstash, the worse each Filebeat's performance becomes (so Logstash obviously applies some back-pressure to all the Filebeats sending traffic, but why?)
Can you please help me understand whether these results are normal, and if not, where the bottleneck might be?
Thanks!