Hi all,
I've been running Logstash 7.17.8 in Docker, and for a while now I've been trying to minimize its memory and swap usage, without much success.
I have two file inputs and one gelf input, plus two small filters.
I tried to keep the config minimal (I set max_open_files => 1 because I expect only one file per input):
```
input {
  gelf {
    host => "0.0.0.0"
    port_udp => 12201
  }
  file {
    path => "path1"
    add_field => { "component" => "component1" }
    add_field => { "priority" => "HIGH" }
    max_open_files => 1
    # Any line not starting with a timestamp should be merged with the previous line
    codec => multiline {
      # Regex for YYYY-MM-dd HH:mm:ss,SSS
      pattern => "^([0-9]{4})-(0[1-9]|1[0-2])-(0[1-9]|[1-2][0-9]|3[0-1]) (2[0-3]|[01][0-9]):([0-5][0-9]):([0-5][0-9]),([0-9]{3})"
      negate => true
      what => "previous"
    }
  }
  file {
    path => "path2"
    add_field => { "component" => "component2" }
    add_field => { "priority" => "HIGH" }
    max_open_files => 1
    # Any line not starting with a timestamp should be merged with the previous line
    codec => multiline {
      # Regex for YYYY-MM-dd HH:mm:ss,SSS
      pattern => "^([0-9]{4})-(0[1-9]|1[0-2])-(0[1-9]|[1-2][0-9]|3[0-1]) (2[0-3]|[01][0-9]):([0-5][0-9]):([0-5][0-9]),([0-9]{3})"
      negate => true
      what => "previous"
    }
  }
}

filter {
  if [component] == "component1" or [component] == "component2" {
    dissect {
      mapping => {
        "message" => "%{timestamp} %{+timestamp} [%{pid}] [%{module}] %{level} - %{message}"
      }
    }
  } else {
    if [level] == "ERROR" and "java.sql.SQLException" in [StackTrace] {
      mutate {
        gsub => ["field1", "pattern1", "pattern2"]
      }
    }
  }
}
```
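As a sanity check on the multiline codec, the timestamp pattern can be exercised outside Logstash. Python is used here purely for illustration (Logstash itself uses Ruby/Joni regexes, which behave the same for this pattern):

```python
import re

# The multiline codec's timestamp pattern from the file inputs above.
TS = re.compile(
    r"^([0-9]{4})-(0[1-9]|1[0-2])-(0[1-9]|[1-2][0-9]|3[0-1]) "
    r"(2[0-3]|[01][0-9]):([0-5][0-9]):([0-5][0-9]),([0-9]{3})"
)

# A line starting with a timestamp begins a new event:
print(bool(TS.match("2023-01-15 12:34:56,789 [42] [db] ERROR - boom")))  # True

# A continuation line (e.g. a stack trace) does not match, so with
# negate => true / what => "previous" it is merged into the prior event:
print(bool(TS.match("    at java.sql.SQLException ...")))  # False
```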
logstash.yml:
```
api.enabled: true
http.host: 0.0.0.0
pipeline.batch.delay: 1000
pipeline.batch.size: 30
pipeline.workers: 1
xpack.monitoring.enabled: false
```
The GC used is CMS.
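For completeness, the heap and GC settings live in jvm.options (or LS_JAVA_OPTS), not logstash.yml. The relevant lines in my setup look roughly like this; the heap values match the 256 MB cap described below, and the CMS occupancy flags are what I believe the 7.x image ships by default:

```
# jvm.options (relevant lines)
-Xms256m
-Xmx256m
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
```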
After a lot of trial and error, I limited the container to 512 MB of memory and the heap to 256 MB (Xms and Xmx).
The container uses the entire 512 MB right away (a Java profiler showed the heap at ~250 MB and non-heap around 200 MB), but the main problem is that it's using swap.
I ran it for a few hours with a constant input rate (10 messages every 3 seconds); swap usage was 500 MB and climbed past 1 GB.
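One thing worth noting on the Docker side: by default a container may swap up to its memory limit, so setting the swap limit equal to the memory limit forbids swap for that container entirely. A compose sketch of the limits I'm describing (service name and keys are illustrative, not my exact file):

```yaml
services:
  logstash:
    image: docker.elastic.co/logstash/logstash:7.17.8
    mem_limit: 512m        # hard RAM cap
    memswap_limit: 512m    # RAM + swap cap; equal values => no swap allowed
    environment:
      LS_JAVA_OPTS: "-Xms256m -Xmx256m"
```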
The messages I sent went through the gelf input. They don't hit the dissect or gsub filters, so I don't think those are the cause.
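For reference, the constant load I described can be reproduced with a small GELF-over-UDP sender. This is a hypothetical load generator, not part of my setup; it assumes the gelf input is listening on localhost:12201 as configured above:

```python
import json
import socket

def build_gelf(message: str) -> bytes:
    """Encode a minimal, uncompressed GELF 1.1 payload."""
    return json.dumps({
        "version": "1.1",
        "host": "loadtest",
        "short_message": message,
    }).encode("utf-8")

def send_batch(addr=("127.0.0.1", 12201), count=10):
    """Send one batch of `count` messages over UDP (fire-and-forget)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for i in range(count):
            sock.sendto(build_gelf(f"test message {i}"), addr)
    finally:
        sock.close()

# Run in a loop (one batch every 3 seconds) to mimic the constant load.
send_batch()
```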
I have a custom Java output that forwards the messages to a third party. I thought that was the problem, so I replaced it with stdout. Same issue.
Another strange thing I noticed: the node stats API reports 101 open file descriptors, even though I only have two file inputs with max_open_files => 1.
- How do I decrease the memory usage? I've read a lot of posts about this and tried many things, but the problem persists.
- What's the reason for the ~100 open file descriptors?