Hello,
I'm doing some performance testing and optimization of my current setup, and I'm seeing unexpectedly large performance drops with the mutate and date filters.
My setup:
- Logstash node (VM) with 6 CPUs; I run Logstash with the -w 4 parameter (4 filter workers)
- Logstash 2.1.1
- file input fed with a combined one-day log file from the various services whose logs I send to Logstash (approx. 2.9 GB)
- null output plugin
- metrics filter plugin for measuring performance
- different filters after the metrics filter
Performance I see:
- input, metrics filter, output: approx. 42000 events/s
- input, metrics filter, grok filter, output: approx. 41000 events/s
- input, metrics filter, grok filter, mutate filter, output: approx. 20000 events/s
- input, metrics filter, grok filter, date filter, output: approx. 15000 events/s
The mutate filter renames one field and removes another, and the date filter only updates the timestamp field.
Is such a drop in performance normal? I would expect grok to be the plugin where I lose the most performance, not the mutate or date filters.
When testing, all 4 filter workers use 100% CPU.
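To put the numbers above in per-event terms, here is a rough back-of-the-envelope sketch in Python. It assumes throughput is limited only by the 4 filter workers (all at 100% CPU), so the CPU time spent per event is roughly workers / rate. The function names are just mine for illustration, and the result is an estimate, not a profile.

```python
# Measured throughput from the runs above, in events per second.
rates = {
    "metrics only": 42000,
    "grok": 41000,
    "grok + mutate": 20000,
    "grok + date": 15000,
}
WORKERS = 4  # filter workers (-w 4); assumed to be the only bottleneck


def micros_per_event(rate):
    """Wall-clock microseconds per event at a given events/s rate."""
    return 1_000_000 / rate


def added_cost(rate_before, rate_after):
    """Extra wall-clock microseconds per event after adding a filter."""
    return micros_per_event(rate_after) - micros_per_event(rate_before)


def relative_drop(rate_before, rate_after):
    """Fraction of throughput lost after adding a filter."""
    return (rate_before - rate_after) / rate_before


for name in ("grok + mutate", "grok + date"):
    extra = added_cost(rates["grok"], rates[name])
    drop = relative_drop(rates["grok"], rates[name])
    print(
        f"{name}: +{extra:.1f} us wall-clock per event, "
        f"~{extra * WORKERS:.0f} us CPU across {WORKERS} workers, "
        f"{drop:.0%} lower throughput than grok alone"
    )
```

If that assumption holds, mutate and date each appear to cost more per event than everything before them in the pipeline combined, which is what surprises me.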
Config file:
00-input:
input {
  file {
    path => "/root/logstash-performance/logstash-test-log.1"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    add_field => { "syslog_format" => "true" }
  }
}
05-metrics:
filter {
  metrics {
    meter => "events"
    add_tag => "metric"
    add_field => { "metric_type" => "Input" }
    flush_interval => 10
  }
}
10-filter:
filter {
  if [syslog_format] == "true" {
    grok {
      patterns_dir => [ '/etc/logstash/patterns' ]
      match => [ 'message', '%{SYSLOGLINE}' ]
      overwrite => [ 'message' ]
      add_tag => [ '_grok_syslog_prefilter_success' ]
      tag_on_failure => []
    }
    mutate {
      rename => [ "logsource", "host" ]
      remove_field => [ "syslog_format" ]
    }
    date {
      match => [ "timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
}
99-output:
output {
  if "metric" in [tags] {
    stdout {
      codec => line {
        format => "%{[metric_type]}: 1m rate: %{[events][rate_1m]} "
      }
    }
  }
  null {}
}
Thanks, Matej