Date and mutate filter performance

Hello,

I'm doing some performance testing and optimization of my current setup, and I'm seeing unexpected performance drops with the mutate and date filters.

My setup:

  • Logstash node with 6 CPUs (VM); I run Logstash with the -w 4 parameter
  • Logstash 2.1.1
  • file input fed with a combined one-day log file from the various services that send to Logstash (approx. 2.9 GB)
  • null output plugin
  • metrics filter for measuring performance
  • different filters after the metrics filter

Performance I see:

  • input, metrics filter, output: approx. 42000 events/s
  • input, metrics filter, grok filter, output: approx. 41000 events/s
  • input, metrics filter, grok filter, mutate filter, output: approx. 20000 events/s
  • input, metrics filter, grok filter, date filter, output: approx. 15000 events/s
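To put these numbers in perspective, here is a rough sketch that converts the rates above into worker time per event and into a relative drop from the baseline. It assumes the 4 filter workers are the bottleneck and fully busy, which I have not verified (the first two rows may well be limited by the file input instead). The function names are just mine for illustration.

```python
# Rough per-event cost estimate from the measured throughput.
# Assumption (not measured): the 4 filter workers are the bottleneck and
# fully busy, so worker time per event = workers / events_per_second.

WORKERS = 4  # from the -w 4 setting

# Approximate rates reported above, in events/s.
rates = {
    "metrics only": 42000,
    "metrics + grok": 41000,
    "metrics + grok + mutate": 20000,
    "metrics + grok + date": 15000,
}


def worker_microseconds_per_event(rate, workers=WORKERS):
    """Worker time spent per event, in microseconds."""
    return workers / rate * 1_000_000


def relative_drop(before, after):
    """Throughput drop from `before` to `after`, as a percentage."""
    return (before - after) / before * 100


baseline = rates["metrics only"]
for name, rate in rates.items():
    cost = worker_microseconds_per_event(rate)
    drop = relative_drop(baseline, rate)
    print(f"{name}: {cost:.1f} us/event, {drop:.1f}% below baseline")
```

Under that assumption, adding mutate costs roughly 100 microseconds of worker time per event on top of grok, and adding date roughly 170 microseconds, while grok itself barely registers.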

The mutate filter renames one field and removes one, and the date filter only updates the timestamp field.

Is such a drop in performance normal? I would expect grok to be the plugin where I lose the most performance, not the mutate or date filters.

When testing, all 4 filter workers use 100% CPU.

Config file:

00-input:

input {
  file {
    path => "/root/logstash-performance/logstash-test-log.1"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    add_field => { "syslog_format" => "true" }
  }
}

05-metrics:

filter {
  metrics {
    meter => "events"
    add_tag => "metric"
    add_field => { "metric_type" => "Input" }
    flush_interval => 10
  }
}

10-filter:

filter {
  if [syslog_format] == "true" {
    grok {
      patterns_dir => [ '/etc/logstash/patterns' ]
      match => [ 'message', '%{SYSLOGLINE}' ]
      overwrite => [ 'message' ]
      add_tag => [ '_grok_syslog_prefilter_success' ]
      tag_on_failure => []
    }
    mutate {
      rename => [ "logsource", "host" ]
      remove_field => [ "syslog_format" ]
    }
    date {
      match => [ "timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
}

99-output:

output {
  if "metric" in [tags] {
    stdout {
      codec => line {
        format => "%{[metric_type]}: 1m rate: %{[events][rate_1m]} "
      }
    }
  }
  null {}
}

Thanks, Matej