The earlier part of my config contains a lot of regex which breaks up a message
from /var/log/php7.0-fpm.log
into a number of parts and assigns [@metatadata][php_variety]
. Additional processing is then performed based on the results. The test data I'm using for this config (with some info redacted) is
[25-Jul-2019 12:12:07] WARNING: [pool mypoolname] child 22377 said into stderr: "NOTICE: PHP message: PHP Fatal error: Uncaught Exception: Test Exception in /var/www/..."
[25-Jul-2019 12:12:07] WARNING: [pool mypoolname] child 22377 said into stderr: "Stack trace:"
[25-Jul-2019 12:12:07] WARNING: [pool mypoolname] child 22377 said into stderr: "#0 /var/www/..."
[25-Jul-2019 12:12:07] WARNING: [pool mypoolname] child 22377 said into stderr: "#1 /var/www/..."
[25-Jul-2019 12:12:07] WARNING: [pool mypoolname] child 22377 said into stderr: "#2 /var/www/..."
[25-Jul-2019 12:12:07] WARNING: [pool mypoolname] child 22377 said into stderr: "#3 /var/www/..."
[25-Jul-2019 12:12:07] WARNING: [pool mypoolname] child 22377 said into stderr: "#4 /var/www/..."
I'm not using a multiline codec, so each of these lines is processed individually. The initial processing also assigns [@metadata][php_variety]
, which is later converted into an event field via
# add variety to events out
ruby {
code => "event.set('variety', event.get('[@metadata][php_variety]'))"
}
Therefore, when no aggregation is applied, these become:
{"host":"oursite.domain","path":"/var/log/php7.0-fpm.log","message":" Uncaught Exception: Test Exception in /var/www/...","php_error":"Fatal","variety":"child_msg_php_error","severity":"NOTICE","@timestamp":"2019-07-25T20:06:03.544Z","timestamp":"25-Jul-2019 13:06:02","stream":"stderr","type":"php_error","client":"mypoolname","pid":"22696","php-fpm-level":"WARNING","@version":"1"}
{"host":"oursite.domain","path":"/var/log/php7.0-fpm.log","message":"Stack trace:","variety":"child_msg","@timestamp":"2019-07-25T20:06:03.695Z","timestamp":"25-Jul-2019 13:06:02","stream":"stderr","type":"php_error","client":"mypoolname","pid":"22696","php-fpm-level":"WARNING","@version":"1"}
{"host":"oursite.domain","path":"/var/log/php7.0-fpm.log","message":"#0 /var/www/...","variety":"child_msg","@timestamp":"2019-07-25T20:06:03.695Z","timestamp":"25-Jul-2019 13:06:02","stream":"stderr","type":"php_error","client":"mypoolname","pid":"22696","php-fpm-level":"WARNING","@version":"1"}
{"host":"oursite.domain","path":"/var/log/php7.0-fpm.log","message":"#1 /var/www/...","variety":"child_msg","@timestamp":"2019-07-25T20:06:03.695Z","timestamp":"25-Jul-2019 13:06:02","stream":"stderr","type":"php_error","client":"mypoolname","pid":"22696","php-fpm-level":"WARNING","@version":"1"}
{"host":"oursite.domain","path":"/var/log/php7.0-fpm.log","message":"#2 /var/www/...","variety":"child_msg","@timestamp":"2019-07-25T20:06:03.696Z","timestamp":"25-Jul-2019 13:06:02","stream":"stderr","type":"php_error","client":"mypoolname","pid":"22696","php-fpm-level":"WARNING","@version":"1"}
{"host":"oursite.domain","path":"/var/log/php7.0-fpm.log","message":"#3 /var/www/...","variety":"child_msg","@timestamp":"2019-07-25T20:06:03.696Z","timestamp":"25-Jul-2019 13:06:02","stream":"stderr","type":"php_error","client":"mypoolname","pid":"22696","php-fpm-level":"WARNING","@version":"1"}
{"host":"oursite.domain","path":"/var/log/php7.0-fpm.log","message":"#4 /var/www/...","variety":"child_msg","@timestamp":"2019-07-25T20:06:03.696Z","timestamp":"25-Jul-2019 13:06:02","stream":"stderr","type":"php_error","client":"mypoolname","pid":"22696","php-fpm-level":"WARNING","@version":"1"}
The issue arises when attempting to concatenate these into a single message via the following rules:
# concat PHP stack traces - these only begin in events of variety child_msg_php_error
if [@metadata][php_variety] == "child_msg_php_error" {
if [php_error] == "Fatal" {
aggregate {
task_id => "%{pid}"
code => "map['php_fpm_php_stacktrace'] = event.get('message')"
map_action => "create"
}
drop {}
}
}
# php stack traces don't have an "end of stack trace" message, so we get to use timeouts instead
if [message] =~ "#\d .*" {
aggregate {
task_id => "%{pid}"
code => "map['php_fpm_php_stacktrace'] << '\n' + event.get('message')"
map_action => "update"
timeout => "1" # the whole stack trace should pretty much come through at once...
timeout_tags => ["_php_stacktrace_timeout"] # set a tag so we can indicate not to drop the event
timeout_task_id_field => "pid"
push_map_as_event_on_timeout => true
#timeout_code => "event.set('message', map['php_fpm_php_stacktrace'])"
}
drop event unless end of stacktrace tag is set
if "_php_stacktrace_timeout" not in [tags] {
drop {}
}
otherwise, set the meta stuff back to what it was for the original message
mutate {
update => { "[@metadata][php_variety]" => "child_msg_php_error" }
add_field => { "php_error" => "Fatal" }
}
}
As per the documentation, push_map_as_event_on_timeout
causes a new event by the name of php_fpm_php_stacktrace
to be generated.
However, this is also causing undesireable side effects. It appears to be destroying data which is not referenced from within the aggregate block at all. variety
is null
in the output file, and the following message appears in logstash's std:
[WARN ] 2019-07-25 15:16:48.509 [[main]>worker0] gelf - Trouble sending GELF event {:gelf_event=>{"short_message"=>nil, "full_message"=>"%{message}", "host"=>"%{host}", "_php_error"=>"Fatal", "_php_fpm_php_stacktrace"=>" Uncaught Exception: Test Exception in /var/www/...\\n#0 /var/www/...\\n#1 /var/www/...\\n#2 /var/www/\\n#3 /var/www/\\n#4 /var/www/...", "_tags"=>"_php_stacktrace_timeout", "_pid"=>"23630", "level"=>6}, :event=>#<LogStash::Event:0x2dbe6dcc>, :error=>#<ArgumentError: short_message is missing. Options version, short_message and host must be set.>}
I am certain that this error is due to the timeout event, as commenting out the drop {}
s for non-timed-out events allows every event except the final one to be sent and processed with no errors or problems whatsoever.