Hi all.
I am working with Logstash and Elasticsearch version 5.2.2. I originally had the following pipeline:
input {
  gelf {
    codec => "json"
    port => 12201
  }
}
filter {
  mutate {
    gsub => ["timestamp", "\.", ""]
  }
}
output {
  elasticsearch {
    hosts => ["http://10.248.44.91:9200","http://10.248.44.92:9200","http://10.248.44.93:9200","http://10.248.44.94:9200","http://10.248.44.95:9200","http://10.248.44.96:9200"]
    index => "au_ms_bo_ops-%{+YYYY.MM.dd}"
  }
}
Messages are sent to this pipeline by a Graylog installation via UDP.
I noticed messages of the following kind in the Logstash log file:
[2017-04-07T08:55:44,276][ERROR][logstash.inputs.gelf ] JSON parse failure. Falling back to plain-text {:error=>#<LogStash::Json::ParserError: Unrecognized token 'Exception': was expecting ('true', 'false' or 'null')
at [Source: Exception: Failed to decode data: invalid compressed data -- crc error; line: 1, column: 10]>, :data=>"Exception: Failed to decode data: invalid compressed data -- crc error"}
Since the people administering the Graylog installation stated that, under certain circumstances, the messages it emits may be malformed, I tried to avoid (or at least minimize) the occurrence of this kind of message in the Logstash log file, because these occurrences fire alerts in our Prometheus monitoring system. To do this, I changed the pipeline as follows:
input {
  gelf {
    # Notice that the default value of the gelf{} plug-in "codec" setting is "plain".
    # We no longer use the "json" codec, since we want to skip invalid JSON
    # without warnings (see the filter{} section below).
    codec => "plain"
    port => 12201
  }
}
filter {
  mutate { add_field => { "[@metadata][filout]" => "${LS_IOL_FILOUT:0}" } }
  json {
    skip_on_invalid_json => true
    # Notice that the default value of the gelf{} plug-in "remap" setting is "true".
    source => "message"
  }
  if "_jsonparsefailure" not in [tags] {
    mutate {
      gsub => ["timestamp", "\.", ""]
    }
  }
}
output {
  if "_jsonparsefailure" in [tags] {
    if [@metadata][filout] == "1" {
      file {
        path => "/store1/elk/run/wb/log/filout-${LS_USER_INSTANCE}.%{+YYYY-MM-dd}.txt"
        codec => rubydebug {
          metadata => true
        }
      }
    }
  } else {
    elasticsearch {
      hosts => ["http://10.248.44.91:9200","http://10.248.44.92:9200","http://10.248.44.93:9200","http://10.248.44.94:9200","http://10.248.44.95:9200","http://10.248.44.96:9200"]
      index => "au_ms_bo_ops-%{+YYYY.MM.dd}"
    }
  }
}
I intended to rely upon the skip_on_invalid_json setting, whose manual page states “Allow to skip filter on invalid json (allows to handle json and non-json data without warnings)”.
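For reference, here is a minimal standalone pipeline (the stdin input and stdout output are only used here for illustration; they are not part of my real configuration) showing how I expected skip_on_invalid_json to behave:
input {
  stdin { }
}
filter {
  json {
    skip_on_invalid_json => true
    # the source field is the raw line read from stdin
    source => "message"
  }
}
output {
  stdout { codec => rubydebug }
}
Feeding it a valid JSON line such as {"a":1} produces the parsed field "a", while feeding it a plain-text line lets the event pass through silently, with no warning in the Logstash log (at least, this is my understanding of the setting). This is the behaviour I expected to obtain for the GELF traffic as well.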
However, I still see the following in the Logstash log file:
[2017-05-04T00:06:35,325][ERROR][logstash.inputs.gelf ] JSON parse failure. Falling back to plain-text {:error=>#<LogStash::Json::ParserError: Unrecognized token 'Exception': was expecting ('true', 'false' or 'null')
at [Source: Exception: Failed to decode data: invalid compressed data -- crc error; line: 1, column: 10]>, :data=>"Exception: Failed to decode data: invalid compressed data -- crc error"}
while in the companion output file (configured in the output{} stage) I see entries like this:
{
     "@timestamp" => 2017-05-04T00:05:54.795Z,
    "source_host" => "10.248.44.245",
      "@metadata" => {
        "filout" => "1"
    },
       "@version" => "1",
        "message" => "Exception: Failed to decode data: invalid compressed data -- crc error",
           "tags" => [
        [0] "_jsonparsefailure",
        [1] "_fromjsonparser"
    ]
}
What puzzles me is that the Logstash message still seems to be issued by the [logstash.inputs.gelf] stage, as if it were still dealing with the JSON codec.
Any suggestion is welcome.
Marco