Unable to avoid JSON parsing errors in the Logstash log file

Hi all.
I am working with Logstash and Elasticsearch version 5.2.2. I had the following pipeline:

input {
  gelf {
    codec => "json"
    port  => 12201
  }
}

filter {
  mutate {
    gsub => ["timestamp", ".", ""]
  }
}

output {
  elasticsearch {
    hosts => ["http://10.248.44.91:9200","http://10.248.44.92:9200","http://10.248.44.93:9200","http://10.248.44.94:9200","http://10.248.44.95:9200","http://10.248.44.96:9200"]
    index => "au_ms_bo_ops-%{+YYYY.MM.dd}"
  }
}

to which messages are sent by a Graylog installation via UDP.
I noticed messages of this kind in the Logstash log file:

[2017-04-07T08:55:44,276][ERROR][logstash.inputs.gelf ] JSON parse failure. Falling back to plain-text {:error=>#<LogStash::Json::ParserError: Unrecognized token 'Exception': was expecting ('true', 'false' or 'null')
at [Source: Exception: Failed to decode data: invalid compressed data -- crc error; line: 1, column: 10]>, :data=>"Exception: Failed to decode data: invalid compressed data -- crc error"}

Since the people administering the Graylog installation stated that, under certain circumstances, the messages it originates may not be well-formed, I tried to avoid (or, better, to minimize) the occurrence of this kind of message in the Logstash log file, because these occurrences cause alerts to fire in our Prometheus monitoring system. To do this, I changed the pipeline as follows:

input {
  gelf {
    # Notice that the default value of the gelf{} plug-in "codec" setting is "plain".
    # We no longer use the "json" codec, since we want to skip invalid JSON without
    # warnings (see the filter{} section below).
    codec => "plain"
    port  => 12201
  }
}

filter {
  mutate { add_field => { "[@metadata][filout]" => "${LS_IOL_FILOUT:0}" } }
  json {
    skip_on_invalid_json => true
    # Notice that the default value of the gelf{} plug-in "remap" setting is "true".
    source => "event['message']"
  }
  if "_jsonparsefailure" not in [tags] {
    mutate {
      gsub => ["timestamp", "\.", ""]
    }
  }
}

output {
  if "_jsonparsefailure" in [tags] {
    if [@metadata][filout] == "1" {
      file {
        path  => "/store1/elk/run/wb/log/filout-${LS_USER_INSTANCE}.%{+YYYY-MM-dd}.txt"
        codec => rubydebug {
          metadata => true
        }
      }
    }
  } else {
    elasticsearch {
      hosts => ["http://10.248.44.91:9200","http://10.248.44.92:9200","http://10.248.44.93:9200","http://10.248.44.94:9200","http://10.248.44.95:9200","http://10.248.44.96:9200"]
      index => "au_ms_bo_ops-%{+YYYY.MM.dd}"
    }
  }
}

I intended to rely upon the skip_on_invalid_json setting, whose manual page states “Allow to skip filter on invalid json (allows to handle json and non-json data without warnings)”.
However, I still see the following in the Logstash log file:

[2017-05-04T00:06:35,325][ERROR][logstash.inputs.gelf ] JSON parse failure. Falling back to plain-text {:error=>#<LogStash::Json::ParserError: Unrecognized token 'Exception': was expecting ('true', 'false' or 'null')
at [Source: Exception: Failed to decode data: invalid compressed data -- crc error; line: 1, column: 10]>, :data=>"Exception: Failed to decode data: invalid compressed data -- crc error"}

while in the companion output file (configured in the output{} stage) I see entries like this:

{
    "@timestamp" => 2017-05-04T00:05:54.795Z,
    "source_host" => "10.248.44.245",
    "@metadata" => {
        "filout" => "1"
    },
    "@version" => "1",
    "message" => "Exception: Failed to decode data: invalid compressed data -- crc error",
    "tags" => [
        [0] "_jsonparsefailure",
        [1] "_fromjsonparser"
    ]
}

What puzzles me is that the message still appears to be issued by the [logstash.inputs.gelf] stage, as if it were still using the JSON codec.
Any suggestion is welcome.
Marco

I would use the json filter instead. You should add a conditional check to see if the message field "looks" like JSON.

if [message] =~ "\A\{.+\}\z" {
  json { .. }
}

or similar.
Be aware that the JSON might also be an array of objects ([{},{}]).
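
For illustration, here is a minimal sketch of how that conditional could be filled in. It assumes the raw payload sits in the standard message field; the "parsed" field name is just a placeholder for wherever you want the decoded data to land:

filter {
  # Only run the json filter when the message looks like a JSON object or array.
  if [message] =~ "\A\{.+\}\z" or [message] =~ "\A\[.+\]\z" {
    json {
      source => "message"
      # "parsed" is a placeholder target; using a target also avoids having to
      # merge a top-level JSON array into the event root.
      target => "parsed"
    }
  }
}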
