Logstash flags XML header in log message as _jsonparsefailure?

I've narrowed one of my _jsonparsefailure events down to an XML header, shown below. The XML is "part" of the log line that our application sends out.

2017-02-27 14:01:30,215 INFO  :SimpleThreadPoolWorker-4 [jdbc.sqlonly]  batching 1 statements:
1:  insert into CR_TEXT_STORAGE_DATA (TEXT_VAL, TEXT_STORAGE_ID, SEQUENCE_ORD) values ('<?xml version="1.0"?>
', '1000h6c14b2m', 0)

This is the JSON event that rsyslog sends out; it parses correctly in online JSON validators. I show it here across multiple lines for readability, but it is really a single-line event.

{
  "type":"trm-system",
  "host":"ip-10-22-2-188",
  "timestamp":"2017-03-08T20:19:01.234577+00:00",
  "@version":"1",
  "customer":"test",
  "role":"app1",
  "sourcefile":"/tmp/elk/trm-system.log",
  "message":"2017-02-27 14:01:30,215 INFO  :SimpleThreadPoolWorker-4 [jdbc.sqlonly]  batching 1 statements:\n1:  insert into CR_TEXT_STORAGE_DATA (TEXT_VAL, TEXT_STORAGE_ID, SEQUENCE_ORD) values ('xml version=\"1.0\"\n', '1000h6c14b2m', 0)"
}
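For reference, the same event collapsed back into the single line it is on the wire parses cleanly with Python's stdlib json module (just as a stand-in for the online validators I tried):

```python
import json

# The rsyslog event from above, as the single line it actually is on the
# wire (\n and \" inside "message" are JSON escapes, not raw characters).
payload = (
    '{"type":"trm-system",'
    '"host":"ip-10-22-2-188",'
    '"timestamp":"2017-03-08T20:19:01.234577+00:00",'
    '"@version":"1",'
    '"customer":"test",'
    '"role":"app1",'
    '"sourcefile":"/tmp/elk/trm-system.log",'
    '"message":"2017-02-27 14:01:30,215 INFO  :SimpleThreadPoolWorker-4'
    ' [jdbc.sqlonly]  batching 1 statements:\\n'
    '1:  insert into CR_TEXT_STORAGE_DATA (TEXT_VAL, TEXT_STORAGE_ID,'
    ' SEQUENCE_ORD) values (\'xml version=\\"1.0\\"\\n\', \'1000h6c14b2m\', 0)"}'
)

event = json.loads(payload)  # no exception: the payload is valid JSON
print(event["type"])  # -> trm-system
```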

This is what I see in logstash.stdout:

{
       "message" => "{ \"type\":\"trm-system\", \"host\":\"ip-10-22-2-188\", \"timestamp\":\"2017-03-08T20:16:38.422895+00:00\", \"@version\":\"1\", \"customer\":\"test\", \"role\":\"app1\", \"sourcefile\":\"/tmp/elk/trm-system.log\", \"message\":\"2017-02-27 14:01:30,215 INFO  :SimpleThreadPoolWorker-4 [jdbc.sqlonly]  batching 1 statements:1:  insert into CR_TEXT_STORAGE_DATA (TEXT_VAL, TEXT_STORAGE_ID, SEQUENCE_ORD) values ('xml version=\"1.0\"', '1000h6c14b2m', 0)\"}\n",
      "@version" => "1",
    "@timestamp" => "2017-03-08T20:16:38.423Z",
          "host" => "0:0:0:0:0:0:0:1",
          "tags" => [
        [0] "_jsonparsefailure"
    ]
}

This is the Logstash error I see in logstash.log:

{:timestamp=>"2017-03-08T20:16:39.063000+0000", :message=>"Error parsing json", :source=>"message"
, :raw=>"{ \"type\":\"trm-system\", \"host\":\"ip-10-22-2-188\", \"timestamp\":\"2017-03-08T20:16:
38.422895+00:00\", \"@version\":\"1\", \"customer\":\"test\", \"role\":\"app1\", \"sourcefile\":\"
/tmp/elk/trm-system.log\", \"message\":\"2017-02-27 14:01:30,215 INFO  :SimpleThreadPoolWorker-4 [
jdbc.sqlonly]  batching 1 statements:1:  insert into CR_TEXT_STORAGE_DATA (TEXT_VAL, TEXT_STORAGE_
ID, SEQUENCE_ORD) values ('xml version=\"1.0\"', '1000h6c14b2m', 0)\"}\n", :exception=>#<LogStash:
:Json::ParserError: Unexpected character ('1' (code 49)): was expecting comma to separate OBJECT e
ntries
 at [Source: [B@2eb78005; line: 1, column: 394]>, :level=>:warn, :file=>"logstash/filters/json.rb"
, :line=>"76", :method=>"filter"}

Why is Logstash flagging a valid JSON event as a _jsonparsefailure?
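For what it's worth, here's one hypothetical way to reproduce the same symptom with Python's json module: if the inner quotes around 1.0 were to reach the json filter unescaped (mangled somewhere in transit), the "message" string would terminate early at version= and the parser would then hit the bare digit 1, which matches the error above.

```python
import json

# Hypothetical: the same kind of event, but with the inner quotes around
# 1.0 unescaped, so the JSON string value of "message" ends at version=
# and the parser's next character is the bare digit 1.
broken = '{"message":"values (\'xml version="1.0"\', 0)"}'

try:
    json.loads(broken)
except json.JSONDecodeError as e:
    # Same class of failure as Jackson's "was expecting comma to
    # separate OBJECT entries: Unexpected character ('1')".
    print(e.msg, "at column", e.colno)
```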

Could you paste the configuration you are using?

Regards,

@RodrigoMontoro thanks for the reply!

Logstash v2.4.1

input {
  udp {
    port => 5555
  }
}

filter {
  json {
    source => "message"
    target => "parsedJson"
  }
  if "_jsonparsefailure" not in [tags] {
    mutate {
      add_field => {
               "type" => "%{[parsedJson][type]}"
           "hostname" => "%{[parsedJson][host]}"
          "timestamp" => "%{[parsedJson][timestamp]}"
           "customer" => "%{[parsedJson][customer]}"
               "role" => "%{[parsedJson][role]}"
         "sourcefile" => "%{[parsedJson][sourcefile]}"
        "log_message" => "%{[parsedJson][message]}"
      }
    }
    if [type] == "trm-error" {
      grok {
          # Grok filters go here
      }
    }
    mutate {
      remove_field => ["parsedJson", "message"]
    }
  }
}

output {
  stdout {
    codec => rubydebug
  }
}

If logs are being sent in this format:

{
"type":"trm-system",
"host":"ip-10-22-2-188",
"timestamp":"2017-03-08T20:19:01.234577+00:00",
"@version":"1",
"customer":"test",
"role":"app1",
"sourcefile":"/tmp/elk/trm-system.log",
"message":"2017-02-27 14:01:30,215 INFO :SimpleThreadPoolWorker-4 [jdbc.sqlonly] batching 1 statements:\n1: insert into CR_TEXT_STORAGE_DATA (TEXT_VAL, TEXT_STORAGE_ID, SEQUENCE_ORD) values ('xml version="1.0"\n', '1000h6c14b2m', 0)"
}

you probably need a multiline filter before the json filter.

I can't test anything right now, but a config like the one below might help:

multiline {
  pattern => "}"
  negate => "true"
  what => "next"
}

json {
  source => "message"
  target => "parsedJson"
}

Could you test it?

Regards

Sorry, I only showed the log event across multiple lines for easy viewing. It's really a single line, like this. I've updated my OP.

{"type":"trm-system", "host":"ip-10-22-2-188", "timestamp":"2017-03-08T20:19:01.234577+00:00", "@version":"1", "customer":"test", "role":"app1", "sourcefile":"/tmp/elk/trm-system.log", "message":"2017-02-27 14:01:30,215 INFO :SimpleThreadPoolWorker-4 [jdbc.sqlonly] batching 1 statements:\n1: insert into CR_TEXT_STORAGE_DATA (TEXT_VAL, TEXT_STORAGE_ID, SEQUENCE_ORD) values ('xml version=\"1.0\"\n', '1000h6c14b2m', 0)"}

It's supposed to work; I tested it quickly here:

{
       "hostname" => "ip-10-22-2-188",
     "@timestamp" => 2017-03-09T15:10:35.656Z,
           "role" => "app1",
       "@version" => "1",
           "host" => "LABS",
    "log_message" => "2017-02-27 14:01:30,215 INFO :SimpleThreadPoolWorker-4 [jdbc.sqlonly] batching 1 statements:\n1: insert into CR_TEXT_STORAGE_DATA (TEXT_VAL, TEXT_STORAGE_ID, SEQUENCE_ORD) values ('xml version="1.0"\n', '1000h6c14b2m', 0)",
           "type" => "trm-system",
      "timestamp" => "2017-03-08T20:19:01.234577+00:00",
       "customer" => "test",
     "sourcefile" => "/tmp/elk/trm-system.log"
}

What I think is happening: you need to remove the "syslog" part (the grok) and use the json filter from there. Could you remove the filter and just output the raw event? Reading the file directly seems fine, but that's different from something sent by a syslog server.
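In other words, something like this, with the filter block removed entirely, just to see the raw event as it arrives over UDP (this reuses your existing input and output unchanged):

```
input {
  udp {
    port => 5555
  }
}

output {
  stdout {
    codec => rubydebug
  }
}
```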

Regards,

I'm confused: I don't have a syslog filter, only a json filter. What do you mean by the "syslog part (grok)"? Also, did you test on v2.4.1 or v5.2?