[solved] Logstash-Forwarder, json codec and multiline issue

Hey,

I'm seeing a lot of issues with certain multiline log entries.

Here's a log entry:

D, [2015-10-07T16:11:53.434110 #20457] DEBUG -- : Error occurred while parsing request parameters.
Contents:

{
"payload": {
    "is_ingetrokken": true,
     }
}

These log lines are forwarded via Logstash-Forwarder to Logstash, where this multiline filter should, well, squash the multiline entries into single events (any line that doesn't start with a Ruby Logger severity prefix such as "D, " gets appended to the previous event):

multiline {
    pattern => "^[DFEWI],\s"
    negate => true
    what => "previous"
}

When I go and look, this is what gets put into ES:

{
  "_index": "logstash-2015.10.07",
  "_type": "coosy",
  "_id": "AVBCsK05GGU6JV8bpMNR",
  "_score": null,
  "_source": {
    "message": [
      "Error occurred while parsing request parameters.\nContents:\n\n{"
    ],
    "tags": [
      "multiline",
      "other"
    ],
    "@version": "1",
    "@timestamp": "2015-10-07T14:11:53.434Z",
    "file": "/logs/coosy/production.log",
    "host": "482a4b207db0",
    "offset": [
      "488372224",
      "488372323",
      "488372333",
      "488372334"
    ],
    "type": "coosy",
    "deployment": "production",
    "pid": 20457,
    "loglevel": "DEBUG"
  },
  "fields": {
    "@timestamp": [
      1444227113434
    ]
  },
  "sort": [
    1444227113434
  ]
}

As you can see, half of the multiline message does not get added to ES. Furthermore, when this happens, Logstash will generally hang and I have to restart it (and Logstash-Forwarder) to get back on track - at least until the next occurrence of such a log entry.

{:timestamp=>"2015-10-08T09:42:29.569000+0200", message=>"Lumberjack input: unhandled exception", :exception=>#<IndexError: string not matched>, :backtrace=>["org/jruby/RubyString.java:3912:in `[]='", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.4-java/lib/logstash/event.rb:73:in `initialize'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-codec-json-1.0.1/lib/logstash/codecs/json.rb:46:in `decode'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-lumberjack-1.0.5/lib/logstash/inputs/lumberjack.rb:77:in `run'", "org/jruby/RubyProc.java:271:in `call'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-lumberjack-1.0.5/lib/logstash/inputs/lumberjack.rb:105:in `invoke'", "org/jruby/RubyProc.java:271:in `call'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/jls-lumberjack-0.0.24/lib/lumberjack/server.rb:264:in `data'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/jls-lumberjack-0.0.24/lib/lumberjack/server.rb:246:in `read_socket'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/jls-lumberjack-0.0.24/lib/lumberjack/server.rb:190:in `data_field_value'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/jls-lumberjack-0.0.24/lib/lumberjack/server.rb:101:in `feed'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/jls-lumberjack-0.0.24/lib/lumberjack/server.rb:206:in `compressed_payload'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/jls-lumberjack-0.0.24/lib/lumberjack/server.rb:101:in `feed'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/jls-lumberjack-0.0.24/lib/lumberjack/server.rb:239:in `read_socket'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/jls-lumberjack-0.0.24/lib/lumberjack/server.rb:224:in `run'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-lumberjack-1.0.5/lib/logstash/inputs/lumberjack.rb:104:in `invoke'", "org/jruby/RubyProc.java:271:in `call'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/concurrent-ruby-0.9.1-java/lib/concurrent/executor/executor_service.rb:515:in `run'", "Concurrent$$JavaExecutorService$$Job_1581672782.gen:13:in `run'"], :level=>:error}

Any ideas? Need more info?

Thank you for your help!
Jeroen

As a test I created a logfile containing just the one log entry mentioned above. I also changed the Logstash pipeline to keep the original message and to output to stdout. As you can see, multiline fails for this message.

This is the output:

Lumberjack input: unhandled exception {:exception=>#<IndexError: string not matched>, :backtrace=>["org/jruby/RubyString.java:3912:in `[]='", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.4-java/lib/logstash/event.rb:73:in `initialize'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-codec-json-1.0.1/lib/logstash/codecs/json.rb:46:in `decode'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-lumberjack-1.0.5/lib/logstash/inputs/lumberjack.rb:77:in `run'", "org/jruby/RubyProc.java:271:in `call'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-lumberjack-1.0.5/lib/logstash/inputs/lumberjack.rb:105:in `invoke'", "org/jruby/RubyProc.java:271:in `call'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/jls-lumberjack-0.0.24/lib/lumberjack/server.rb:264:in `data'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/jls-lumberjack-0.0.24/lib/lumberjack/server.rb:246:in `read_socket'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/jls-lumberjack-0.0.24/lib/lumberjack/server.rb:190:in `data_field_value'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/jls-lumberjack-0.0.24/lib/lumberjack/server.rb:101:in `feed'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/jls-lumberjack-0.0.24/lib/lumberjack/server.rb:206:in `compressed_payload'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/jls-lumberjack-0.0.24/lib/lumberjack/server.rb:101:in `feed'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/jls-lumberjack-0.0.24/lib/lumberjack/server.rb:239:in `read_socket'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/jls-lumberjack-0.0.24/lib/lumberjack/server.rb:224:in `run'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-lumberjack-1.0.5/lib/logstash/inputs/lumberjack.rb:104:in `invoke'", "org/jruby/RubyProc.java:271:in `call'", "/opt/logstash/vendor/bundle/jruby/1.9/gems/concurrent-ruby-0.9.1-java/lib/concurrent/executor/executor_service.rb:515:in `run'", "Concurrent$$JavaExecutorService$$Job_1581672665.gen:13:in `run'"], :level=>:error}
{
   "message" => [
    [0] "D, [2015-10-07T16:12:00.210136 #20457] DEBUG -- : Error occurred while parsing request parameters.\nContents:\n{",
    [1] "Error occurred while parsing request parameters.\nContents:\n{"
],
      "tags" => [
    [0] "multiline",
    [1] "other"
],
  "@version" => "1",
"@timestamp" => "2015-10-07T14:12:00.210Z",
      "file" => "/logs/coosy/production.log",
      "host" => "78345eef1391",
    "offset" => [
    [0] "0",
    [1] "99",
    [2] "109"
],
      "type" => "coosy",
"deployment" => "production",
       "pid" => 20457,
  "loglevel" => "DEBUG",
 "@metadata" => {
    "timestamp" => "2015-10-07T16:12:00.210136"
}
}

This gets repeated ad infinitum (and, when I output to ES, fills up ES).

I'm running Logstash-Forwarder 0.4.0 and Logstash 1.5.4 with the following plugins:
logstash-filter-multiline (1.0.0)
logstash-input-lumberjack (1.0.5)

Updating logstash-filter-multiline to 2.0.0 did not resolve the issue.

It looks like the json codec is causing this.
When I disable it (which I can't do in production), multiline works fine and the log is parsed as it should be.

input {
  lumberjack {
    port => 6782
    ssl_certificate => "/etc/certs/logstash.crt"
    ssl_key => "/etc/certs/logstash-server.key"
    # json codec is needed for apache logs; if the input isn't json, it should fall back to strings
    # codec => "json"
  }
}

Here's how the same log entry gets parsed without the json codec first trying to do its thing.

{
   "message" => [
    [0] "D, [2015-10-07T16:12:00.210136 #20457] DEBUG -- : Error occurred while parsing request parameters.\nContents:\n{\n    \"payload\": {\n        \"is_ingetrokken\": true,\n         }\n}",
    [1] "Error occurred while parsing request parameters.\nContents:\n{\n    \"payload\": {\n        \"is_ingetrokken\": true,\n         }\n}"
],
  "@version" => "1",
"@timestamp" => "2015-10-07T14:12:00.210Z",
      "file" => "/logs/coosy/production.log",
      "host" => "c1393343b12a",
    "offset" => [
    [0] "0",
    [1] "99",
    [2] "109",
    [3] "111",
    [4] "128",
    [5] "160",
    [6] "171"
],
      "type" => "coosy",
"deployment" => "production",
      "tags" => [
    [0] "multiline",
    [1] "other"
],
       "pid" => 20457,
  "loglevel" => "DEBUG",
 "@metadata" => {
    "timestamp" => "2015-10-07T16:12:00.210136"
}
}

Any ideas on how to fix this (while keeping one Logstash instance)?

As your log lines are not all valid JSON, using a JSON codec will not work. Have you tried instead using the multiline codec to assemble the log records, and then processing them with grok and the json filter?
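
Something along these lines, for example (an untested sketch - the grok pattern is a guess based on the Ruby Logger format shown above, and it assumes this input only receives those app logs):

input {
  lumberjack {
    port => 6782
    ssl_certificate => "/etc/certs/logstash.crt"
    ssl_key => "/etc/certs/logstash-server.key"
    # assemble multiline entries at the input instead of decoding JSON
    codec => multiline {
      pattern => "^[DFEWI],\s"
      negate => true
      what => "previous"
    }
  }
}

filter {
  # split the Ruby Logger prefix off; (?m) lets GREEDYDATA span newlines
  grok {
    match => { "message" => "(?m)^[DFEWI], \[%{TIMESTAMP_ISO8601:timestamp} #%{POSINT:pid:int}\] +%{LOGLEVEL:loglevel} -- : %{GREEDYDATA:log_message}" }
  }
  # only attempt JSON parsing when the remainder actually looks like JSON,
  # so entries like the failing one above stay plain text
  if [log_message] =~ /^\{/ {
    json { source => "log_message" }
  }
}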

If you have different sources with different formats, maybe you could define multiple inputs, allowing you to apply a different codec for each input?
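
For instance (again just a sketch - the second port number here is made up):

input {
  # plain app logs: no codec, assembled later by the multiline filter
  lumberjack {
    port => 6782
    ssl_certificate => "/etc/certs/logstash.crt"
    ssl_key => "/etc/certs/logstash-server.key"
  }

  # apache logs shipped as JSON, on their own port
  lumberjack {
    port => 6783
    ssl_certificate => "/etc/certs/logstash.crt"
    ssl_key => "/etc/certs/logstash-server.key"
    codec => "json"
  }
}

Logstash-Forwarder would then be configured to ship each set of files to the matching port.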

Yes! I just made the same realisation.
I've removed the json codec and use the json filter where needed.
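
It now looks roughly like this (the "apache" condition reflects my setup; adjust as needed):

filter {
  # only apache events arrive as JSON strings; everything else stays plain text
  if [type] == "apache" {
    json { source => "message" }
  }
}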

Thank you!