Logstash reads beyond "message" field when using GREEDYDATA

Hi,

I recently stumbled upon this issue with GREEDYDATA in the Logstash grok filter, and I'm not sure whether I'm doing something wrong or whether it's a general bug. It appears that when a grok match pattern ends with %{GREEDYDATA}, grok gets a little too greedy and reads beyond the "message" field itself, also picking up the metadata that comes with the event. The logs are shipped from end nodes via Filebeat and reach Logstash via Kafka topics. Here's a reproducible test case using stdin.

logstash-message-overflow-test.conf

# stdin input
input {
    stdin {
        type => "stdin-type"
    }
}

# Notice that the grok pattern ends with %{GREEDYDATA:log_message}.
# named_captures_only keeps only the named captures in the event
filter {
    grok {
        match => { "message" => "%{TIMESTAMP_ISO8601:log_timestamp}\s%{LOGLEVEL:log_level}\s\[%{WORD:log_type}\]\s(?<log_jsource>(%{JAVACLASS}\:%{NUMBER}))\s%{GREEDYDATA:log_message}" }
        named_captures_only => true
    }
}

# output to stdout
output {
    stdout { codec => rubydebug } 
}

Test input, originally received from Filebeat via Kafka

{"@timestamp":"2017-10-26T11:38:38.983Z","beat":{"hostname":"ip-xxx-xxx-xxx-xxx","name":"ip-xxx-xxx-xxx-xxx","version":"5.6.2"},"fields":{"log_src":"server_error_logs"},input_type":"log","message":"2017-10-26 11:38:38:662 ERROR [ErrorLog] Test.java:1 Server HTTP response code : 502","offset":1666303,"source":"/log/server/error/error.log","type":"log"}

Logstash output

D:\Logman\Kafka\POC>%LOGSTASH_HOME%\bin\logstash -f config\logstash-message-overflow-test.conf
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.
Sending Logstash's logs to D:/ELK/logstash-5.5.2/logs which is now configured via log4j2.properties
[2017-10-28T10:57:13,531][INFO ][logstash.outputs.elasticsearch] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://logstash_system:xxxxxx@localhost:9200/]}}
[2017-10-28T10:57:13,535][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://logstash_system:xxxxxx@localhost:9200/, :path=>"/"}
[2017-10-28T10:57:13,719][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://logstash_system:xxxxxx@localhost:9200/"}
[2017-10-28T10:57:13,720][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["http://localhost:9200"]}
[2017-10-28T10:57:13,721][INFO ][logstash.pipeline        ] Starting pipeline {"id"=>".monitoring-logstash", "pipeline.workers"=>1, "pipeline.batch.size"=>2, "pipeline.batch.delay"=>5, "pipeline.max_inflight"=>2}
[2017-10-28T10:57:13,723][INFO ][logstash.pipeline        ] Pipeline .monitoring-logstash started
[2017-10-28T10:57:13,792][INFO ][logstash.pipeline        ] Starting pipeline {"id"=>"main", "pipeline.workers"=>8, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>5, "pipeline.max_inflight"=>1000}
[2017-10-28T10:57:13,813][INFO ][logstash.pipeline        ] Pipeline main started
The stdin plugin is now waiting for input:
[2017-10-28T10:57:13,894][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}
{"@timestamp":"2017-10-26T11:38:38.983Z","beat":{"hostname":"ip-xxx-xxx-xxx-xxx","name":"ip-xxx-xxx-xxx-xxx","version":"5.6.2"},"fields":{"log_src":"server_erro
r_logs"},input_type":"log","message":"2017-10-26 11:38:38:662 ERROR [ErrorLog] Test.java:1 Server HTTP response code : 502","offset":1666303,"source":"/log/serv
er/error/error.log","type":"log"}
{
    "log_timestamp" => "2017-10-26 11:38:38:662",
         "log_type" => "ErrorLog",
       "@timestamp" => 2017-10-28T05:27:17.008Z,
         "@version" => "1",
             "host" => "DO-SIDDHANT-S01",
        "log_level" => "ERROR",
      "log_jsource" => "Test.java:1",
      "log_message" => "Server HTTP response code : 502\",\"offset\":1666303,\"source\":\"/log/server/error/error.log\",\"type\":\"log\"}\r",
          "message" => "{\"@timestamp\":\"2017-10-26T11:38:38.983Z\",\"beat\":{\"hostname\":\"ip-xxx-xxx-xxx-xxx\",\"name\":\"ip-xxx-xxx-xxx-xxx\",\"version\":\
"5.6.2\"},\"fields\":{\"log_src\":\"server_error_logs\"},input_type\":\"log\",\"message\":\"2017-10-26 11:38:38:662 ERROR [ErrorLog] Test.java:1 Server HTTP res
ponse code : 502\",\"offset\":1666303,\"source\":\"/log/server/error/error.log\",\"type\":\"log\"}\r",
             "type" => "stdin-type"
}

In particular, notice that log_message contains "Server HTTP response code : 502\",\"offset\":1666303,\"source\":\"/log/server/error/error.log\",\"type\":\"log\"}\r". Ideally, it should contain only "Server HTTP response code : 502".

Comments?

Sending JSON via stdin is not the same as receiving this data via the beats input, because the beats input parses the JSON into fields on the event. With plain stdin, the entire serialized event becomes the "message" field, so grok matches against the raw JSON string and %{GREEDYDATA} runs to the end of it. Try adding a json codec to the stdin input.
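
For the test config above, that would look something like this:

# stdin input with a json codec: each incoming line is parsed as JSON,
# so grok runs against the extracted "message" field instead of the raw event
input {
    stdin {
        type => "stdin-type"
        codec => json
    }
}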

Thanks for the pointer. Adding codec => json to both the stdin and Kafka inputs solves the issue.
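
For reference, the Kafka input now looks roughly like this (the broker address and topic name below are placeholders, not the real values):

# Kafka input with the same json codec; bootstrap_servers and topics are placeholders
input {
    kafka {
        bootstrap_servers => "localhost:9092"
        topics => ["server_error_logs"]
        codec => json
    }
}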
