Hi all,
I am currently trying to apply the multiline codec (with particular regard to Java stack traces) to the tcp input in logstash, on events (logs) that are delivered to logstash by the logspout-logstash tool (which forwards logs from docker containers). The tricky part is that the input logspout provides to logstash is already in JSON format and at the same time contains multiline logs that would have to be merged by the multiline codec in logstash.
What I would normally have to do to handle this situation in logstash (as per my understanding of logstash) is to first process the input from logspout with the json codec, so that the JSON input is translated into the corresponding fields in logstash, and then apply the multiline codec to the result, so that the multiple log lines belonging to one logical log event are merged into a single logstash event.
However, as far as I can tell, it is not possible to apply two codecs to the tcp input in logstash, so I can only use either codec => json or codec => multiline there. And since multiline is no longer available as a filter, it has to go into the input section if it is to be used at all, which means I cannot use the json codec in the input.
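For illustration, the json-only variant would look roughly like this (just a sketch, reusing the port from my debugging config further down); the multiline-only variant is the debugging config shown below:

input
{
  tcp
  {
    port => 5000
    type => "backend"
    # this would give me clean JSON fields, but then there is no place left for multiline:
    codec => json
  }
}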
If I apply the multiline codec in the input section and then the json filter in the filter section, the json filter is unable to parse the merged events ("Error parsing json"), presumably because the merged message is now several JSON documents joined by newlines rather than a single JSON object (see the rubydebug output further down).
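For reference, the filter section I tried for that looked roughly like this (just a sketch; "message" is the field that the multiline codec fills, as shown in the rubydebug output further down):

filter
{
  json
  {
    # try to parse the (already multilined) message field produced by the tcp input
    source => "message"
  }
}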
This is an example of the input logstash gets from the logspout-logstash tool (a single log line):
2018-09-11T07:16:23.792Z 192.168.0.3 {"Environment":"int","Instance":"myApp-core-abcd","docker":{"name":"/myapp","id":"73237eb27d452c6db1e1eaaf07","image":"192.168.0.138:6000/myapp-int:latest","hostname":"7d45","labels":{"Environment":"int","Instance":"myApp-core-abcd","build-date":"20180311","license":"GPLv2","name":"CentOS Base Image","vendor":"CentOS"}},"message":" DEBUG [20180911 09:16:23] - 83787 SessionMgrBean.getConnectionProfileForClientType started","stream":"stdout","tags":[]}
(Apart from the timestamp and address at the beginning of the event, the real content, i.e. the part that contains the actual log output from the application and that varies from event to event, is the value introduced by "message":"[actual log content]".)
This is my current, small logstash config for debugging purposes:
input
{
  tcp
  {
    port => 5000
    type => "backend"
    codec => multiline
    {
      pattern => "^.*? %{LOGLEVEL} +\["
      negate => true
      what => "previous"
    } #codec
  } #tcp
} #input
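My understanding of these settings, applied to the logspout lines from the example above, is roughly this (sketch only, "message" values shortened):

# a line whose embedded application message contains a log level matches the pattern
# and therefore (negate => true) starts a new logstash event:
{... "message":" INFO [20180913 15:25:50] - 50511 Blah.getData: executing SELECT ..." ...}

# continuation lines do not match the pattern and are appended to the previous event
# (what => "previous"):
{... "message":"\tFROM abcdexception abcdexception0" ...}
{... "message":"\tWHERE abcdexception0.abcdexception_id = ?" ...}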
And this is what the multiline codec produces for multiline log lines that logstash receives from logspout and that trigger the multiline pattern in the logstash config (rubydebug-formatted):
"@version" => "1",
"host" => "192.168.0.3",
"message" => "{\"Environment\":\"int\",\"Instance\":\"myApp-core-abcd\",\"docker\":{\"name\":\"/myapp\",\"id\":\"cc3032\",\"image\":\"192.168.0.138:6000/myapp-int:latest\",\"hostname\":\"7d45\",\"labels\":{\"Environment\":\"int\",\"Instance\":\"myApp-core-abcd\",\"build-date\":\"20180311\",\"license\":\"GPLv2\",\"name\":\"CentOS Base Image\",\"vendor\":\"CentOS\"}},\"message\":\" INFO [20180913 15:25:50] - 50511 Blah.getData: executing SELECT yadada blah blah and so on fictional example for the actual log contentXYZ\",\"stream\":\"stdout\",\"tags\":[]}\n{\"Environment\":\"int\",\"Instance\":\"myApp-core-abcd\",\"docker\":{\"name\":\"/myapp\",\"id\":\"cc3032\",\"image\":\" 192.168.0.138:6000/myapp-int:latest\",\"hostname\":\"7d45\",\"labels\":{\"Environment\":\"int\",\"Instance\":\"myApp-core-abcd\",\"build-date\":\"20180311\",\"license\":\"GPLv2\",\"name\":\"CentOS Base Image\",\"vendor\":\"CentOS\"}},\"message\":\"\\tFROM abcdexception abcdexception0\",\"stream\":\"stdout\",\"tags\":[]}\n{\"Environment\":\"int\",\"Instance\":\"myApp-core-abcd\",\"docker\":{\"name\":\"/myapp\",\"id\":\"cc3032\",\"image\":\" 192.168.0.138:6000/myapp-int:latest\",\"hostname\":\"7d45\",\"labels\":{\"Environment\":\"int\",\"Instance\":\"myApp-core-abcd\",\"build-date\":\"20180311\",\"license\":\"GPLv2\",\"name\":\"CentOS Base Image\",\"vendor\":\"CentOS\"}},\"message\":\"\\tWHERE abcdexception0.abcdexception_id = ?\",\"stream\":\"stdout\",\"tags\":[]}",
"port" => 45754,
"@timestamp" => 2018-09-13T13:25:50.569Z,
"tags" => [
[0] "multiline"
],
"type" => "backend"
}
(And this is just a very small example of a multiline event. Normally, the multiline event is much bigger when a stack trace appears in the log.)
Assuming that I cannot change the format of the log input that logstash receives from logspout, how should I best handle that JSON-formatted input from logspout in order to get a clean JSON mapping in logstash AND do multiline as well in logstash?
Thanks,
Kaspar