In our production system we use Filebeat 6.1 to manage Docker log files with a format like:
...
{"log":"2017-11-26T16:59:56.912-0000 - INFO - blahblah","stream":"stdout","time":"2017-11-26T16:59:56.91609507Z"}
...
This works great for "normal" lines but when there are multiline logs like the following:
{"log":"2017-11-26T16:59:56.912-0000 - ERROR - Error: Bad Request\n","stream":"stdout","time":"2017-11-26T16:59:56.91609507Z"}
{"log":" at xxxx \n","stream":"stdout","time":"2017-11-26T16:59:56.916118577Z"}
{"log":" at yyyy \n","stream":"stdout","time":"2017-11-26T16:59:56.916122447Z"}
a problem arises.
How can I merge the previous JSON rows into a single JSON row to be processed with Logstash?
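To make the desired result concrete, here is a minimal Python sketch of the merge, not a Filebeat configuration. It assumes that a new event starts whenever the decoded `log` text begins with a date; the function name `merge_docker_lines` and the `EVENT_START` pattern are made up for illustration. For the Filebeat side, the log prospector documents a `json.message_key` option naming the JSON key that the multiline settings are applied to, which may be worth checking against the 6.1 reference.

```python
import json
import re

# Assumption: an event starts when the decoded "log" text begins with a date.
EVENT_START = re.compile(r"^\d{4}-\d{2}-\d{2}")


def merge_docker_lines(raw_lines):
    """Group Docker JSON log lines into one record per logical event."""
    events = []
    for raw in raw_lines:
        raw = raw.strip()
        if not raw:
            continue
        record = json.loads(raw)
        if EVENT_START.match(record["log"]) or not events:
            # First line of an event: keep its stream and time fields.
            events.append(dict(record))
        else:
            # Continuation line: append its text to the current event.
            events[-1]["log"] += record["log"]
    return events


sample = [
    '{"log":"2017-11-26T16:59:56.912-0000 - ERROR - Error: Bad Request\\n","stream":"stdout","time":"2017-11-26T16:59:56.91609507Z"}',
    '{"log":" at xxxx \\n","stream":"stdout","time":"2017-11-26T16:59:56.916118577Z"}',
    '{"log":" at yyyy \\n","stream":"stdout","time":"2017-11-26T16:59:56.916122447Z"}',
]

merged = merge_docker_lines(sample)
print(json.dumps(merged[0]))
```

The three rows collapse into one record whose `log` field holds the whole stack trace, while `stream` and `time` come from the first row.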
Thanks, Noémi, for your very quick answer.
But... forgive me if I still don't understand...
If I use multiline (as you suggested) I suppose it will process those lines
and "merge" all of them as a single string message.
Basically from 3 messages
1 {"log":"2017-11-26T16:59:56.912-0000 - ERROR - Error: Bad Request\n","stream":"stdout","time":"2017-11-26T16:59:56.91609507Z"}
2 {"log":" at xxxx \n","stream":"stdout","time":"2017-11-26T16:59:56.916118577Z"}
3 {"log":" at yyyy \n","stream":"stdout","time":"2017-11-26T16:59:56.916122447Z"}
I obtain a single multiline message
1 {"log":"2017-11-26T16:59:56.912-0000 - ERROR - Error: Bad Request\n","stream":"stdout","time":"2017-11-26T16:59:56.91609507Z"}
{"log":" at xxxx \n","stream":"stdout","time":"2017-11-26T16:59:56.916118577Z"}
{"log":" at yyyy \n","stream":"stdout","time":"2017-11-26T16:59:56.916122447Z"}
That's a good starting point. But I'm missing how to transform those lines into a single message like:
2017-11-26T16:59:56.912-0000 - ERROR - Error: Bad Request\n
at xxxx \n
at yyyy \n
to be processed by the grok parser in Logstash.
Basically, how can I remove all the Docker JSON stuff? The message is no longer JSON but a string with 3 JSON objects attached...
Should I do that in logstash? How?
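As an illustration of what "removing the Docker JSON stuff" means for such a joined string, here is a small Python sketch. It assumes the multiline step joined the rows with newline characters and that every row is a complete JSON object; `strip_docker_json` is a hypothetical helper name, and this is not Logstash code.

```python
import json


def strip_docker_json(joined):
    """Turn newline-joined Docker JSON rows into the plain log text."""
    parts = []
    for row in joined.splitlines():
        row = row.strip()
        if row:
            # Each row is its own JSON object; keep only its "log" text.
            parts.append(json.loads(row)["log"])
    return "".join(parts)


joined = "\n".join([
    '{"log":"2017-11-26T16:59:56.912-0000 - ERROR - Error: Bad Request\\n","stream":"stdout","time":"2017-11-26T16:59:56.91609507Z"}',
    '{"log":" at xxxx \\n","stream":"stdout","time":"2017-11-26T16:59:56.916118577Z"}',
    '{"log":" at yyyy \\n","stream":"stdout","time":"2017-11-26T16:59:56.916122447Z"}',
])

message = strip_docker_json(joined)
print(message)
```

Splitting on line breaks is safe here because the `\n` inside each `log` value is still an escaped sequence in the raw JSON, not a real newline.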
This can be done with what Noémi suggested (multiline).
My problem is having multiline logs INSIDE Docker JSON logs, like this:
{"log":"2017-11-26 - ERROR - blabla \n","stream":"stdout",...}
{"log":" at xxxx \n","stream":"stdout",...}
{"log":" at yyyy \n","stream":"stdout",...}
I could use a multiline pattern like:
({"log":"[0-9]{4}-[0-9]{2}-[0-9]{2})
but this would join all the JSON rows into a single string, and I cannot figure out how to process it to obtain a final message like:
2017-11-26T16:59:56.912-0000 - ERROR - Error: Bad Request\n
at xxxx\n
at yyyy\n
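Before wiring a pattern like this into Filebeat (typically together with `multiline.negate: true` and `multiline.match: after`), it can be checked against the raw rows. The Python sketch below assumes a cleaned-up form of the pattern, anchored at the start of the line and with the opening brace escaped; the name `START` is made up for the example, and Python's `re` is only a stand-in for the regex engine Filebeat uses.

```python
import re

# Assumed cleaned-up form of the pattern: anchored, with "{" escaped.
START = re.compile(r'^\{"log":"[0-9]{4}-[0-9]{2}-[0-9]{2}')

rows = [
    '{"log":"2017-11-26 - ERROR - blabla \\n","stream":"stdout",...}',
    '{"log":" at xxxx \\n","stream":"stdout",...}',
    '{"log":" at yyyy \\n","stream":"stdout",...}',
]

# True marks a row that starts a new event; False marks a continuation row.
starts = [bool(START.match(row)) for row in rows]
print(starts)  # [True, False, False]
```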