I am trying to read an XML file with Logstash, but the XML is only read up to the first \r. Shouldn't the multiline codec read everything? Or how can I exclude that message field?
XML File
<?xml version="1.0" encoding="utf-8"?><update xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><protocol><year>2018</year><number>S18085936</number><assignment>D18051009</assignment><pager>106</pager><timestamp>2018-06-18T00:21:54+02:00</timestamp></protocol><keyword>1</keyword><message>Lorem ipsum dolor sit amet,
consetetur sadipscing elitr,
sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat,
sed diam voluptua.</message><origin>HR K</origin><status><type>5</type><timestamp>2018-06-18T01:08:58+02:00</timestamp></status><object><type>Location</type><address><street>street</street><streetnumber>101</streetnumber><zip>8888</zip><city>City</city></address><name>street 10</name><coords><lat>47.00000</lat><lon>8.00000</lon></coords></object><object><type>Destination</type><name>HAUPTGEBÄUDE</name></object></update>
That does not look like valid XML, so I am not surprised the XML filter does not work. I would recommend you either correct the input data or parse the data as text.
I see that you updated the data and that it now looks like valid XML. Does the file contain a single XML document spread over multiple lines or can it contain more than one?
The files always look like the "XML File" example, so yes, each one is a single XML document spread over multiple lines. As you can see, there are CRs inside the XML tag.
Try changing your multiline pattern to <\?xml. The pattern is a regular expression, so the ? has to be escaped. Then, before your xml filter, use the mutate filter's gsub option to remove the \r carriage returns.
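To make the suggestion above concrete, here is a rough sketch of how such a pipeline might look. It is untested against your files, and several things in it are assumptions rather than details from this thread: the file path, the `doc` target field, the `negate`/`what` settings (so that every line not starting a new document is appended to the previous one), and the `auto_flush_interval` (so that the last document is emitted even though no further `<?xml` line follows it).

```
input {
  file {
    path => "/path/to/update.xml"        # hypothetical path, adjust to your setup
    start_position => "beginning"
    codec => multiline {
      pattern => "<\?xml"                # regular expression: a line containing the XML declaration
      negate => true                     # lines that do NOT match the pattern ...
      what => "previous"                 # ... are appended to the previous line
      auto_flush_interval => 1           # assumption: flush the pending event after 1 second of silence
    }
  }
}

filter {
  mutate {
    # remove carriage returns from the joined event before parsing
    gsub => ["message", "\r", ""]
  }
  xml {
    source => "message"                  # the event field filled by the multiline codec
    target => "doc"                      # hypothetical field name for the parsed XML
  }
}
```

With these settings the whole file should arrive as a single event, so the xml filter sees the complete document instead of only the part before the first line break.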