Extracting multiline XML fields from a log

Hello

I'm struggling to extract individual fields from a log that contains embedded XML.
I'm using Logstash to ingest into Elasticsearch. First, I collect the logs with Filebeat and send them to Kafka (for architectural reasons), so the Logstash configuration reads from Kafka and ingests into Elasticsearch.

The format of my message is similar to this:

2019-12-06 14:34:13,620 hostname: [com.ibm.mq.jmqi.remote.impl.RemoteSession[:/13514][connectionId=C328C3D8]] INFO - >>message: { some data in JSON}
<?xml version="1.0" encoding="UTF-8"?>
<Document xmlns="urn:iso:std:iso:20022"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="urn:iso:std:iso:20022">
   <FIToFICstmrCdtTrf>
      <GrpHdr>
         <MsgId>Fe9275f33b63794dea4</MsgId>
....some other fields...
</Document>

2019-12-06 14:34:13,620 hostname:*** another log line

So, the idea is to ingest the log into Elasticsearch but extract every single node of the XML into separate fields.

I've tried using a grok filter to identify the message (this specific structure) and then parsing it with the xml filter, but I haven't succeeded. One of the problems is that not every line in the logs looks like the one shown; there are also lines without the XML, so I only need to match this specific kind of log line with the XML inside.

Any advice or help?

Thanks in advance

Assuming you are using multiline to combine everything between the two dates with the preceding date, you can use an xml filter. It will ignore all the noise preceding the <?xml, provided you use 'store_xml => true' and not xpath.
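
As a rough sketch of that suggestion, something like the filter below might be a starting point. It assumes the multiline joining has already happened upstream (for example in Filebeat) and that the whole joined event, log prefix plus XML, ends up in the [message] field by the time it reaches the filter stage. The target name "doc" is just a placeholder I picked, and force_array is optional.

```
filter {
  # Only events that contain an XML declaration go through the xml filter;
  # plain log lines without XML are left untouched.
  if [message] =~ /<\?xml/ {
    xml {
      source      => "message"   # assumed: field holding the joined multiline event
      target      => "doc"       # hypothetical name; parsed nodes land under [doc]
      store_xml   => true        # parse the whole document into fields, no xpath
      force_array => false       # optional: keep single values as scalars, not arrays
    }
  }
}
```

With this, a node such as MsgId should show up nested under [doc] following the document structure, rather than as a top-level field, so you may still want a mutate or ruby filter afterwards if you need flat field names.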