I am trying to parse SOAP messages using an XML filter. A sample SOAP message looks like this (it is all on one line in the actual input):
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
<s:Header><h:MyHeader xmlns:h="http://com.example/foo" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://com.example/foo" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<langId>en_US</langId>
</h:MyHeader>
</s:Header>
<s:Body xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<submit xmlns="http://com.example/bar"><sType><pRequest>
<iData xsi:type="q1:pData" xmlns="http://com.example/c/v1" xmlns:q1="http://com.example/bar">
<q1:document xsi:type="q2:ABC" xmlns:q2="http://com.example/p/v1">
<q2:pAmt>12345.00</q2:pAmt>
</q1:document></iData></pRequest></sType></submit></s:Body></s:Envelope>
I can parse most of the fields I want, but I cannot get the xsi:type attribute out of the body. If I set remove_namespaces to false, then using
xpath => { "/s:Envelope/s:Body//q1:document/@xsi:type" => "type" }
I do not get a type field in the resulting message, nor do I get an error message (yes, all 5 namespaces are supplied to the xml filter). I think the syntax is correct because the PowerShell / .NET XPath implementation evaluates the same expression and returns the expected attribute.
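For reference, a minimal sketch of the namespace-aware setup described above. The source field name (message), the store_xml setting, and the choice of which five prefixes to list are assumptions for illustration. Only the namespace URIs, the xpath expression, and remove_namespaces are taken from the sample message and the description.

```
filter {
  xml {
    # Assumption: the SOAP payload arrives in the "message" field.
    source => "message"
    # Assumption: only the xpath results are wanted, not the whole parsed document.
    store_xml => false
    remove_namespaces => false
    # Prefix-to-URI mappings copied from the sample message.
    # Which five prefixes were actually supplied is an assumption.
    namespaces => {
      "s"   => "http://schemas.xmlsoap.org/soap/envelope/"
      "h"   => "http://com.example/foo"
      "xsi" => "http://www.w3.org/2001/XMLSchema-instance"
      "q1"  => "http://com.example/bar"
      "q2"  => "http://com.example/p/v1"
    }
    # The expression that yields no "type" field and no error.
    xpath => { "/s:Envelope/s:Body//q1:document/@xsi:type" => "type" }
  }
}
```

With the sample message, the attribute this expression targets is xsi:type="q2:ABC" on the q1:document element.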
As a workaround, I can set remove_namespaces to true and extract "/Envelope/Body//document/@xsi:type". It is not obvious to me why the xsi prefix is still required in the attribute step when remove_namespaces is true.
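A minimal sketch of that workaround, under the same caveats: the source field name (message) and the store_xml setting are assumptions, and the description does not say whether the namespaces option is still supplied in this variant, so it is left out here.

```
filter {
  xml {
    # Assumption: the SOAP payload arrives in the "message" field.
    source => "message"
    # Assumption: only the xpath results are wanted.
    store_xml => false
    remove_namespaces => true
    # Element steps carry no prefixes, but the attribute step keeps "xsi:".
    xpath => { "/Envelope/Body//document/@xsi:type" => "type" }
  }
}
```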