Parsing XML in Logstash: "_xmlparsefailure"

Issue: "_xmlparsefailure". Hi all, I need your help with the issue below. Please refer to the step-by-step explanation.

STEP 1: I have the sample input below.

<?xml version="1.0" encoding="UTF-8"?> blue yellow green red

STEP 2: My config is as per below.
input {
  file {
    path => "C:/Data/xyz/*"
    start_position => "beginning"
    sincedb_path => "NUL"
    codec => multiline {
      pattern => "<?xml"
      negate => true
      what => "previous"
    }
  }
}
filter {
  xml {
    store_xml => false
    source => "message"
    xpath => [
      "/DataSet/cbc:Field1/text()", "parsedField1",
      "/DataSet/cbc:Field2/text()", "parsedField2",
      "/DataSet/cbc:Field3/text()", "parsedField3",
      "/DataSet/cbc:Field4/text()", "parsedField4"
    ]
  }
}

output {
  elasticsearch {
    index => "idx_xml"
    hosts => "localhost:9200"
  }
  stdout {
    codec => rubydebug
  }
}

STEP 3: I start Logstash, and initially I get this error.


In Kibana it shows this:

STEP 4: Here's the interesting part. When I hit CTRL-C to stop Logstash, it suddenly started parsing correctly.

STEP 5: I checked Kibana to see how it looked, and yes, this is how I expect it to look.

Questions:
Given the steps I demonstrated:

  1. Why is it not parsing?
  2. Why did it start parsing right after I tried to terminate Logstash?
  3. How can I fix this? Has anyone observed the same behavior?

Please do not post images of text. Just post the text. To avoid the XML being consumed as markup select the XML and click on </> in the toolbar above the edit pane. Use the preview panel to confirm that

Foo

changes to

<Field>Foo</Field>

You have configured the multiline codec to combine any lines that do not start with <?xml with the previous line. It is waiting for another line that does start with that in order to flush the event. It flushes the "incomplete" event at shutdown.

You can use the auto_flush_interval option to cause it to flush earlier.
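
As a minimal sketch of that suggestion, the codec block could look like the fragment below. The 2-second interval is an illustrative value, not one given in this thread. The anchored, escaped pattern is also an assumption on my part: the original "<?xml" is an unanchored regular expression in which the "<" is optional, so it also matches any line containing "xml", such as a line with xmlns attributes.

```
input {
  file {
    path => "C:/Data/xyz/*"
    start_position => "beginning"
    sincedb_path => "NUL"
    codec => multiline {
      # Assumption, not from the thread: anchored and escaped so that only
      # lines beginning with the XML declaration start a new event.
      # The original pattern was "<?xml".
      pattern => "^<\?xml"
      negate => true
      what => "previous"
      # Flush a pending event if no new line arrives within this many
      # seconds (illustrative value).
      auto_flush_interval => 2
    }
  }
}
```

With auto_flush_interval set, the last event in the file should be emitted after the interval elapses instead of waiting for shutdown.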

Awesome sauce! Thanks Badger, you always save the day. Yes, that did the trick. And noted on the images, I won't do it again.

Now I'm able to get this event:
| Field | Value |
| --- | --- |
| message | `<cbc:Field1>blue</cbc:Field1> <cbc:Field2>yellow</cbc:Field2> <cbc:Field3>green</cbc:Field3> <cbc:Field4>red</cbc:Field4>` |
| parsedField1 | blue |
| parsedField2 | yellow |
| parsedField3 | green |
| parsedField4 | red |
| path | C:/Data/xyz/sample.xml |
| tags | multiline |

Just one thing though: it also created another event that contains only the path of my file and the tag _xmlparsefailure:

path: C:/Data/xyz/sample.xml
tags: _xmlparsefailure

When I started Logstash with this config, the output looked like this:
2fc5b8d5] XML Parse Error {:exception=>"/DataSet/cbc:Field1/text(): org.apache.xpath.domapi.XPathStylesheetDOM3Exception: Prefix must resolve to a namespace: cbc", :source=>"message", :value=>"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r"}
{
    "@version" => "1",
    "@timestamp" => 2021-05-24T19:13:40.698Z,
    "path" => "C:/Data/xyz/sample.xml",
    "tags" => [
        [0] "_xmlparsefailure"
    ],
    "message" => "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\r",
    "host" => "DESKTOP-DE7H6ES"
}
{
    "@version" => "1",
    "parsedField4" => [
        [0] "red"
    ],
    "@timestamp" => 2021-05-24T19:13:42.256Z,
    "path" => "C:/Data/xyz/sample.xml",
    "parsedField1" => [
        [0] "blue"
    ],
    "tags" => [
        [0] "multiline"
    ],
    "message" => "<DataSet xmlns:cbc=\"urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2\" xmlns:cac=\"urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2\">\r\n <cbc:Field1>blue</cbc:Field1>\r\n <cbc:Field2>yellow</cbc:Field2>\r\n <cbc:Field3>green</cbc:Field3>\r\n <cbc:Field4>red</cbc:Field4>\r",
    "host" => "DESKTOP-DE7H6ES",
    "parsedField2" => [
        [0] "yellow"
    ],
    "parsedField3" => [
        [0] "green"
    ]
}

It has been a few years since I had to deal with XML namespaces, but if I remember correctly you need to tell the filter about your namespaces. It does not parse them out of

<DataSet 
xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2">

you need to add

namespaces => {
    "cbc" => "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
    "cac" => "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
}

to the xml filter.
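
Put together with the filter from the original post, that would look roughly like the sketch below. Everything in it comes from this thread; only the placement of namespaces inside the xml block is spelled out.

```
filter {
  xml {
    store_xml => false
    source => "message"
    # Map each prefix used in the XPath expressions to its namespace URI.
    namespaces => {
      "cbc" => "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
      "cac" => "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
    }
    xpath => [
      "/DataSet/cbc:Field1/text()", "parsedField1",
      "/DataSet/cbc:Field2/text()", "parsedField2",
      "/DataSet/cbc:Field3/text()", "parsedField3",
      "/DataSet/cbc:Field4/text()", "parsedField4"
    ]
  }
}
```

This fits the logged error: the failing event held only the XML declaration line, so there was no xmlns:cbc declaration in it from which the cbc prefix could be resolved.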

That worked. Thanks a ton, Badger! I appreciate all the help.