A little help needed on parsing xml

pk.241011 · April 27, 2017, 2:09am

Hi,
I might need a few ideas here to start off with Logstash.

I am pulling data from Microsoft SQL server. And one of the fields returned is an XML.

This is a representation of the xml.

<SampleResults xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <DemoResults name="Hello There" executionDuration="23" status="Success">
	<SingleResult name="1 : Date" status="Success" context="Demo Steps" time="2017-02-02T15:58:26.9776219+11:00">
	  <RealValue>
		<Text value="02.02.2017" />
	  </RealValue>
	</SingleResult>
	<SingleResult name="2 : ErrorNum" status="Success" context="Demo Steps" time="2017-02-02T15:58:29.3776219+11:00">
	  <RealValue>
		<Text value="0" />
	  </RealValue>
	</SingleResult>
	<SingleResult name="2 : ErrorText" status="Success" context="Demo Steps" time="2017-02-02T15:58:29.3776219+11:00">
	  <RealValue>
		<Text value="OK" />
	  </RealValue>
	</SingleResult>
  </DemoResults>
</SampleResults>

This is the logstash conf file I am using.

input {
  jdbc {
   jdbc_driver_library => "/home/brucewayne/software/sqljdbc_4.2/enu/jre8/sqljdbc42.jar"
   jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
   jdbc_connection_string => "jdbc:sqlserver://wayneenterprises:9999;databaseName=jokerfiles"
   jdbc_user => "bruce"
   jdbc_password => "rachel"
   statement => "SELECT t.ID AS id,t.ResultXML AS resultXML FROM blah blah"
   jdbc_paging_enabled => "true"
   jdbc_page_size => "50000"
   }
 }
# IF you want to add Filter you can add one
filter {
 xml {
  source => "%{resultXML}"
  target => "parsed"
 }

 #split {
 # field => "parsed[SingleResult]"
 #}
}

output {
 elasticsearch {
 hosts => "myelastichost:9292"
 index => "testdatabase"
 document_id => "%{id}"
 document_type => "demo"
 manage_template => true
 }
 stdout { codec => rubydebug }
}

While data does get populated but I think I have made some goof up on the xml part.
If I look at data in json format in kibana then I see that the whole xml appear as a string under the _source against the key resultxml. My understanding is that the xml filter takes a field that contains XML and expands it into an actual datastructure. What I have here is the whole xml in plain string format.

Somehow I am missing the magic here.

magnusbaeck · April 27, 2017, 5:27am

source => "%{resultXML}"

The source option should contain the name of the field to parse, not the actual field contents, so you should do this:

source => "resultXML"

pk.241011 · April 27, 2017, 5:48am

Thanks for the reply @magnusbaeck.
I tried with [quote="magnusbaeck, post:2, topic:83784"]
source => "resultXML"
[/quote]

but the result are same.

magnusbaeck · April 27, 2017, 5:50am

Where's the parsed field where the parsed XML should've been stored? Where's the id field that your jdbc input is also creating alongside resultXML? Are you really using the configuration you've posted?

pk.241011 · April 27, 2017, 5:56am

id is coming from here.

I think you are talking about this one. I am not sure how to use this.

This is what I see in logstash output.

{
	"@timestamp" => 2017-04-27T05:44:02.709Z,
	 "resultxml" => "<.....the whole xml string......>",
	  "@version" => "1",
			"id" => -2112646798
}

Not sure if that helps in the diagnosis. Sorry for sketchy details. Just started on this.

magnusbaeck · April 27, 2017, 6:26am

id is coming from here.

Yes, but there was no id field in the Kibana screenshot you posted.

This is what I see in logstash output.

Yeah, there's no sign of the xml filter running at all. If the parsing fails it should add a tag. Reproducing your configuration but with another input works fine:

$ cat test.config 
input { stdin { codec => json_lines } }
output { stdout { codec => rubydebug } }
filter {
  xml {
    source => "resultXML"
    target => "parsed"
  }
}
$ cat data 
{"resultXML": "<SampleResults xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\"> <DemoResults name=\"Hello There\" executionDuration=\"23\" status=\"Success\"> <SingleResult name=\"1 : Date\" status=\"Success\" context=\"Demo Steps\" time=\"2017-02-02T15:58:26.9776219+11:00\"> <RealValue> <Text value=\"02.02.2017\" /> </RealValue> </SingleResult> <SingleResult name=\"2 : ErrorNum\" status=\"Success\" context=\"Demo Steps\" time=\"2017-02-02T15:58:29.3776219+11:00\"> <RealValue> <Text value=\"0\" /> </RealValue> </SingleResult> <SingleResult name=\"2 : ErrorText\" status=\"Success\" context=\"Demo Steps\" time=\"2017-02-02T15:58:29.3776219+11:00\"> <RealValue> <Text value=\"OK\" /> </RealValue> </SingleResult> </DemoResults> </SampleResults>"}
$ /opt/logstash/bin/logstash -f test.config < data
Settings: Default pipeline workers: 8
Pipeline main started
{
     "resultXML" => "<SampleResults xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\"> <DemoResults name=\"Hello There\" executionDuration=\"23\" status=\"Success\"> <SingleResult name=\"1 : Date\" status=\"Success\" context=\"Demo Steps\" time=\"2017-02-02T15:58:26.9776219+11:00\"> <RealValue> <Text value=\"02.02.2017\" /> </RealValue> </SingleResult> <SingleResult name=\"2 : ErrorNum\" status=\"Success\" context=\"Demo Steps\" time=\"2017-02-02T15:58:29.3776219+11:00\"> <RealValue> <Text value=\"0\" /> </RealValue> </SingleResult> <SingleResult name=\"2 : ErrorText\" status=\"Success\" context=\"Demo Steps\" time=\"2017-02-02T15:58:29.3776219+11:00\"> <RealValue> <Text value=\"OK\" /> </RealValue> </SingleResult> </DemoResults> </SampleResults>",
      "@version" => "1",
    "@timestamp" => "2017-04-27T06:25:15.321Z",
          "host" => "lnxolofon",
        "parsed" => {
          "xmlns:xsi" => "http://www.w3.org/2001/XMLSchema-instance",
          "xmlns:xsd" => "http://www.w3.org/2001/XMLSchema",
        "DemoResults" => [
            [0] {
                             "name" => "Hello There",
                "executionDuration" => "23",
                           "status" => "Success",
                     "SingleResult" => [
                    [0] {
                             "name" => "1 : Date",
                           "status" => "Success",
                          "context" => "Demo Steps",
                             "time" => "2017-02-02T15:58:26.9776219+11:00",
                        "RealValue" => [
                            [0] {
                                "Text" => [
                                    [0] {
                                        "value" => "02.02.2017"
                                    }
                                ]
                            }
                        ]
                    },
                    [1] {
                             "name" => "2 : ErrorNum",
                           "status" => "Success",
                          "context" => "Demo Steps",
                             "time" => "2017-02-02T15:58:29.3776219+11:00",
                        "RealValue" => [
                            [0] {
                                "Text" => [
                                    [0] {
                                        "value" => "0"
                                    }
                                ]
                            }
                        ]
                    },
                    [2] {
                             "name" => "2 : ErrorText",
                           "status" => "Success",
                          "context" => "Demo Steps",
                             "time" => "2017-02-02T15:58:29.3776219+11:00",
                        "RealValue" => [
                            [0] {
                                "Text" => [
                                    [0] {
                                        "value" => "OK"
                                    }
                                ]
                            }
                        ]
                    }
                ]
            }
        ]
    }
}
Pipeline main has been shutdown
stopping pipeline {:id=>"main"}

pk.241011 · April 27, 2017, 7:25am

Does it matter how the xml comes? Straight from the field it comes as a single string without any line breaks or any formatting whatsoever when I use jdbc input.

magnusbaeck · April 27, 2017, 7:33am

Linebreaks or not doesn't matter.

system · May 25, 2017, 7:48am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Parsing multiline, netsted XML using Logstash Logstash	8	310	April 10, 2019
Need help to parse xml file with logstash Logstash	12	1007	September 23, 2019
Trying to parse XML-logs Logstash	11	2928	June 25, 2018
XML to Elasticsearch via Logstash Logstash	47	3461	April 3, 2018
Logstash xml parsing Logstash	2	623	April 5, 2017

A little help needed on parsing xml

Related topics