A little help needed on parsing xml

Hi,
I might need a few ideas here to start off with Logstash.

I am pulling data from Microsoft SQL server. And one of the fields returned is an XML.

This is a representation of the xml.

<SampleResults xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <DemoResults name="Hello There" executionDuration="23" status="Success">
	<SingleResult name="1 : Date" status="Success" context="Demo Steps" time="2017-02-02T15:58:26.9776219+11:00">
	  <RealValue>
		<Text value="02.02.2017" />
	  </RealValue>
	</SingleResult>
	<SingleResult name="2 : ErrorNum" status="Success" context="Demo Steps" time="2017-02-02T15:58:29.3776219+11:00">
	  <RealValue>
		<Text value="0" />
	  </RealValue>
	</SingleResult>
	<SingleResult name="2 : ErrorText" status="Success" context="Demo Steps" time="2017-02-02T15:58:29.3776219+11:00">
	  <RealValue>
		<Text value="OK" />
	  </RealValue>
	</SingleResult>
  </DemoResults>
</SampleResults>

This is the logstash conf file I am using.

input {
  jdbc {
   jdbc_driver_library => "/home/brucewayne/software/sqljdbc_4.2/enu/jre8/sqljdbc42.jar"
   jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
   jdbc_connection_string => "jdbc:sqlserver://wayneenterprises:9999;databaseName=jokerfiles"
   jdbc_user => "bruce"
   jdbc_password => "rachel"
   statement => "SELECT t.ID AS id,t.ResultXML AS resultXML FROM blah blah"
   jdbc_paging_enabled => "true"
   jdbc_page_size => "50000"
   }
 }
# IF you want to add Filter you can add one
filter {
 xml {
  source => "%{resultXML}"
  target => "parsed"
 }

 #split {
 # field => "parsed[SingleResult]"
 #}
}

output {
 elasticsearch {
 hosts => "myelastichost:9292"
 index => "testdatabase"
 document_id => "%{id}"
 document_type => "demo"
 manage_template => true
 }
 stdout { codec => rubydebug }
}

While data does get populated but I think I have made some goof up on the xml part.
If I look at data in json format in kibana then I see that the whole xml appear as a string under the _source against the key resultxml. My understanding is that the xml filter takes a field that contains XML and expands it into an actual datastructure. What I have here is the whole xml in plain string format.

Somehow I am missing the magic here.

source => "%{resultXML}"

The source option should contain the name of the field to parse, not the actual field contents, so you should do this:

source => "resultXML"

Thanks for the reply @magnusbaeck.
I tried with [quote="magnusbaeck, post:2, topic:83784"]
source => "resultXML"
[/quote]

but the result are same.

Where's the parsed field where the parsed XML should've been stored? Where's the id field that your jdbc input is also creating alongside resultXML? Are you really using the configuration you've posted?

id is coming from here.

I think you are talking about this one. I am not sure how to use this.

This is what I see in logstash output.

{
	"@timestamp" => 2017-04-27T05:44:02.709Z,
	 "resultxml" => "<.....the whole xml string......>",
	  "@version" => "1",
			"id" => -2112646798
}

Not sure if that helps in the diagnosis. Sorry for sketchy details. Just started on this.

id is coming from here.

Yes, but there was no id field in the Kibana screenshot you posted.

This is what I see in logstash output.

Yeah, there's no sign of the xml filter running at all. If the parsing fails it should add a tag. Reproducing your configuration but with another input works fine:

$ cat test.config 
input { stdin { codec => json_lines } }
output { stdout { codec => rubydebug } }
filter {
  xml {
    source => "resultXML"
    target => "parsed"
  }
}
$ cat data 
{"resultXML": "<SampleResults xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\"> <DemoResults name=\"Hello There\" executionDuration=\"23\" status=\"Success\"> <SingleResult name=\"1 : Date\" status=\"Success\" context=\"Demo Steps\" time=\"2017-02-02T15:58:26.9776219+11:00\"> <RealValue> <Text value=\"02.02.2017\" /> </RealValue> </SingleResult> <SingleResult name=\"2 : ErrorNum\" status=\"Success\" context=\"Demo Steps\" time=\"2017-02-02T15:58:29.3776219+11:00\"> <RealValue> <Text value=\"0\" /> </RealValue> </SingleResult> <SingleResult name=\"2 : ErrorText\" status=\"Success\" context=\"Demo Steps\" time=\"2017-02-02T15:58:29.3776219+11:00\"> <RealValue> <Text value=\"OK\" /> </RealValue> </SingleResult> </DemoResults> </SampleResults>"}
$ /opt/logstash/bin/logstash -f test.config < data
Settings: Default pipeline workers: 8
Pipeline main started
{
     "resultXML" => "<SampleResults xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\"> <DemoResults name=\"Hello There\" executionDuration=\"23\" status=\"Success\"> <SingleResult name=\"1 : Date\" status=\"Success\" context=\"Demo Steps\" time=\"2017-02-02T15:58:26.9776219+11:00\"> <RealValue> <Text value=\"02.02.2017\" /> </RealValue> </SingleResult> <SingleResult name=\"2 : ErrorNum\" status=\"Success\" context=\"Demo Steps\" time=\"2017-02-02T15:58:29.3776219+11:00\"> <RealValue> <Text value=\"0\" /> </RealValue> </SingleResult> <SingleResult name=\"2 : ErrorText\" status=\"Success\" context=\"Demo Steps\" time=\"2017-02-02T15:58:29.3776219+11:00\"> <RealValue> <Text value=\"OK\" /> </RealValue> </SingleResult> </DemoResults> </SampleResults>",
      "@version" => "1",
    "@timestamp" => "2017-04-27T06:25:15.321Z",
          "host" => "lnxolofon",
        "parsed" => {
          "xmlns:xsi" => "http://www.w3.org/2001/XMLSchema-instance",
          "xmlns:xsd" => "http://www.w3.org/2001/XMLSchema",
        "DemoResults" => [
            [0] {
                             "name" => "Hello There",
                "executionDuration" => "23",
                           "status" => "Success",
                     "SingleResult" => [
                    [0] {
                             "name" => "1 : Date",
                           "status" => "Success",
                          "context" => "Demo Steps",
                             "time" => "2017-02-02T15:58:26.9776219+11:00",
                        "RealValue" => [
                            [0] {
                                "Text" => [
                                    [0] {
                                        "value" => "02.02.2017"
                                    }
                                ]
                            }
                        ]
                    },
                    [1] {
                             "name" => "2 : ErrorNum",
                           "status" => "Success",
                          "context" => "Demo Steps",
                             "time" => "2017-02-02T15:58:29.3776219+11:00",
                        "RealValue" => [
                            [0] {
                                "Text" => [
                                    [0] {
                                        "value" => "0"
                                    }
                                ]
                            }
                        ]
                    },
                    [2] {
                             "name" => "2 : ErrorText",
                           "status" => "Success",
                          "context" => "Demo Steps",
                             "time" => "2017-02-02T15:58:29.3776219+11:00",
                        "RealValue" => [
                            [0] {
                                "Text" => [
                                    [0] {
                                        "value" => "OK"
                                    }
                                ]
                            }
                        ]
                    }
                ]
            }
        ]
    }
}
Pipeline main has been shutdown
stopping pipeline {:id=>"main"}

Does it matter how the xml comes? Straight from the field it comes as a single string without any line breaks or any formatting whatsoever when I use jdbc input.

Linebreaks or not doesn't matter.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.