Logstash xml parsing - Fields extraction

Hi,
I am new to Elasticsearch.
I am streaming XML files from Apache Kafka to Elasticsearch, and I need to map the XML attributes to fields. I am struggling to get past this. I need the fields below extracted so that I can view them in Kibana:
eventCreationDtm
eventCreationDtmStr
tagIssDtm
bagOrigArpt
destArptCd

<?xml version="1.0" encoding="UTF-8"?>

<ns0:Envelope xmlns:ns0="http://www.test.com/Schema/BAG.xsd">
<ns1:eventHeader xmlns:ns1="http://www.test.com/eai/event/header" eventName="BGIC" eventCreationSys="Baggage" eventCreationDtm="2018-02-24T21:33:03.698Z" eventActionCd="BGIC" eventID="4016426319" version="2.0.0">
<ns1:srcSys eventID="4016426319" eventName="BGIC" eventCreationSys="Baggage" eventCreationDtmStr="2018-02-24T21:33:03.698Z" processLoc=“IXM”>
ns1:usr/
</ns1:srcSys>
</ns1:eventHeader>
ns0:Body


FRT
BHK
B
99
false
false
false
false
H3HHSH


LASTNAME
FIRSTNAME
9JSIJF8



true

I have added the sample XML. Please assist with this.

Make sure to format XML as preformatted text with the </> toolbar button. As you can see your XML has been mangled.

Have you looked at the xml filter? Its xpath option should make it very easy to extract the contents of the elements/attributes that you list.
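
As a rough illustration of that suggestion, here is a minimal sketch of the `xpath` option, which maps an XPath expression to a destination field. It assumes the namespace URIs shown in the sample above and uses the filter's `namespaces` option to declare both prefixes, since `ns1` is not declared on the root element. The destination field names are arbitrary choices, and the rest of the pipeline (Kafka input, Elasticsearch output) is left out.

```
filter {
  xml {
    source => "message"
    # Only run the XPath expressions; do not store the whole parsed document.
    store_xml => false
    # Assumption: these URIs match the xmlns declarations in the incoming XML.
    namespaces => {
      "ns0" => "http://www.test.com/Schema/BAG.xsd"
      "ns1" => "http://www.test.com/eai/event/header"
    }
    # XPath expression => destination field (field names are illustrative).
    xpath => {
      "/ns0:Envelope/ns1:eventHeader/@eventCreationDtm" => "eventCreationDtm"
      "/ns0:Envelope/ns1:eventHeader/ns1:srcSys/@eventCreationDtmStr" => "eventCreationDtmStr"
    }
  }
}
```

Each destination field should then show up on the event and be searchable in Kibana once the index pattern is refreshed.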

Hi Magnus, I have the sample XML below: `<?xml version="1.0" encoding="UTF-8"?>

<ns0:Envelope xmlns:ns0="http://www.TEST.com/Schema/BAG.xsd">
<ns1:eventHeader xmlns:ns1="http://www.TEST.com/eai/event/header" eventName="JKJK" eventCreationSys="Baggage" eventCreationDtm="2018-02-24T21:33:03.698Z" eventActionCd="JKJK" eventID="11111" version="2.0.0">
<ns1:srcSys eventID="11111" eventName="TEST" eventCreationSys="Baggage" eventCreationDtmStr="2018-02-24T21:33:03.698Z" processLoc="TEST">
ns1:usr/
</ns1:srcSys>
</ns1:eventHeader>
ns0:Body


AAA
BBB
B
3
false
false
false
false
3FERR33


</ns0:Body>
</ns0:Envelope>`

The logstash.conf looks like
    input {
      kafka {
        bootstrap_servers => "localhost:9092"
        topics => ["Hello1"]
      }
    }

    filter {
      xml {
        store_xml => false
        source => "message"
        xpath => ["/ns0:Envelope/ns0:Body/bagDetails/@tagIssDtm/string()", "date"]
      }

      date {
        match => ["date", "dd-MM-yyyy HH:mm:ss"]
        timezone => "Europe/Amsterdam"
      }
    }

    output {
      stdout { codec => "json_lines" }
      elasticsearch {
        hosts => ["localhost:9200"]
        index => "elasticse"
      }
    }

I am testing with one attribute and getting an exception. I need to extract the values below:
tagIssDtm
primaryTypePriority
eventCreationDtm

Your XML is still mangled because you didn't format your post as requested.

I have pasted the XML again below:

<?xml version="1.0" encoding="UTF-8"?>
<ns0:Envelope xmlns:ns0="http://www.TEST.com/Schema/BAG.xsd">
    <ns1:eventHeader xmlns:ns1="http://www.TEST.com/eai/event/header" eventName="JKJK" eventCreationSys="Baggage" eventCreationDtm="2018-02-24T21:33:03.698Z" eventActionCd="JKJK" eventID="11111" version="2.0.0">
        <ns1:srcSys eventID="11111" eventName="TEST" eventCreationSys="Baggage" eventCreationDtmStr="2018-02-24T21:33:03.698Z" processLoc="TEST">
            <ns1:usr/>
        </ns1:srcSys>
    </ns1:eventHeader>
    <ns0:Body>
        <bagDetails tagNbr="11111" tagUniqKey="22222" tagIssDtm="2018-02-24T19:03:49.368Z" bagTagActvInd="true">
            <bagInfo>
                <bagOrigArpt>AAA</bagOrigArpt>
                <bagTermArpt>BBB</bagTermArpt>
                <tagPrimaryType>B</tagPrimaryType>
                <primaryTypePriority>3</primaryTypePriority>
                <isPriority>false</isPriority>
                <isHeavy>false</isHeavy>
                <isRush>false</isRush>
                <isSelectee>false</isSelectee>
                <printerId>3FERR33</printerId>
            </bagInfo>
        </bagDetails>
    </ns0:Body>
</ns0:Envelope>

Okay. What you have should work, but I recall there being some problems with XML namespaces. Have you tried enabling the `remove_namespaces` option and changing the XPath expression to `/Envelope/Body/bagDetails/@tagIssDtm/string()`?

Hi Magnus,

I tried the configuration below, and the syntax error still occurs.

    filter {
      xml {
        store_xml => false
        source => "message"
        remove_namespaces => "true"
        xpath => ["/Envelope/Body/bagDetails/@tagIssDtm/string()", "date"]
      }

      date {
        match => ["date", "dd-MM-yyyy HH:mm:ss"]
        timezone => "Europe/Amsterdam"
      }
    }

Error:

    [2018-03-13T10:25:24,879][ERROR][logstash.pipeline ] Exception in pipelineworker, the pipeline stopped processing new events, please check your filter configuration and restart Logstash. {:pipeline_id=>"main", "exception"=>"/Envelope/Body/bagDetails/@tagIssDtm/string()", "backtrace"=>["nokogiri/XmlXpathContext.java:130:in `evaluate'", "/Users/sathish/apps/logstash/vendor/bundle/jruby/2.3.0/gems/nokogiri-1.8.2-java/lib/nokogiri/xml/searchable.rb:198:in `xpath_impl'"
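
One possible cause, offered as a guess rather than a confirmed diagnosis: the exception message is the XPath expression itself, and a trailing `/string()` step is XPath 2.0 syntax. Nokogiri evaluates XPath 1.0, where a function call cannot appear as a path step. The sketch below selects the attribute and element nodes directly instead. It also assumes the date pattern should be `ISO8601`, because a value like `2018-02-24T19:03:49.368Z` does not match `dd-MM-yyyy HH:mm:ss`. The `mutate` step assumes the xml filter stores each XPath result as an array, and the destination field names are illustrative.

```
filter {
  xml {
    source => "message"
    store_xml => false
    # Assumption: this URI matches the xmlns:ns0 declaration on the root element.
    namespaces => {
      "ns0" => "http://www.TEST.com/Schema/BAG.xsd"
    }
    # XPath 1.0: select nodes directly, with no trailing string() step.
    # bagDetails and its children carry no namespace prefix in the sample.
    xpath => {
      "/ns0:Envelope/ns0:Body/bagDetails/@tagIssDtm" => "tagIssDtm"
      "/ns0:Envelope/ns0:Body/bagDetails/bagInfo/primaryTypePriority/text()" => "primaryTypePriority"
    }
  }

  # Assumption: XPath results arrive as single-element arrays; keep the first value.
  mutate {
    replace => { "tagIssDtm" => "%{[tagIssDtm][0]}" }
  }

  date {
    # The sample timestamps are ISO 8601 with a trailing Z (UTC).
    match => ["tagIssDtm", "ISO8601"]
  }
}
```

If a string result is preferred, the XPath 1.0 way to call the function is to wrap the whole path, as in `string(/ns0:Envelope/ns0:Body/bagDetails/@tagIssDtm)`.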

Then I don't know what's going on.

remove_namespaces is not removing the ns tag
