Logstash XML parsing problems


(Povilas) #1

Hello,

I have a problem with XML parsing. My parsed result looks like this:

{"newMSISDN":"%{parsedMSISDN}","offset":52307874,"host":"miram.int.bite.lt","prospector":{"type":"log"},"@version":"1","@timestamp":"2018-04-16T06:03:11.780Z","beat":{"hostname":"miram.int.bite.lt","name":"miram.int.bite.lt","version":"6.2.3"},"source":"/ocs/mobicents/server/bite/log/ocs.log","message":"2018-04-16 08:01:18,089 INFO [net.bitegroup.ocs.lt.sbb.OcsSbb] Request CCR: <?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><ccr><msisdn>37068783222</msisdn><ocsIp>10.241.53.45</ocsIp><apn>internplt</apn><sessionId>c0-10-225-64-26-epg02;1516844161;24238957</sessionId><ccrMscc><inputOctetsUsed>0</inputOctetsUsed><outputOctetsUsed>0</outputOctetsUsed><totalOctetsUsed>0</totalOctetsUsed><timeUsed>5013</timeUsed><inputOctetsUsedAfterTct>0</inputOctetsUsedAfterTct><outputOctetsUsedAfterTct>0</outputOctetsUsedAfterTct><totalOctetsUsedAfterTct>0</totalOctetsUsedAfterTct><timeUsedAfterTct>0</timeUsedAfterTct><ratingGroup>2085</ratingGroup><inputOctetsRequested>0</inputOctetsRequested><outputOctetsRequested>0</outputOctetsRequested><totalOctetsRequested>0</totalOctetsRequested><timeRequested>0</timeRequested><reportingReason>0</reportingReason><tariffChangeUsed>false</tariffChangeUsed></ccrMscc><sgsnIp>213.226.158.160</sgsnIp><ggsnIp>213.226.158.154</ggsnIp><imsi>246021005289080</imsi><imei>5359126095156720</imei><userLocationInfo>0142f62029053054</userLocationInfo><sgsnMccMnc>24602</sgsnMccMnc><chargingId>4116118880</chargingId><ip>10.20.170.183</ip><qos></qos><chargingCharacteristic>0400</chargingCharacteristic><chargingRuleName></chargingRuleName><requestType>UPDATE_REQUEST</requestType><requestNumber>453</requestNumber><creditControlFailureHandlingType>0</creditControlFailureHandlingType><ccSessionFailover>0</ccSessionFailover></ccr>","tags":["beats_input_codec_plain_applied"]}

My code:

input {
  beats {
    port => 5043
  }
}

filter {
  xml {
    source => "message"
    remove_namespaces => true
    # The root element in the sample log is <ccr>
    xpath => {
      "//ccr/msisdn/text()" => "parsedMSISDN"
      "//ccr/ocsIp/text()" => "ocsIP"
    }
    store_xml => false
  }
  mutate {
    add_field => { "newMSISDN" => "%{parsedMSISDN}" }
  }
}

output {
  file {
    path => "/home/work/logstash-out/ocs_test.out"
  }

  #stdout { codec => rubydebug }
}

I want to add the field newMSISDN, but I don't get the XML value. As I understand it, the problem is in this part:

<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>

Maybe XPath doesn't understand these \" symbols.

Maybe someone can help me with this one. Thanks :slight_smile:
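Separately from the XML problem, the literal %{parsedMSISDN} in the output is expected Logstash behaviour: when a %{field} reference points at a field that does not exist, the reference text is left unchanged. A minimal sketch, assuming the xml filter from the config above runs first, that only adds newMSISDN when the XPath lookup actually produced a value. The xml filter stores XPath results as arrays, so [0] takes the first match:

```logstash
filter {
  # Assumes the xml filter that fills parsedMSISDN has already run.
  if [parsedMSISDN] {
    mutate {
      # XPath results are arrays; [0] picks the first matched value.
      add_field => { "newMSISDN" => "%{[parsedMSISDN][0]}" }
    }
  }
}
```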


(Christian Dahlqvist) #2

Your message field starts like this:

2018-04-16 08:01:18,089 INFO [net.bitegroup.ocs.lt.sbb.OcsSbb] Request CCR: <?xml version=\"1.0\"...

As this is not entirely valid XML, you cannot apply the XML filter to it directly. You will need to use a grok or dissect filter to parse the components of the log line so that the XML content at the end ends up in a separate field. Then you can use the XML filter on that field.
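A minimal sketch of that two-step approach with grok, assuming every line has the same layout as the sample above (timestamp, level, class in square brackets, then "Request CCR: " followed by the XML). The field names log_timestamp, log_level, log_class, xml_payload, msisdn and ocs_ip are hypothetical:

```logstash
filter {
  grok {
    # Split the log prefix from the XML; everything after "Request CCR: "
    # lands in xml_payload. All field names here are hypothetical.
    match => {
      "message" => "%{TIMESTAMP_ISO8601:log_timestamp} %{LOGLEVEL:log_level} \[%{DATA:log_class}\] Request CCR: %{GREEDYDATA:xml_payload}"
    }
  }
  xml {
    # Parse only the extracted XML, not the whole log line
    source => "xml_payload"
    store_xml => false
    xpath => {
      "/ccr/msisdn/text()" => "msisdn"
      "/ccr/ocsIp/text()" => "ocs_ip"
    }
  }
}
```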


(Povilas) #3

Thanks for the answer :slight_smile:

Could you say more about which grok filter I should use? I tried %{GREEDYDATA:msgBody}, but that only captures the whole message. How can I capture only the XML block? :slight_smile: The main problem is that Logstash adds these symbols; maybe I can somehow remove them? Then, as I understand it, it should also work.


(Christian Dahlqvist) #4

Have a look at the dissect filter as it might be easier to get started with and should be sufficient here. This blog post contains a good introduction.
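A dissect version of the same split might look like the sketch below. It assumes the prefix always has exactly this shape, with single spaces between the parts. Dissect matches literal delimiters rather than regular expressions, so a line with a different layout fails and is tagged _dissectfailure. Field names are hypothetical:

```logstash
filter {
  dissect {
    # The literal text between the %{} fields acts as the delimiter;
    # the last field (xml_payload) receives the rest of the line.
    mapping => {
      "message" => "%{log_date} %{log_time} %{log_level} [%{log_class}] Request CCR: %{xml_payload}"
    }
  }
  xml {
    source => "xml_payload"
    store_xml => false
    xpath => { "/ccr/msisdn/text()" => "msisdn" }
  }
}
```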


(Povilas) #5

Sorry, I haven't had much time, so I only checked your answer now, and I don't think this approach is good for me. Is there really no other way to tell Logstash not to add these symbols (\) here:

<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>

In my real log these symbols do not exist. For example, in my real log this part looks like this, and then the XPath works:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

So is there really no way to configure Logstash not to add these symbols?
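The backslashes are not part of the event itself. The file output encodes each event with the json_lines codec by default, and JSON must escape a double quote inside a string as \", so the escaping appears only in the written file and never reaches the xml filter. A sketch of one way to check this is to write only the raw message text with the line codec; the output path below is hypothetical:

```logstash
output {
  file {
    # Hypothetical path; writes the message text as-is, with no JSON encoding
    path => "/home/work/logstash-out/ocs_raw.out"
    codec => line { format => "%{message}" }
  }
}
```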


(Christian Dahlqvist) #6

Did you try it out? If so, what did not work?


(Povilas) #7

Hi!

I have tried it, and I don't think it is right for me, because my log lines differ all the time; sometimes there is no XML in them at all. So now I am trying to extract the value with a grok regex.
Now I only have this problem; maybe you could help me. I extract the value like this:

[screenshot]

but my result looks like this: [screenshot]
How can I get just this result: 37068783222

:slight_smile:


#8

If you only want the middle part extracted, you have to put the named group only around the middle part. (Posting a screenshot instead of text doesn't make helping you easier :wink: )

<msisdn>(?<msisdn>.*)<\/msisdn>

(I don't know what the additional :? and ? were supposed to do.)
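In a Logstash pipeline, that pattern goes into a grok filter as sketched below. This version swaps .* for [^<]*, which stops at the first "<", so the capture cannot run past </msisdn> even if a line contained more than one msisdn element. With the sample line from the first post, the msisdn field would hold 37068783222:

```logstash
filter {
  grok {
    # Only the value between the tags is inside the named group.
    # [^<]* cannot cross the next "<", so the match ends at </msisdn>.
    match => { "message" => "<msisdn>(?<msisdn>[^<]*)</msisdn>" }
  }
}
```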


(Povilas) #9

Hi, thanks everyone for your help. I captured the whole XML with a grok regex like this:
match => { "message" => "(?<(.*)</cc.>)"}

And then I used the xml plugin to extract all the values I need :slight_smile:
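A sketch of the approach described in this reply, not the poster's exact configuration: grok captures everything from the XML declaration to the closing root tag into one field, lines without XML are tagged instead of parsed, and the xml filter only runs when a payload was found. The pattern, the field names (xml_payload, msisdn, ocs_ip, apn) and the tag no_xml_payload are assumptions. The </cc.> part mirrors the post and matches a three-letter root element such as </ccr>; the /*/ XPath prefix likewise accepts any root element name:

```logstash
filter {
  grok {
    # Capture from the XML declaration through the closing root tag (e.g. </ccr>).
    # Field name and tag are hypothetical.
    match => { "message" => "(?<xml_payload><\?xml.*</cc.>)" }
    # Lines without XML fail the match and get this tag instead of _grokparsefailure
    tag_on_failure => ["no_xml_payload"]
  }

  # Run the XML step only when a payload was actually extracted
  if [xml_payload] {
    xml {
      source => "xml_payload"
      store_xml => false
      xpath => {
        "/*/msisdn/text()" => "msisdn"
        "/*/ocsIp/text()" => "ocs_ip"
        "/*/apn/text()" => "apn"
      }
    }
  }
}
```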

