XML Filter Help Required


(Shaun Wells) #1

Hello all. I'm trying to get my head around the XML filter. I have the following configuration for my filter:

 xml {
        source => "message"
        target => "message_parsed"
        add_tag => ["xml_parsed"]
 }

When I parse the following extract through Logstash:

20150525T20:21:13 <stats><stats xmlns='jcs:stats:jsm'><current-online-user-count>1730</current-online-user-count><login-rate>0</login-rate><successful_logins>93645</successful_logins><failed_logins>84583</failed_logins><uptime>1900999</uptime></stats>
<stats xmlns='jcs:stats:delivery'><total-message-packets>5428196</total-message-packets><total-presence-packets>288328380</total-presence-packets><total-iq-packets>4977074</total-iq-packets><messages-in-last-time-slice>0</messages-in-last-time-slice><average-message-size>0</average-message-size></stats></stats>

I seem to end up with what looks like an array. What I was hoping for was a JSON-style document in which each element of the XML is given its own field. What I actually see on my console is the following (objects inside an array are not well supported in Kibana):

 "message_parsed" => {
        "stats" => [
            [0] {
                                    "xmlns" => "jcs:stats:jsm",
                "current-online-user-count" => [
                    [0] "10"
                ],
                               "login-rate" => [
                    [0] "0"
                ],
                        "successful_logins" => [
                    [0] "100"
                ],
                            "failed_logins" => [
                    [0] "5"
                ],
                                   "uptime" => [
                    [0] "1901060"
                ]
            },
            [1] {
                                      "xmlns" => "jcs:stats:delivery",
                      "total-message-packets" => [
                    [0] "1000"
                ],
                     "total-presence-packets" => [
                    [0] "100000"
                ],
                           "total-iq-packets" => [
                    [0] "100000"
                ],
                "messages-in-last-time-slice" => [
                    [0] "0"
                ],
                       "average-message-size" => [
                    [0] "0"
                ]
            }

I was expecting to see something like this in my JSON document:

"user-count" : "value"
"login-rate" : "value"

Any ideas on how I can fix this, or am I misunderstanding what the XML filter should be doing here? Examples would be helpful.
Thanks.
Boardman


(Rafał Trójniak) #2

Hello,

The topic is interesting; I bumped into it some time ago.
The elements are generated as arrays because an element can occur multiple times under the same parent.
The problem does not arise with attributes, as they have to be unique on each node (according to the XML standard). So if your XML carried the values as attributes, the problem would not happen.
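
A rough sketch of that reasoning, written in Python purely for illustration. The helper `to_dict` is hypothetical and is not how the Logstash filter is implemented; it only shows why a generic converter has to put child elements into lists while attributes can stay single values.

```python
import xml.etree.ElementTree as ET


def to_dict(element):
    """Convert an element to a dict: attributes become scalars, children become lists.

    Assumes attribute names and child element names do not collide.
    """
    result = dict(element.attrib)  # an attribute name is unique per element
    for child in element:
        # A child name may repeat, so every child value is appended to a list.
        value = to_dict(child) if (len(child) or child.attrib) else child.text
        result.setdefault(child.tag, []).append(value)
    return result


# The same element name appearing twice is legal XML.
as_elements = ET.fromstring(
    "<stats><login-rate>0</login-rate><login-rate>3</login-rate></stats>"
)
# The same attribute name appearing twice would not be well-formed.
as_attributes = ET.fromstring('<stats login-rate="0" uptime="1900999"/>')

elements_result = to_dict(as_elements)
attributes_result = to_dict(as_attributes)
print(elements_result)
print(attributes_result)
```

A converter that cannot know in advance whether a name repeats has to use a list every time, which is why even a single `<login-rate>` ends up wrapped in an array.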

I tried to use the 'xpath' option https://www.elastic.co/guide/en/logstash/current/plugins-filters-xml.html#plugins-filters-xml-xpath . According to the documentation it should be able to solve your problem.
Unfortunately I failed to get it working :confused: I don't know why none of these queries worked:
xpath => [
"failed_logins/text()", "x_failed_logins",
"failed_logins", "x_failed_logins2",
"/stats/stats", "x_stats",
"/stats/stats[0]", "x_stats0",
"/stats/stats[1]", "x_stats1",
"/stats/stats[0]/failed_logins", "x_failed_logins3"
]

I don't know what was wrong with the above queries; they didn't throw any errors or warnings.

Regards,


(Rafał Trójniak) #3

Ah, found the problem.
There seems to be a bug that is hit when the XML has 'xmlns' attributes: https://github.com/logstash-plugins/logstash-filter-xml/issues/10
After removing them I was able to extract the values to the root of the event. They are still arrays, but it should be possible to query them.

Please see example rules here :


Documentation here :


(Shaun Wells) #4

When you say that you removed the 'xmlns' element, was this something you did manually with my example? My daily log contains around 1400 entries, so how would I go about removing it? Or is it enough to specify an xpath for each element, as shown below?

"/stats/stats/failed_logins/text()", "x_failed_logins"
"/stats/stats/login_rates/text()", "x_login_rates"

etc...
etc..


(Rafał Trójniak) #5

Hey,

Yes, I removed them manually. I understand that this cannot be done on a regular basis.

The problem is that the xpath handling looks buggy (https://github.com/logstash-plugins/logstash-filter-xml/issues/10), or I'm using it incorrectly.
When the xmlns attributes are present, I was not able to get any results from xpath queries.
I have no idea what is going on. Maybe it is failing, maybe the xpath processing is somehow affected by that xmlns. I really don't know.
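
One possible explanation, sketched with Python's standard library rather than with Logstash itself: a default `xmlns` declaration places the element and its children in a namespace, so a path written with bare names no longer matches them. The prefix `jsm` below is an arbitrary choice made for this example, and the XML is a shortened version of the sample from the first post.

```python
import xml.etree.ElementTree as ET

xml_text = (
    "<stats>"
    "<stats xmlns='jcs:stats:jsm'><failed_logins>84583</failed_logins></stats>"
    "</stats>"
)
root = ET.fromstring(xml_text)

# The default namespace becomes part of each element's expanded name.
inner = root[0]
print(inner.tag)  # {jcs:stats:jsm}stats

# A path using bare names does not match the namespaced elements.
unqualified = root.findall("stats/failed_logins")

# Binding a prefix to the namespace URI and using it in the path does match.
ns = {"jsm": "jcs:stats:jsm"}
qualified = root.findall("jsm:stats/jsm:failed_logins", ns)
values = [node.text for node in qualified]
print(unqualified, values)
```

Current documentation for the Logstash xml filter lists a `namespaces` option for binding prefixes and a `remove_namespaces` option; whether either is available depends on the installed plugin version, so treat that as something to check rather than a given.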

Regards,


(Shaun Wells) #6

I was thinking of the following, though I'm not sure whether it is possible: could I run the file through grok, pattern match and strip the offending field out of the log line, and then pass the rest to the XML filter?
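
The idea of stripping the declarations before parsing can be sketched as follows. This is Python for illustration only, with a hypothetical helper `strip_default_namespaces`; it assumes the only namespace declarations are plain `xmlns='...'` attributes like the ones in the sample above, and it would not handle prefixed declarations such as `xmlns:foo`.

```python
import re
import xml.etree.ElementTree as ET

# Matches a default namespace declaration such as xmlns='jcs:stats:jsm',
# including the whitespace in front of it, with either quote style.
XMLNS_DECLARATION = re.compile(r"""\s+xmlns=(['"]).*?\1""")


def strip_default_namespaces(xml_text):
    """Remove default xmlns declarations so elements keep their bare names."""
    return XMLNS_DECLARATION.sub("", xml_text)


raw = (
    "<stats>"
    "<stats xmlns='jcs:stats:jsm'><failed_logins>84583</failed_logins></stats>"
    "<stats xmlns='jcs:stats:delivery'><total-iq-packets>4977074</total-iq-packets></stats>"
    "</stats>"
)

cleaned = strip_default_namespaces(raw)
root = ET.fromstring(cleaned)

# With the declarations gone, bare-name paths match again.
failed_logins = root.findtext("stats/failed_logins")
iq_packets = root.findtext("stats/total-iq-packets")
print(failed_logins, iq_packets)
```

Within Logstash, a substitution of this kind would more naturally be done with the mutate filter's `gsub` option placed before the xml filter than with grok, though the exact pattern would need testing against real log lines.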

Otherwise, I think my only other option is to pattern match each of the separate elements, but I only want the values, not the tags. So far I've managed to pattern match one element, but I'm not sure how to grab just the value.

<current-user>1</current-user>
(?<currentuser><[\w-]+>[0-9]+<.[\w-]+>)

Gives me the following results:

{
  "currentuser": [
    [
      "<current-user>1</current-user>"
    ]
  ]
}
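
The capture above contains the tags because the named group wraps the whole element. Moving the tags outside the group leaves only the value, as this Python sketch shows. Note that Python spells named groups `(?P<name>...)` where grok uses `(?<name>...)`; the names `tag` and `value` are chosen just for this example.

```python
import re

# The tags sit outside the "value" group, so only the digits are captured.
# The backreference (?P=tag) requires the closing tag to match the opening one.
ELEMENT_VALUE = re.compile(r"<(?P<tag>[\w-]+)>(?P<value>[0-9]+)</(?P=tag)>")

line = "<current-user>1</current-user><login-rate>0</login-rate>"

# Collect every element name with its numeric value.
values = {m.group("tag"): m.group("value") for m in ELEMENT_VALUE.finditer(line)}
print(values)
```

By the same logic, a grok custom pattern along the lines of `<current-user>(?<currentuser>[0-9]+)</current-user>` should capture only the number, although that exact pattern is untested here.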

(Rafał Trójniak) #7

Hey,

Just got back to this issue with a few more days of experience :stuck_out_tongue:

XPath is actually working fine; my bad. The namespaces altered the names of the XML nodes, which stopped my XPath rules from matching. There are two ways known to me to resolve that:

On further thought: why do you want this not to be an array?
You can always refer to the value as [x_failed_logins][0] (element 0 of the array).
If you insist, you can use the mutate plugin to move the value from one field to another.
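
What "moving the value out of the array" amounts to can be sketched in Python. The function `unwrap_single_values` and the sample event are made up for illustration and are not part of Logstash; the field names follow the xpath examples earlier in the thread.

```python
def unwrap_single_values(event):
    """Return a copy of the event with one-element lists replaced by their only item."""
    flattened = {}
    for field, value in event.items():
        if isinstance(value, list) and len(value) == 1:
            flattened[field] = value[0]  # same as reading [field][0]
        else:
            flattened[field] = value  # longer lists and scalars are left alone
    return flattened


# Illustrative event: one single-value field, one field that really has two values.
event = {"x_failed_logins": ["84583"], "x_stats": ["first", "second"]}
flat = unwrap_single_values(event)
print(flat)
```

In a Logstash pipeline the comparable step would probably be a mutate `add_field` or `replace` that references `%{[x_failed_logins][0]}`. Newer versions of the xml filter also document a `force_array` option, which may make the step unnecessary if the installed version supports it.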

About using grok for that: it would work, but IMHO it will cause more problems than unpacking the event with the XML filter and then either using XPath or moving the field directly where you want it.

Regards,

