Get xml node name

Hello,
I am a little lost with one problem when parsing XML files into ES with Logstash. Because of an unfortunate XML structure that I can't change, I need to get the names of specific nodes as strings. Example of the XML structure:

<Parent>
<FirstName>XXXX</FirstName>
<SecondName>XXXX</SecondName>
<WhateverName>XXXX</WhateverName>
</Parent>

The names of the "Name" elements may vary, and I need to collect them into a single array field that is pushed to Elastic like this:

ElasticField = {FirstName, SecondName, WhateverName, etc.}

For text() values I use the xml filter with xpath, which works great. Is there an easy way to do the same for element names?

Thank you

The name() function returns the name of the first element it is given. If there is a known limit to the number of name elements you could do

    xml {
        source => "message"
        store_xml => false
        xpath => {
            "name(/Parent/*[1])" => "[@metadata][names][0]"
            "name(/Parent/*[2])" => "[@metadata][names][1]"
            "name(/Parent/*[3])" => "[@metadata][names][2]"
            "name(/Parent/*[4])" => "[@metadata][names][3]"
            "name(/Parent/*[5])" => "[@metadata][names][4]"
        }
    }

This will create a hash of arrays

    "names" => {
        "1" => [
            [0] "SecondName"
        ],
        "4" => [
            [0] ""
        ],
        ...

I have no idea why the keys are out of order, given that Ruby hashes preserve insertion order. To convert that to an array you can use a ruby filter. If you do not care about the order, then

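            # Collect every non-empty name, in whatever order the hash yields them.
            names = event.get("[@metadata][names]")
            nameList = []
            names.each { |k, v|
                # Positions with no matching element come back as "".
                if v[0] != ""
                    nameList << v[0]
                end
            }
            event.set("nameElements", nameList)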

will get you

    "nameElements" => [
        [0] "SecondName",
        [1] "FirstName",
        [2] "WhateverName"
    ],

If you do care about order then

            nameList = []
            names = event.get("[@metadata][names]")
            # Walk the keys "0".."4" (one per xpath entry) so document order is kept.
            (0..4).each { |x|
                if names[x.to_s][0] != ""
                    nameList << names[x.to_s][0]
                end
            }
            event.set("nameElements", nameList)

will get you

    "nameElements" => [
        [0] "FirstName",
        [1] "SecondName",
        [2] "WhateverName"
    ],
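
If you would rather not hard-code the range, one variation is to sort the keys numerically. This is only a sketch in plain Ruby so it can be tried outside Logstash: ordered_names is a made-up helper name, and the sample hash just mimics the [@metadata][names] structure shown earlier.

```ruby
# Hypothetical helper (not part of Logstash): flattens a hash such as
# {"1" => ["SecondName"], "4" => [""], ...} into an array in key order.
def ordered_names(names)
  return [] if names.nil?

  names.keys
       .sort_by { |k| k.to_i }             # "0", "1", "2", ... in numeric order
       .map { |k| Array(names[k]).first }  # first (only) value per position
       .reject { |n| n.nil? || n.empty? }  # drop positions with no element
end

# Sample data imitating what the xml filter stored (assumed shape).
sample = {
  "1" => ["SecondName"],
  "4" => [""],
  "0" => ["FirstName"],
  "3" => [""],
  "2" => ["WhateverName"]
}

puts ordered_names(sample).inspect
```

In the ruby filter, the same chain could be applied to event.get("[@metadata][names]") and the result passed to event.set("nameElements", ...).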

I cannot think of a way to deal with an arbitrary number of elements other than setting store_xml to true and using a ruby filter to extract the element names.
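
A rough sketch of that idea, in plain Ruby so it can be run on its own. It assumes the xml filter was configured with store_xml => true and a target field (called "parsed" here, a name I made up), and that the parsed document comes out as a hash keyed by child element name with the root element dropped. Check the real shape with a rubydebug output before relying on it.

```ruby
# Hypothetical helper: returns the child element names from the hash the
# xml filter is assumed to write when store_xml => true.
def element_names(parsed)
  return [] unless parsed.is_a?(Hash)

  parsed.keys
end

# Assumed shape of the parsed <Parent> document: one key per child
# element, each value wrapped in an array.
parsed = {
  "FirstName"    => ["XXXX"],
  "SecondName"   => ["XXXX"],
  "WhateverName" => ["XXXX"]
}

puts element_names(parsed).inspect
```

Inside a ruby filter this would amount to something like event.set("nameElements", event.get("parsed").keys), which works for any number of child elements.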

Thanks, that helped a lot.

However, I have realized that all the nodes whose names I am trying to get have the same child structure. Is there a way to approach the problem from a parent-child perspective?

I am trying to use this:
'xpath => {"name(//time//parent::node())" => "NodeNamesArray"}'

... but it only gets me the first parent node name the parser finds. Is there any other way?

Thank you very much.

The name function is documented as returning the name of the first element it is given, so a single name() expression can never return more than one name.
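
One possible workaround is to do the lookup in Ruby instead of in the xpath option. The sketch below is plain Ruby using REXML; it assumes REXML can be required in your environment, the sample XML is invented to match the description (each name node has a time child), and parent_names_of is a made-up helper name.

```ruby
require "rexml/document"

# Hypothetical helper: names of every element that is the direct parent
# of a <time> element (or of whatever child name is passed in).
def parent_names_of(xml, child = "time")
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//#{child}").map { |node| node.parent.name }.uniq
end

# Invented sample document for illustration only.
xml = <<~XML
  <Parent>
    <FirstName><time>1</time></FirstName>
    <SecondName><time>2</time></SecondName>
    <WhateverName><time>3</time></WhateverName>
  </Parent>
XML

puts parent_names_of(xml).inspect
```

In a ruby filter the XML string would come from event.get("message") and the resulting array could be stored with event.set("NodeNamesArray", ...).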
