Is it possible to parse nested XML data structures and combine each parent with its children?


(Adrian Black) #1

I have a nested XML document with multiple parents and children, and I want to break it down into what is effectively a linear list, like...

    <root>
      <child parent_id="1">Child 1</child>
      ....
    </root>

So I need to combine the parent data with the child data to form a complete record.
I have managed to process all the children of the first parent and access the parent data, but I can't figure out a way of splitting the document twice, or of referring to the parent element while processing a child.

    <root>
      <parent id="1">
        <child>Child 1</child>
        <child>Child 2</child>
      </parent>
      <parent id="2">
        <child>Child 1</child>
        <child>Child 2</child>
      </parent>
    </root>

In the xml filter I only seem to be able to split on

    [parent][0][child]

In general, I am trying to see whether Logstash is flexible enough to deal with heterogeneous XML documents (different structures from different clients). My current thinking is that it isn't, and that I'd be better off preprocessing them first and producing a standard output for Logstash to ingest.


#2

Is that what you are starting with or what you are aiming for? If the latter, what are you starting with?


(Adrian Black) #3

I am starting with that. I can see how to do it if I only ever have one parent, but when there are multiple parent/children blocks I can't see how to break it down.


#4

If you want one document per child, then

    filter {
        xml { source => "message" target => "theXML" store_xml => true }
        split { field => "[theXML][parent]" }
        split { field => "[theXML][parent][child]" }
    }

will get you events that look like

    "theXML" => {
        "parent" => {
               "id" => "2",
            "child" => "Child 2"
        }
    }

You can then re-arrange the fields as you want them.
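To see why two split filters are enough, here is a rough plain-Ruby imitation of what they do to the parsed hash, followed by the rearranging step. The sample hash (which assumes the default `force_array => true`), the `split_field` helper and the `parent_id`/`child` field names are all hypothetical; inside Logstash the rearranging would normally be done with a filter such as `mutate` (`rename`) rather than code like this.

```ruby
require "json"

# Hypothetical sample of what the xml filter stores under [theXML] for the
# question's input, assuming force_array => true: repeated elements become
# arrays, while attributes such as id stay plain strings.
parsed = {
  "parent" => [
    { "id" => "1", "child" => ["Child 1", "Child 2"] },
    { "id" => "2", "child" => ["Child 1", "Child 2"] }
  ]
}

# Rough imitation of the split filter: emit one copy of each event per
# element of the array at `path`, with that element replacing the array.
# `path` is assumed to have at least two keys.
def split_field(events, path)
  *outer, leaf = path
  events.flat_map do |event|
    value = event.dig(*outer)[leaf]
    items = value.is_a?(Array) ? value : [value]
    items.map do |item|
      copy = Marshal.load(Marshal.dump(event)) # deep copy, like a cloned event
      copy.dig(*outer)[leaf] = item
      copy
    end
  end
end

events = [{ "theXML" => parsed }]
events = split_field(events, %w[theXML parent])       # one event per parent
events = split_field(events, %w[theXML parent child]) # one event per child

# Rearrange each event into a flat record: parent id plus child text.
records = events.map do |e|
  { "parent_id" => e.dig("theXML", "parent", "id"),
    "child"     => e.dig("theXML", "parent", "child") }
end

records.each { |r| puts r.to_json }
```

This prints four JSON lines, one per child, each carrying its parent's id, which is the flat shape you would typically index as separate documents.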

If you really want to transform one XML document that has parents with nested children into an XML document that has a list of children with a parent attribute, I don't think Logstash is the tool you want. Maybe XSLT?
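Outside Logstash, the same XML-to-XML reshaping can also be sketched in plain Ruby. This swaps XSLT for REXML (shipped with Ruby, as a bundled gem from Ruby 3.0 onwards); `flatten_children` is a hypothetical name, and the sketch assumes the `<root>/<parent id>/<child>` layout from the question.

```ruby
require "rexml/document"

# Sample input from the question.
INPUT = <<~XML
  <root>
    <parent id="1">
      <child>Child 1</child>
      <child>Child 2</child>
    </parent>
    <parent id="2">
      <child>Child 1</child>
      <child>Child 2</child>
    </parent>
  </root>
XML

# Build a flat document: one <child> per original child, tagged with the
# id attribute of its enclosing <parent>.
def flatten_children(xml)
  source = REXML::Document.new(xml)
  out = REXML::Document.new
  root = out.add_element("root")
  REXML::XPath.each(source, "/root/parent/child") do |child|
    flat = root.add_element("child", { "parent_id" => child.parent.attributes["id"] })
    flat.add_text(child.text.to_s)
  end
  out
end

result = flatten_children(INPUT)
result.write($stdout) # compact output on a single line
puts
```

Building the output with an XML library rather than by concatenating strings leaves escaping of the text to the library.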


(Adrian Black) #5

Bugger a duck. I'm sure I tried something like that. That does work, thanks. I might give XSLT a go, and I believe you can also write Python snippets, but both are outside my knowledge at the moment.


#6

If you actually want XML out, that can be done.

    filter {
        xml { source => "message" target => "[theXML]" store_xml => true force_array => false }
        ruby {
            code => '
                # With force_array => false a lone <parent> or <child> comes back as a
                # single value rather than an array, so wrap and flatten both first
                parents = [event.get("[theXML][parent]")].flatten.compact

                progeny = "<root>"
                # Iterate over the rents
                parents.each { |x|
                    id = x["id"]

                    # Iterate over the sprogs
                    [x["child"]].flatten.compact.each { |y|
                        progeny += %Q(<child parent_id="#{id}">#{y}</child>)
                    }
                }
                progeny += "</root>"
                event.set("progeny", progeny)
            '
        }
    }

along with

output { stdout { codec => line { format => "%{progeny}" } } }
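To try the ruby filter's loop without running Logstash, here is a sketch against a tiny stand-in for the event object. `FakeEvent` is hypothetical and is not Logstash's Event API; it only mimics `get`/`set` with `[a][b]` style field references. The sample deliberately gives the second parent a single child, which `force_array => false` turns into a plain string, to show why the loop wraps values in an array before iterating.

```ruby
# Hypothetical stand-in for a Logstash event: supports only "[a][b]" style
# field references (or a bare top-level name) for get and set.
class FakeEvent
  def initialize(data)
    @data = data
  end

  def get(ref)
    @data.dig(*keys(ref))
  end

  def set(ref, value)
    *outer, leaf = keys(ref)
    target = outer.reduce(@data) { |h, k| h[k] ||= {} }
    target[leaf] = value
  end

  private

  # "[theXML][parent]" -> ["theXML", "parent"]; "progeny" -> ["progeny"]
  def keys(ref)
    parts = ref.scan(/\[([^\]]+)\]/).flatten
    parts.empty? ? [ref] : parts
  end
end

data = {
  "theXML" => {
    "parent" => [
      { "id" => "1", "child" => ["Child 1", "Child 2"] },
      { "id" => "2", "child" => "Child 1" } # lone child: a scalar, not an array
    ]
  }
}
event = FakeEvent.new(data)

# Same loop as in the ruby filter, with single values normalised to arrays.
parents = [event.get("[theXML][parent]")].flatten.compact
progeny = "<root>"
parents.each do |x|
  id = x["id"]
  [x["child"]].flatten.compact.each do |y|
    progeny += %Q(<child parent_id="#{id}">#{y}</child>)
  end
end
progeny += "</root>"
event.set("progeny", progeny)

puts event.get("progeny")
```

Note that the child text is inserted verbatim, so characters such as `&` or `<` in the input would need escaping before the result is valid XML.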

(Adrian Black) #7

Useful, but no. It was just the flattening capability I was looking for. That was only an example of how the XML feed could have come in un-nested, i.e. simpler to deal with, based on the other examples I'd seen. We will probably want to index into Elasticsearch.