Logstash to parse xml file


(Mohamed) #1

I am new to ELK and need some help using Kibana. I have 5 fields: one is the name of the data, each name has several types, and each type has a different value for each object. My XML file looks like this (just an example):

<Name nameID="xxxxx">
  <Type p="1">xxxxx</Type>
  <Type p="2">yyyyy</Type>
  <Obj id="1">
    <Value r="1">2.2</Value>
    <Value r="2">3.2</Value>
  <Obj id="2">
    <Value r="1">0.2</Value>
    <Value r="2">76.2</Value>
</Name>

So what I need is the name and the value of each Type for each Obj.

Using Logstash I get


What I would like to get is separate lines, like this:

My logstash.conf:

input {
  file {
    path => "/home/test/data.xml"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => multiline {
      pattern => "Name>"
      negate => true
      what => "previous"
    }
  }
}
filter {
  xml {
    source => "message"
    target => "parsed"
    add_tag => [ "xml" ]
    xpath => [
      "//Name/@nameID", "Name",
      "//Type/@p", "TypeID",
      "//Type/text()", "Type",
      "//Obj/@id", "Obj",
      "//r/text()", "value"
    ]
  }
}


#2

That's not valid XML (the Obj elements are not terminated). Show us an actual example of input XML and the output either from Kibana's JSON tab or from output { stdout { codec => rubydebug } }

Note also that there is nothing in the XML to associate the values with the types. If you instead had a structure like

<Type>
<Obj> <Value></Value> <Value> </Value> </Obj>
<Obj> <Value></Value> <Value> </Value> </Obj>
</Type>
<Type>
<Obj> <Value></Value> <Value> </Value> </Obj>
<Obj> <Value></Value> <Value> </Value> </Obj>
</Type>

Then you could use something like

filter {
  xml { source => "message" store_xml => true target => "theXML" force_array => false }
  split { field => "[theXML][Type]" }
  split { field => "[theXML][Type][Obj]" }
  split { field => "[theXML][Type][Obj][Value]" }
}

As it is you will probably need a ruby filter to iterate over the arrays that xpath returns and build clones of the event.
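
A rough, untested sketch of that ruby-filter idea. It assumes the xpath results for the type IDs and type names land in the fields TypeID and Type (the names used in the config from the first post) as arrays of equal length. The field name pairs and its keys p and name are hypothetical. Rather than cloning events by hand, this variant builds one array of hashes and lets the split filter produce one event per entry.

```
filter {
  ruby {
    code => '
      # xpath results are stored as arrays of strings
      ids   = event.get("TypeID")
      names = event.get("Type")
      if ids.is_a?(Array) && names.is_a?(Array)
        # pair the n-th ID with the n-th name ("pairs" is a made-up field name)
        pairs = ids.zip(names).map { |id, name| { "p" => id, "name" => name } }
        event.set("pairs", pairs)
      end
    '
  }
  # only split when the ruby filter actually created the array
  if [pairs] {
    split { field => "pairs" }
  }
}
```

Each resulting event would then carry a single [pairs][p] and [pairs][name]. Pairing by position only makes sense if the two xpath expressions return their matches in the same document order.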


(Mohamed) #3

My XML looks like this:

<Name nameID="xxxx">
  <Type p="1">xxxxxx</Type>
  <Type p="2">xxxxxx</Type>
  <Value obj="1"> 
    <r p="1">5.94</r>
    <r p="2">62.19</r>
  </Value>
  <Value obj="2"> 
    <r p="1">5.94</r>
    <r p="2">62.19</r>
  </Value>
</Name>
<Name nameID="yyyy">
  <Type p="1">yyyyy</Type>
  <Type p="2">yyyyyy</Type>
  <Type p="3">yyyy</Type>
  <Value obj="1"> 
    <r p="1">54.94</r>
    <r p="2">6.19</r>
    <r p="3">0</r>
  </Value>
</Name>

(Mohamed) #4

What I would like to get is something like this:
"NameID = name1
Type = Type1
obj = obj1
Value = xx
"
"NameID = name1
Type = Type2
obj = obj1
Value = xx
"
"NameID = name1
Type = Type3
obj = obj1
Value = xx
"
...etc
and then
"NameID = name1
Type = Type1
obj = obj2
Value = xx
"
"NameID = name1
Type = Type2
obj = obj2
Value = xx
"
....etc
and the same thing for the other Name ("yyyyy").
Thanks for the help


#5

OK, so for an XML object such as

<Name nameID="xxxx">
  <Type p="1">xxxxxx</Type>
  <Type p="2">xxxxxx</Type>
  <Value obj="1">
    <r p="1">5.94</r>
    <r p="2">62.19</r>
  </Value>
  <Value obj="2">
    <r p="1">5.94</r>
    <r p="2">62.19</r>
  </Value>
</Name>

You can use

filter {
  xml { source => "message" store_xml => true target => "theXML" force_array => false }
  split { field => "[theXML][Type]" }
  split { field => "[theXML][Value]" }
  split { field => "[theXML][Value][r]" }
}

to get a collection of events such as

{
    "theXML" => {
    "nameID" => "xxxx",
      "Type" => {
        "content" => "xxxxxx",
              "p" => "2"
    },
     "Value" => {
          "r" => {
            "content" => "62.19",
                  "p" => "2"
        },
        "obj" => "2"
    }
},
   "message" => "<Name nameID=\"xxxx\">\n  <Type p=\"1\">xxxxxx</Type>\n  <Type p=\"2\">xxxxxx</Type>\n  <Value obj=\"1\"> \n    <r p=\"1\">5.94</r>\n    <r p=\"2\">62.19</r>\n  </Value>\n  <Value obj=\"2\"> \n    <r p=\"1\">5.94</r>\n    <r p=\"2\">62.19</r>\n  </Value>\n</Name>\n",
[...]
}

Then you can use mutate+rename to move the fields around and mutate+remove_field to clean up the debris.
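
As a hedged illustration of that last step, the following untested sketch flattens the nested fields into the top-level names requested earlier in the thread (NameID, Type, obj, Value). The choice of target names, and the decision to discard message, are assumptions.

```
filter {
  mutate {
    # move the nested values up to top-level fields
    rename => {
      "[theXML][nameID]"            => "NameID"
      "[theXML][Type][content]"     => "Type"
      "[theXML][Value][obj]"        => "obj"
      "[theXML][Value][r][content]" => "Value"
    }
  }
  mutate {
    # clean up what is left of the parsed structure and the raw XML
    remove_field => [ "theXML", "message" ]
  }
}
```

The cleanup is kept in a second mutate so that it clearly runs after the renames have finished.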


(Mohamed) #6

Thank you Badger, that works, but I get duplicate results. For the Type with p='1' I only need the value with p='1', not the other values.
With your example I get, for instance:

{
    "theXML" => {
    "nameID" => "xxxx",
      "Type" => {
        "content" => "xxxxxx",
              "p" => "1"
    },
     "Value" => {
          "r" => {
            "content" => "62.19",
                  "p" => "2"
        },
        "obj" => "2"
    }
},

And I don't need this output. Thanks again
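
The mismatched pairs come from the three split filters, which emit every combination of Type and r. One possible way to keep only the matching ones, sketched here as an untested assumption, is to drop every event whose two p attributes differ. The conditional has to come after the splits:

```
filter {
  # ... xml and split filters from the previous reply go here ...

  # keep only events where the Type and the r value share the same p
  if [theXML][Type][p] != [theXML][Value][r][p] {
    drop { }
  }
}
```

For the sample XML with two Type elements and two Value elements of two r each, this should leave four events instead of eight.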

