Create an array from a nested XML field


#1

Hi,

I'm quite new to ELK and I've run into a problem that is almost impossible for me to solve.

I have an XML like this:
<root>
  <field_A>
    <subfield_A>8100291240</subfield_A>
    <subfield_B>3</subfield_B>
    <subfield_C>6000436355</subfield_C>
    <subfield_D>
      <subfield_DD>1000</subfield_DD>
    </subfield_D>
  </field_A>
  ...
  <field_A>
    <subfield_A>8100291240</subfield_A>
    <subfield_B>3</subfield_B>
    <subfield_C>6000436355</subfield_C>
    <subfield_D>
      <subfield_DD>1000</subfield_DD>
    </subfield_D>
  </field_A>

The field_A element repeats in that format thousands of times throughout the XML. My first approach was to use the xml filter to extract the elements, but I only need two subfields from each field_A.

At first, I used this:
add_field => {
  field_A => "%{[root][field_A][0][subfield_A]}"
}

I changed the 0 to a 1 and, sure enough, I was able to access the second element of the array. It worked like a charm, but the problem is that I need to write to the same field every time, and I don't know beforehand how many field_A elements the XML will contain. I looked for some kind of loop in Logstash, with no luck.

So I decided to use the ruby filter. It took me a while, but I was able to navigate inside the nested fields. Again I hit the same problem: I could only access one specific element. For that I used this:

code => "event['root'] = event['root']['field_A'][1]['subfield_A']"

So my question is: how can I use a single key in Logstash that holds multiple values, given that this functionality sits inside a much larger configuration file?

In other words, ideally, I need something like this:

field_A => {
[subfield_A , subfield_B ],
[subfield_A , subfield_B ],
.
.
.
[subfield_A , subfield_B ]
}

I'm already losing my mind, any help would be appreciated.


(Colin Goodheart-Smithe) #2

I'm going to change the category here to Logstash as you will get access to more people who know about Logstash that way.


(Magnus Bäck) #3

So... you want to extract the contents of all subfield_A and subfield_B subelements from all field_A elements?

Turn

<root>
  <field_A>
    <subfield_A>1</subfield_A>
    <subfield_A>2</subfield_A>
  </field_A>
  <field_A>
    <subfield_A>3</subfield_A>
    <subfield_A>4</subfield_A>
  </field_A>
</root>

into this:

{
  "field_A": ["1", "2", "3", "4"]
  ...
}
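
The flattening described above can be sketched in plain Ruby. This is only an illustration, not a tested Logstash configuration: the `root` hash is a hypothetical stand-in for whatever the xml filter produces, shaped after the `[root][field_A][0][subfield_A]` reference in the question, and `as_list` is a helper name made up for this example.

```ruby
# Hypothetical helper: treat nil, a single value, or an array uniformly
# as an array, because a parsed element may appear once or many times.
def as_list(value)
  return [] if value.nil?
  value.is_a?(Array) ? value : [value]
end

# Assumed shape of the parsed XML (not taken from real filter output).
root = {
  "field_A" => [
    { "subfield_A" => ["1", "2"] },
    { "subfield_A" => ["3", "4"] }
  ]
}

# Visit every field_A element and collect all of its subfield_A values
# into one flat array, regardless of how many elements there are.
values = as_list(root["field_A"]).flat_map do |item|
  as_list(item["subfield_A"])
end

puts values.inspect
```

Because `flat_map` iterates over the whole array, no index such as `[0]` or `[1]` has to be known in advance.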

#4

Hello, Magnus

Thanks for your answer, but I already know how to extract a single occurrence of field_A. My problem is how to do that for every occurrence AND store the results in the same field (with the same name).

All I've achieved so far is to overwrite the previous value or to save only the first one.

Besides, I don't need only the values of the subfield_A and subfield_C tags. I also need to keep the tag names and create a field for each one, because I need to use those fields in my searches in Kibana.

TURN THIS:
<root>
  <shirt>
    <color>red</color>
    <size>5</size>
  </shirt>
  .
  .
  .
  <shirt>
    <color>white</color>
    <size>6</size>
  </shirt>
</root>

INTO THIS:
{
"shirt": [
[ [ [color],"red"] , [ [size], 5] ],
.
.
.
[ [ [color],"white"] , [ [size], 6] ], ]
}

Do you know how to do this? Thanks for your help.
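
One possible shape for the loop asked about here, sketched in plain Ruby rather than as a tested Logstash configuration. The `root` hash is an assumed stand-in for the parsed XML, and `WANTED`, `first_value`, and `pick_subfields` are names invented for this example. It also swaps the nested arrays shown above for one hash per shirt, so that each tag name becomes a key.

```ruby
# Hypothetical list of the subfields to keep from each element.
WANTED = %w[color size].freeze

# A parsed value may be a one-element array or a bare string; take the
# first entry in the former case and the value itself in the latter.
def first_value(value)
  value.is_a?(Array) ? value.first : value
end

# Build one hash per element, keeping only the wanted tag names.
def pick_subfields(items, wanted)
  items.map do |item|
    wanted.each_with_object({}) do |key, picked|
      picked[key] = first_value(item[key]) if item.key?(key)
    end
  end
end

# Assumed shape of the parsed XML (not taken from real filter output).
root = {
  "shirt" => [
    { "color" => ["red"],   "size" => ["5"], "brand" => ["x"] },
    { "color" => ["white"], "size" => ["6"] }
  ]
}

shirts = pick_subfields(root["shirt"], WANTED)
puts shirts.inspect
```

Inside a ruby filter, the same loop would read its input from the event and write the resulting array back under a single key: with the `event['root']` syntax used earlier in this thread, or with `event.get` and `event.set` on Logstash 5 and later.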

