Parse Maven XML .pom files

Hi all, I would like to create a Logstash filter that extracts the project details from a Maven .pom XML file. For most of the tags this isn't a problem, because they all share the same structure, for example:

<dependency>
<groupId>apache</groupId>
<artifactId>artifact</artifactId>
<version>1.6</version>
</dependency>

These are fixed blocks, so the structure is the same every time.

But I can't figure out how to do this dynamically:

<properties>
<version.d>${project.version}</version.d>
<version.module.model>1.1.1</version.module.model>
<version.module.entity>1.1.1</version.module.entity>
<version.module.task>1.1.1</version.module.task>
<version.module.test>1.1.1</version.module.test>
<version.module.web>1.1.1</version.module.web>
</properties>

The field names are always different and there is no fixed structure. I would like to use the name of each tag as the field name dynamically. When I hardcode the name, it is possible to get the field with its corresponding value. But that is not an option, because there are too many of these tags and each one has a different name.

input {
file {
path => "/data/pom/*"
start_position => "beginning"
sincedb_path => "/dev/null"
ignore_older => 0
type => "pom"
codec => multiline {
pattern => "^<project .*>"
negate => true
what => "previous"
max_lines => 1000
}
}
}

filter {

xml {
source => "message"
target => "parsed"
suppress_empty => "true"
#force_content => "true"
}

split {
field => "[parsed][dependencies][0][dependency]"
add_field => {
groupId => "%{[parsed][dependencies][0][dependency][groupId]}"
artifactId => "%{[parsed][dependencies][0][dependency][artifactId]}"
version => "%{[parsed][dependencies][0][dependency][version]}"
}
}
split {
field => "[parsed][profile]"
add_field => {
id => "%{[parsed][profile][id]}"
}
}

split {
field => "[parsed][plugins][0][plugin]"
add_field => {
groupId => "%{[parsed][plugins][0][plugin][groupId]}"
artifactId => "%{[parsed][plugins][0][plugin][artifactId]}"
version => "%{[parsed][plugins][0][plugin][version]}"
}
}

split {
field => "[parsed][properties]"
add_field => {
"version.javax-javaee-api" => "%{[parsed][properties][version.javax-javaee-api]}"
}
}
}

So the last split is where I want to use the name of the XML tag itself as the field name. For example, given:
<version.module.model>1.1.1</version.module.model>
the field name should become "version.module.model" and its value "1.1.1".

How can I accomplish that? I have already looked at the ruby filter, but I have no Ruby experience yet, so I don't know whether that is the right direction.

Please ask for further information if necessary :slight_smile:

The ruby filter is the right thing to use here but I don't have time to give a tailored example.

Hello Magnus,

Thank you for your answer. I will try to build the ruby filter myself. Now I know for sure that it is the right direction to go :slight_smile:

If anyone runs into the same problem:
I managed to solve it with this ruby filter:

split {
field => "[parsed][properties]"
}

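ruby {
code => "
# After the split, [parsed][properties] is expected to be a hash keyed by tag name.
props = event.get('[parsed][properties]')
if props.is_a?(Hash)
props.each_key do |name|
# Copy each property to a top-level field named after its tag.
# The [0][content] path assumes force_content is enabled in the xml filter.
event.set(name, event.get('[parsed][properties][' + name + '][0][content]'))
end
end
"
}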
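
For anyone who wants to check the logic outside Logstash first, here is a minimal plain-Ruby sketch of the same idea. It assumes the xml filter ran with force_content enabled, so that each property arrives as 'tag.name' => [{ 'content' => 'value' }]. The helper name flatten_properties and the sample hash are made up for illustration and are not part of any plugin.

```ruby
# Hypothetical helper: turn the parsed <properties> hash into flat
# field-name => value pairs, one entry per XML tag.
def flatten_properties(props)
  return {} unless props.is_a?(Hash)

  props.each_with_object({}) do |(name, value), fields|
    # Assumed shape: [{ 'content' => '1.1.1' }]. If there is no
    # 'content' wrapper, fall back to the first element as-is.
    first = value.is_a?(Array) ? value.first : value
    fields[name] = first.is_a?(Hash) ? first['content'] : first
  end
end

# Sample data shaped like the assumed xml filter output (illustrative only).
sample = {
  'version.module.model' => [{ 'content' => '1.1.1' }],
  'version.module.web'   => [{ 'content' => '1.1.1' }]
}

flatten_properties(sample).each { |name, value| puts "#{name} = #{value}" }
```

Inside the ruby filter, the same loop would call event.set(name, value) for each pair instead of collecting them in a hash.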
