Help parsing XML data in Logstash into separate events


(Aashish Chauhan) #1

Hello,

I need to parse data from an XML file in Logstash. I have been able to parse the data successfully into a single event, but I want each drug in a separate event.

My XML file looks like:

<drugbank>
  <drug type="biotech">
    <name>Lepirudin</name>
    <description>Lepirudin is identical to natural hirudin except for substitution of leucine</description>
  </drug>
  <drug type="biotech">
    <name>Cetuximab</name>
    <description>Epidermal growth factor receptor binding FAB. </description>
  </drug>
</drugbank>

and my config file looks like:

input {
  file {
    path => "...path/sampledrugbank.xml"
    type => "test_drugbank"
    start_position => beginning
    sincedb_path => "/dev/null"
    codec => multiline {
      pattern => "^<?drugbank .*>"
      negate => true
      what => "previous"
    }
  }
}
filter {
  xml {
    source => "message"
    force_array => false
    xpath => [
      "/drugbank/drug/name/text()", "name",
      "/drugbank/drug/description/text()", "description"
    ]
    target => "doc"
    store_xml => true
  }
}
output {
  elasticsearch {
    codec => json
    hosts => "0.0.0.0"
    index => "drugbank_index"
  }
}

Currently I'm getting the output in a single event:

{
"name": [ "Lepirudin", "Cetuximab" ],
"description": [ "Lepirudin is identical to natural hirudin except for substitution of leucine for isoleucine", "Epidermal growth factor receptor binding FAB." ]
}

but I want the output as separate events:

{
"name": "Lepirudin",
"description": "Lepirudin is identical to natural hirudin except for substitution of leucine for isoleucine"
},
{
"name": "Cetuximab",
"description": "Epidermal growth factor receptor binding FAB."
}

Sometimes Logstash also doesn't create the index, even though the configuration compiles successfully. Please help me with this configuration.

Thanks


(Magnus Bäck) #2

You need to use a ruby filter. Ruby's Array#transpose method makes it easy to turn your two input arrays (the name and description fields) into an array of [name, description] pairs that looks like this:

[["Lepirudin", "Lepirudin is identical to natural hirudin except for substitution of leucine for isoleucine"], ["Cetuximab", "Epidermal growth factor receptor binding FAB."]]

See https://stackoverflow.com/questions/15754158/how-to-unzip-an-array for an example.
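
As a minimal sketch of that pairing step in plain Ruby (outside Logstash): the variable names `names`, `descriptions`, `pairs` and `zipped` are made up for illustration, and the values are copied from the output shown in the question.

```ruby
# Sample values copied from the xml filter output shown in the question.
names = ["Lepirudin", "Cetuximab"]
descriptions = [
  "Lepirudin is identical to natural hirudin except for substitution of leucine for isoleucine",
  "Epidermal growth factor receptor binding FAB."
]

# Array#transpose treats the outer array as rows and swaps rows with columns,
# so element i of the result holds the i-th name and the i-th description.
# It raises IndexError if the two arrays differ in length.
pairs = [names, descriptions].transpose

# Array#zip gives the same pairing here, but never raises on a length
# mismatch: missing descriptions become nil and extra ones are ignored.
zipped = names.zip(descriptions)

pairs.each { |name, description| puts "#{name}: #{description}" }
```

Whether `transpose` or `zip` is the better fit probably depends on whether a drug with a missing description should be treated as an error or passed through with a nil value.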

The result above can then easily be transformed into an array of objects with name and description fields. If you want them to reside in different events, you can use a split filter to split the values in the array into multiple events.
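
A rough sketch of that transformation in plain Ruby, not a tested Logstash configuration: a Hash stands in for the Logstash event, and `drugs_from` and the `drugs` field name are hypothetical. Inside an actual ruby filter the fields would be read and written through the event API (`event.get` / `event.set` in Logstash 5 and later), and a split filter with `field => "drugs"` would then produce one event per array element.

```ruby
# A plain Hash stands in for the single Logstash event from the question.
event = {
  "name" => ["Lepirudin", "Cetuximab"],
  "description" => [
    "Lepirudin is identical to natural hirudin except for substitution of leucine for isoleucine",
    "Epidermal growth factor receptor binding FAB."
  ]
}

# Hypothetical helper: pair the two arrays and build one hash per drug.
def drugs_from(event)
  [event["name"], event["description"]].transpose.map do |name, description|
    { "name" => name, "description" => description }
  end
end

# "drugs" is an assumed field name holding one object per drug.
event["drugs"] = drugs_from(event)

# Emulate the end result of splitting on "drugs": one output event per element.
# The real split filter clones the whole event and sets "drugs" to a single
# element; here only that element is kept, to match the desired output.
events = event["drugs"].map(&:dup)

events.each { |e| puts e.inspect }
```

Because the real split filter keeps the other fields of the original event, the original name and description arrays would likely need to be removed or overwritten afterwards to get exactly the output asked for.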

I don't have time to provide a complete example.

