Logstash XPath: querying an XML element value based on a CSV field value

I have a CSV file to parse using Logstash. I am able to get each of the rows from the CSV into Elasticsearch. My current requirement is to get a field value from an XML file based on the CSV input value for each row. How can I do that?

My CSV file content looks something like this:

id|key |value
10|000-0|12

The XML file, which exists in the same folder, looks like this:

    <customer id="10">
        <type>frequent buyer</type>
    </customer>

I am now getting the id value (10 in this case) from the CSV in a field named ID. How can I get the corresponding type value ('frequent buyer') from the XML into another field named type in Logstash? How can I parse both the CSV and the XML concurrently, since for each row in the CSV I have to fetch this value from the XML? Can it be done using XPath in the xml filter?

Can the source attribute point to a real XML file path rather than message?

xml {
        store_xml => "true"
        source => 
}

I am a newbie with Logstash. Thanks in advance for your help.


This kind of merging of multiple input files is not something Logstash is very good at. I think your best bet would be to read the CSV and XML files elsewhere and possibly produce JSON files or similar that Logstash can process (if you indeed need to use Logstash to process the resulting events).
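A minimal sketch of that kind of pre-processing step, in Python, using the sample data from the question. It assumes the XML file wraps the customer elements in a single root element (the `customers` name is hypothetical, as is the function name) and emits one JSON document per CSV row, which Logstash could then read with a JSON codec:

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Sample data from the question; the <customers> root element is an assumption.
CSV_TEXT = "id|key |value\n10|000-0|12\n"
XML_TEXT = """<customers>
  <customer id="10">
    <type>frequent buyer</type>
  </customer>
</customers>"""


def join_csv_with_xml(csv_text, xml_text):
    """Return one dict per CSV row, with the customer type looked up by id."""
    # Build an id -> type lookup once, instead of searching the XML per row.
    root = ET.fromstring(xml_text)
    type_by_id = {
        c.get("id"): (c.findtext("type") or "").strip()
        for c in root.iter("customer")
    }
    events = []
    for row in csv.DictReader(io.StringIO(csv_text), delimiter="|"):
        # Header names may carry stray spaces ("key "), so strip keys and values.
        event = {
            k.strip(): (v or "").strip()
            for k, v in row.items()
            if k is not None
        }
        event["type"] = type_by_id.get(event["id"])  # None if there is no match
        events.append(event)
    return events


if __name__ == "__main__":
    for event in join_csv_with_xml(CSV_TEXT, XML_TEXT):
        print(json.dumps(event))  # one JSON document per line
```

In practice the two strings would be read from the real files; they are inlined here only to keep the sketch self-contained.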


Hi Jyothish,

It might be worth investigating the logstash-filter-translate plugin:
https://www.elastic.co/guide/en/logstash/current/plugins-filters-translate.html

Though not yet a documented feature, a pull request extended the plugin's dictionary functionality, which replaces the value of a field using a lookup table, so that the dictionary file can be not only YAML but also CSV:
https://www.elastic.co/guide/en/logstash/current/plugins-filters-translate.html#plugins-filters-translate-dictionary_path

I haven't tested the CSV dictionary format yet, but I've used the YAML dictionary format, and it should work quite well. If that doesn't suit your input data exactly, you might need to do a small amount of data munging to create a suitable dictionary file that the translate plugin can handle.

See the pull request for the CSV dictionary functionality here:


Hi Jyothish,

So I've tested all 3 available dictionary formats and they all work for me.

There appears to be a problem with the documentation generation for this plugin, which is why these additional formats were not mentioned. A ticket has been logged against that:

The example formats for the dictionary files are as follows (also note that the dictionary files must have correct suffixes to be recognised):

==> dictionary.csv <==
"Person1","Henry"
"Person2","Thomas"
"Person3","Rufus"

==> dictionary.json <==
{
"Person1": "Henry",
"Person2": "Thomas",
"Person3": "Rufus"
}

==> publishers.yml <==
"Person1": "Henry"
"Person2": "Thomas"
"Person3": "Rufus"

Hi Magnus,

Thanks for the input. I am already getting the CSV values, so can I use a Ruby script to read the XML values in my Logstash configuration, next to the csv filter? Logstash would not need to parse the XML in that case. Thanks.

Yeah, you should be able to do that with a ruby filter.

Thanks, Pete. Although I posted a simple XML earlier, my real XML is somewhat hierarchical. A small portion, after converting it to YAML, looks like the excerpt below. For example, if I pass 27, I need the class value (Network_equip) associated with it. Can this be done using a dictionary? I only see direct key-value pairs in the dictionary example in the documentation.

products:
  product:
    -
      header:
        number: 27
        type: strict
        class: Network_equip
    -
      header:
        number: 32
        type: strict
        class: violation

Hi Jyothish,

You'll need to flatten this down to use with the translate plugin.

It has to be simple key value pairs.

You might need to write a script to loop through your xml/yaml to pluck only the 'number' and 'class' field values out to use as key and value for your dict.


Hi Peter,

Thanks again for your reply. Could you please guide me with a sample script showing how I can read an XML file from Logstash? If I use the xml filter, it starts parsing the XML directly.

Thanks and Regards,
Jyothish

I meant that you'd probably have to write an external script to prepare your xml as simplified yaml/csv/json to be able to use it as a dictionary.
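A rough sketch of such an external script, in Python. The XML layout below is only a guess inferred from the YAML excerpt posted above (the element names `products`, `product`, `header`, `number` and `class` are assumptions, as are the function names), so adjust it to the real file. It plucks the number and class values out of each header and renders them as quoted key/value lines in the CSV dictionary format shown earlier:

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical XML layout, inferred from the YAML excerpt in this thread.
XML_TEXT = """<products>
  <product>
    <header><number>27</number><type>strict</type><class>Network_equip</class></header>
  </product>
  <product>
    <header><number>32</number><type>strict</type><class>violation</class></header>
  </product>
</products>"""


def flatten_headers(xml_text):
    """Return a flat {number: class} mapping from every <header> element."""
    mapping = {}
    for header in ET.fromstring(xml_text).iter("header"):
        number = (header.findtext("number") or "").strip()
        klass = (header.findtext("class") or "").strip()
        if number:  # skip headers without a usable key
            mapping[number] = klass
    return mapping


def to_csv_dictionary(mapping):
    """Render the mapping as quoted "key","value" lines."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(mapping.items())
    return buf.getvalue()


if __name__ == "__main__":
    # Redirect this output to a file ending in .csv so the suffix is recognised.
    print(to_csv_dictionary(flatten_headers(XML_TEXT)), end="")
```

The resulting file would then be what the translate filter's dictionary_path points at, with the number field from the CSV as the lookup key.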
