Possible to use Logstash to merge data from multiple files?


#1

I've got two files, an xml file and a csv file. When I ingest the data from the csv file, I want to add fields to each event using values taken from the xml file.

Here's what my xml looks like:

<Root>
  <Date>07.31.2015</Date>
  <Customer>
    <Name>John</Name>
    <ID>12345</ID>
  </Customer>
</Root>

And here is my csv:

column1,column2,column3
example1,example2,example3
example4,example5,example6

Now I want to pull Date and ID from the xml file and add those fields to each row of the csv when I load it into Elasticsearch, so that each document looks something like this:

{
  "_index" : "indx-2.0",
  "_type" : "logfile",
  "_id" : "abcdefg",
  "_score" : 1.0,
  "_source":{"message":["column1,column2,column3"],"Date":"07.31.2015", "CID": "12345"(etc)}
}
{
  "_index" : "indx-2.0",
  "_type" : "logfile",
  "_id" : "abcdefh",
  "_score" : 1.0,
  "_source":{"message":["example1,example2,example3"],"Date":"07.31.2015", "CID": "12345"(etc)}
}
{
  "_index" : "indx-2.0",
  "_type" : "logfile",
  "_id" : "abcdefi",
  "_score" : 1.0,
  "_source":{"message":["example4,example5,example6"],"Date":"07.31.2015", "CID": "12345"(etc)}
}

Is this kind of thing possible using Logstash, and if so, what would the .conf file look like?


(Magnus Bäck) #2

This isn't possible out of the box. You'd have to write a very customized plugin. It's the kind of thing Logstash just isn't very good at.


(Magnus Bäck) #3

Actually, if you have a suitable key for each document (i.e., you don't need to rely on an automatically created ID), you might be able to get away with setting the document_id parameter in the Elasticsearch output and doing the merging on the Elasticsearch side, so to speak.

The data from the XML and CSV files would be ingested independently, and Logstash would write each document twice: first creating it with whichever piece of data arrives first (from either the XML or the CSV file), and later updating it when the second piece arrives.
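A minimal sketch of that approach as a single pipeline, under hypothetical assumptions: each XML file holds one <Root> record, each CSV row carries the customer ID as its first column (the CSV in the question does not), and every ID occurs in exactly one CSV row. If several rows shared an ID they would collapse into one document, so this only fits a one-to-one key. The paths, the [@metadata][src] marker and the CID column are placeholders, and option names follow recent versions of the file, multiline, xml, csv and elasticsearch plugins.

```conf
input {
  # Hypothetical paths; adjust to your files.
  file {
    path => "/path/to/customers.xml"
    start_position => "beginning"
    sincedb_path => "/dev/null"          # re-read from scratch on every run (testing only)
    # Join every line of one <Root>...</Root> record into a single event.
    codec => multiline {
      pattern => "^<Root>"
      negate => true
      what => "previous"
      auto_flush_interval => 2           # flush the last record even if no further <Root> follows
    }
    add_field => { "[@metadata][src]" => "xml" }   # hypothetical marker; @metadata is not sent to Elasticsearch
  }
  file {
    path => "/path/to/data.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    add_field => { "[@metadata][src]" => "csv" }
  }
}

filter {
  if [@metadata][src] == "xml" {
    # Parse the record into [doc]; the <Root> element itself is not kept in the hash.
    xml {
      source => "message"
      target => "doc"
      force_array => false
    }
    # Pull out the two values of interest, then drop the parsed tree and raw text.
    mutate {
      rename => {
        "[doc][Date]" => "Date"
        "[doc][Customer][ID]" => "CID"
      }
      remove_field => ["doc", "message"]
    }
  } else if [@metadata][src] == "csv" {
    # Hypothetical layout: the first column holds the shared key (the customer ID).
    csv {
      columns => ["CID", "column1", "column2", "column3"]
    }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "indx-2.0"
    document_id => "%{CID}"    # the shared key decides which document both events land in
    action => "update"         # merge this event's fields into the existing document
    doc_as_upsert => true      # create the document if this is the first event for that key
  }
}
```

The action => "update" and doc_as_upsert => true pair is what lets the second event add its fields to the document created by the first. With the default index action, the later event would replace the earlier document instead of merging into it.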

