Appending data from MongoDB into log files being processed by Logstash and parsed into Elasticsearch

Sorry about the title; my case really couldn't be summed up in a single sentence.

Here is my situation:

  1. I have a large set of log files (around 4 GB) that I wish to parse with Logstash for use with the Elastic Stack (Logstash, Elasticsearch, Kibana).
  2. The logs contain a serial number, which I have successfully parsed with Logstash. This number corresponds to a key in a MongoDB collection. As each log line is parsed, I want to query the collection with the parsed number and retrieve data to include in the final output that is passed to Elasticsearch.

To make things clearer, here's a rough example. Suppose I have the raw log:

2017-11-20 14:24:14.011 123 log_number_one

Before the parsed log gets sent to Elasticsearch, I want to query my MongoDB collection with 123 and retrieve fields data1 and data2 to append to the document, so my end result will look something like:

{
    "timestamp": "2017-11-20 14:24:14.011",
    "serial": 123,
    "data1": "foo",
    "data2": "bar",
    "log": "log_number_one"
}
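
For context, the grok filter I use to extract the serial looks roughly like the sketch below (the pattern is simplified for illustration, not my exact config):

    filter {
      grok {
        # timestamp, serial number, then the free-text remainder of the line
        match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{NUMBER:serial} %{GREEDYDATA:log}" }
      }
    }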

An easier way to accomplish this, I assume, would be to simply preprocess the logs and run the numbers through MongoDB before parsing them with Logstash. However, since I have 4 GB of log files, I was hoping to achieve this in a single pass. Would my edge case be solvable with the ruby filter plugin, where I could run some arbitrary Ruby code to do the above?

Any help / advice would be greatly appreciated!

Depending on the number of records and the total size of the data in MongoDB (assuming it is a reasonably sized data set), you may be able to extract the data into a file where each serial number is associated with a string representation of the corresponding data in JSON form. You could then use the translate filter to populate a field with the serialised JSON based on the serial number, and then use a json filter to parse this and add it to the event.
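
As a rough sketch (the file path and field names are placeholders, not a tested config), the dictionary could be a YAML file mapping each serial number to a JSON string:

    # /path/to/serial_lookup.yml -- one entry per serial number
    "123": '{"data1": "foo", "data2": "bar"}'

The filter section of the Logstash pipeline could then look something like:

    filter {
      translate {
        # look up the parsed serial and store the JSON string in a temporary field
        field           => "serial"
        destination     => "mongo_data"
        dictionary_path => "/path/to/serial_lookup.yml"
        fallback        => "{}"   # no match: parse an empty object instead of failing
      }
      json {
        # parse the JSON string into top-level fields, then drop the temporary field
        source       => "mongo_data"
        remove_field => ["mongo_data"]
      }
    }

How you generate the dictionary file is up to you; a small script that iterates over the collection with your MongoDB driver of choice and writes one line per serial number should be enough.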


Thank you so much for your help; I'll give it a try immediately!

EDIT: @Christian_Dahlqvist I haven't fully implemented it yet, but with the way things are going, I'm pretty sure it's going to work as intended. Thanks again! 🙂

EDIT 2: The MongoDB extract ended up being around 50 MB, so I had to increase the JVM heap size for Logstash to run normally. The method worked beautifully!
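
For anyone who hits the same issue: I raised the heap in Logstash's config/jvm.options file (the values below are just what worked for my 50 MB dictionary, not a general recommendation):

    # config/jvm.options -- the defaults were too small for the translate dictionary
    -Xms2g
    -Xmx2g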
